secure by design - guidance in legacy code
Abstract
- Introducing domain primitives should be done at the semantic boundary of your context.
- Ambiguous parameters in APIs are a common source of security bugs.
- You should be on the lookout for ambiguous parameters when reading code and address them using the direct approach, the discovery approach, or the new API approach.
- Never log unchecked user input, because it opens up the risk of second-order injection attacks.
- Limit the number of times a sensitive value can be accessed, because it allows you to detect unintentional access.
- Use explicit accessor methods for data that you want to log. Otherwise, new fields can end up in logs by accident.
- Because defensive code constructs can be harmful, clarify them using contracts and domain primitives.
- The DRY (Don’t Repeat Yourself) principle is about repeated representation of knowledge, not about repeated text.
- Trying to reduce repeated text that isn’t repeated knowledge can cause unnecessary dependencies.
- Failing to reduce repeated knowledge because the text differs can cause vulnerable inconsistencies.
- Tests reveal possible weaknesses in your code, and you should look for invalid and extreme input tests.
- Ensure that your domain primitives encompass an entire conceptual whole.
- Be on the lookout for domain types lacking proper validation and address them with domain primitives and secure entities.
Determining where to apply domain primitives in legacy code
Tip
Start by identifying a bounded context and the semantic boundary, because this is where you should introduce domain primitives.
A good starting point when identifying a bounded context is to group concepts that belong together.
The next step is to introduce domain primitives to express the semantic boundary.
Ambiguous parameter lists
The solution for ambiguous parameter lists is to replace all the parameters, or as many as you can, with domain primitives or secure entities. By doing so, you’ll not only make the parameters unambiguous but also make the code more secure.
Approaches to dealing with ambiguous parameter lists:
Direct approach
Replace all ambiguous parameter at once.
- pros
- solves everything right away
- works well with smaller code-bases and few developers
- can be performed quickly if codebase is of resonable size
- cons
- too much refactoring in large codebases
- not well suited if data quality is a big issue
- best performed by a single developer
Discovery approach
Find and fix the problems before changing the API
- pros
- works well if data quality is poor
- works with larger codebases and multiple teams
- cons
- takes a long time to complete as:
- you first need to introduce the creation of secure domain types
- then monitor logs for a period of time
- then address data quality problems
- may not be able to keep up with a fast-changing codebase
- takes a long time to complete as:
Example:
New API approach
Create a new API and then gradually refactor away the old API.
- pros
- allows for incremental refactoring
- works well with both small and large codebases
- works with multiple developers and teams
- cons
- if data quality is an issue, combine it with the discovery approach
Warning
Builder pattern is not so much a solution to the security issues but rather a band-aid. It partially solves the problem with accidental swapping, but you’re not getting the benfits of exact invariants and crisp definitions that you get by using domain primitives.
Logging unchecked strings
Unchecked strings are, in fact, the root cause of many attack vectors and should be avoided at all cost.
Identifying logging of unchecked strings
An attacker could inject anything that matches the rules of a String
object, which literally could be anything (for example, 100 millions characters or a script).
Warning
Never log unchecked user input, because it opens up the risk of second-order injection attacks.
The solution is to use a domain primitive instead of a String
object, because it lets you control the input instead of a the attacker having control.
But how you introduce domain primitives in an API could cause a whole set of new problems, and choosing a strategy that doesn’t cause too much fuss is recommended.
Identifying implicit data leakage
Another issue related to logging unchecked strings is leakage of sensitive data in logs due to evolving domain model.
Tip
Always limit the number of times a sensitive value can be accessed in your code. That way, you’ll be able to detect unintentional access.
Often, we log using the statement logger.info("Received " + tableReservation);
, which uses the toString
method. By default, it results in a string containing the class name and an unsigned hexadecimal representation of the hash code.
So often, we override it using reflection to dynamically extract all the data. This implies that sensitive data can implicitly leak through the toString
implementation.
You should, therefore, never rely on toString
when logging. You should use explicit accessor methods for data you want to log, because then, new fields never end up in logs by accident.
Defensive code constructs
A well-designed system should never crash with a NPE, but we often see code riddled with != null
checks. It’s an expression that one piece of code dare not trust the design of the rest of the code.
The code becomes riddled with defensive code constructs of unclear purpose.
The code becomes convoluted, hard-to-read, and bloated because of unnecessary repetition, also making it hard to refactor. When such code is maintained over time, it risks becoming incoherent.
One way to turn this around is to clarify the contracts between different parts of the code with code construct and introduce appropriate domain primitives.
Warning
Watch out for
Optional
as it carries no domain insight but is a cover-up for the code not trusting the rest of the design.
DRY misapplied - not focusing on ideas, but on text
It’s definitely true that duplicated text often is due to a duplication of ideas. But “look for duplicated text” isn’t a fault-free test. There are both false positive and false negative: false positive look like repetition but aren’t; false negative don’t look like repetition but are.
False positives might lead you to link together parts of the code that aren’t related, creating unnecessary dependencies. A false negative, on the other hand, risks inconsistent functionality evolving, raising the risk of security vulnerabilities.
Tip
A
Util
class might be a handy place to collect small domain-logical helper methods, but it shouldn’t be the end state, rather a temporary resting place on the road to further refactoring instead.
Only testing the good enough
To promote security in depth, you also need to test with invalid and extreme input, because it pushes your code to the limit.
Using domain primitives can lull you into a false sens of security, unless you’re careful. The design of a domain primitive certainly takes care of invalid input, but there’s no guarantee that it protects against failures and unexpected behavior caused by a dependency, e.g. a timeout in a database.
Partial domain primitives
Another pitfall when creating domain primitives is to not encompass a complete conceptual whole. When designing domain primitives, it’s a good idea to keep them as small as possible, perhaps only wrapping a value in a thin shell of domain terminology.
Tip
Stay alert and keep an eye open for contextual information that’s transmitted out-of-band; it might be what’s needed to make a domain primitive that wraps a conceptual whole.
Example: summing with different currency.