secure by design - guidance in legacy code

Abstract

  • Introducing domain primitives should be done at the semantic boundary of your context.
  • Ambiguous parameters in APIs are a common source of security bugs.
  • You should be on the lookout for ambiguous parameters when reading code and address them using the direct approach, the discovery approach, or the new API approach.
  • Never log unchecked user input, because it opens up the risk of second-order injection attacks.
  • Limit the number of times a sensitive value can be accessed, because it allows you to detect unintentional access.
  • Use explicit accessor methods for data that you want to log. Otherwise, new fields can end up in logs by accident.
  • Because defensive code constructs can be harmful, clarify them using contracts and domain primitives.
  • The DRY (Don’t Repeat Yourself) principle is about repeated representation of knowledge, not about repeated text.
  • Trying to reduce repeated text that isn’t repeated knowledge can cause unnecessary dependencies.
  • Failing to reduce repeated knowledge because the text differs can cause vulnerable inconsistencies.
  • Tests reveal possible weaknesses in your code, and you should look for invalid and extreme input tests.
  • Ensure that your domain primitives encompass an entire conceptual whole.
  • Be on the lookout for domain types lacking proper validation and address them with domain primitives and secure entities.

Determining where to apply domain primitives in legacy code

Tip

Start by identifying a bounded context and the semantic boundary, because this is where you should introduce domain primitives.

A good starting point when identifying a bounded context is to group concepts that belong together.

The next step is to introduce domain primitives to express the semantic boundary.

Ambiguous parameter lists

The solution for ambiguous parameter lists is to replace all the parameters, or as many as you can, with domain primitives or secure entities. By doing so, you’ll not only make the parameters unambiguous but also make the code more secure.

Approaches to dealing with ambiguous parameter lists:

Direct approach

Replace all ambiguous parameter at once.

  • pros
    • solves everything right away
    • works well with smaller code-bases and few developers
    • can be performed quickly if codebase is of resonable size
  • cons
    • too much refactoring in large codebases
    • not well suited if data quality is a big issue
    • best performed by a single developer

Discovery approach

Find and fix the problems before changing the API

  • pros
    • works well if data quality is poor
    • works with larger codebases and multiple teams
  • cons
    • takes a long time to complete as:
      • you first need to introduce the creation of secure domain types
      • then monitor logs for a period of time
      • then address data quality problems
    • may not be able to keep up with a fast-changing codebase

Example:

public void shipItems(
  int itemId,
  int quantity,
  Address billing,
  Address to
) {
  // for each ambiguous parameter, try to create the corresponding domain primitive
  tryCreateItemId(itemId);
  tryCreateQuantity(quantity);
}
 
private void tryCreateItemId(int itemId) {
  // purpose isn't to prevent bad data from entering, but to discover it
  try {
    new ItemId(itemId);
  } catch (Exception e) {
    logError("Error while creating itemId", e);
  }
}

New API approach

Create a new API and then gradually refactor away the old API.

  • pros
    • allows for incremental refactoring
    • works well with both small and large codebases
    • works with multiple developers and teams
  • cons
    • if data quality is an issue, combine it with the discovery approach

Warning

Builder pattern is not so much a solution to the security issues but rather a band-aid. It partially solves the problem with accidental swapping, but you’re not getting the benfits of exact invariants and crisp definitions that you get by using domain primitives.

Logging unchecked strings

Unchecked strings are, in fact, the root cause of many attack vectors and should be avoided at all cost.

Identifying logging of unchecked strings

An attacker could inject anything that matches the rules of a String object, which literally could be anything (for example, 100 millions characters or a script).

Warning

Never log unchecked user input, because it opens up the risk of second-order injection attacks.

The solution is to use a domain primitive instead of a String object, because it lets you control the input instead of a the attacker having control. But how you introduce domain primitives in an API could cause a whole set of new problems, and choosing a strategy that doesn’t cause too much fuss is recommended.

Identifying implicit data leakage

Another issue related to logging unchecked strings is leakage of sensitive data in logs due to evolving domain model.

Tip

Always limit the number of times a sensitive value can be accessed in your code. That way, you’ll be able to detect unintentional access.

Often, we log using the statement logger.info("Received " + tableReservation);, which uses the toString method. By default, it results in a string containing the class name and an unsigned hexadecimal representation of the hash code. So often, we override it using reflection to dynamically extract all the data. This implies that sensitive data can implicitly leak through the toString implementation.

You should, therefore, never rely on toString when logging. You should use explicit accessor methods for data you want to log, because then, new fields never end up in logs by accident.

Defensive code constructs

A well-designed system should never crash with a NPE, but we often see code riddled with != null checks. It’s an expression that one piece of code dare not trust the design of the rest of the code. The code becomes riddled with defensive code constructs of unclear purpose.

The code becomes convoluted, hard-to-read, and bloated because of unnecessary repetition, also making it hard to refactor. When such code is maintained over time, it risks becoming incoherent.

One way to turn this around is to clarify the contracts between different parts of the code with code construct and introduce appropriate domain primitives.

Warning

Watch out for Optional as it carries no domain insight but is a cover-up for the code not trusting the rest of the design.

DRY misapplied - not focusing on ideas, but on text

It’s definitely true that duplicated text often is due to a duplication of ideas. But “look for duplicated text” isn’t a fault-free test. There are both false positive and false negative: false positive look like repetition but aren’t; false negative don’t look like repetition but are.

False positives might lead you to link together parts of the code that aren’t related, creating unnecessary dependencies. A false negative, on the other hand, risks inconsistent functionality evolving, raising the risk of security vulnerabilities.

Tip

A Util class might be a handy place to collect small domain-logical helper methods, but it shouldn’t be the end state, rather a temporary resting place on the road to further refactoring instead.

Only testing the good enough

To promote security in depth, you also need to test with invalid and extreme input, because it pushes your code to the limit.

Using domain primitives can lull you into a false sens of security, unless you’re careful. The design of a domain primitive certainly takes care of invalid input, but there’s no guarantee that it protects against failures and unexpected behavior caused by a dependency, e.g. a timeout in a database.

Partial domain primitives

Another pitfall when creating domain primitives is to not encompass a complete conceptual whole. When designing domain primitives, it’s a good idea to keep them as small as possible, perhaps only wrapping a value in a thin shell of domain terminology.

Tip

Stay alert and keep an eye open for contextual information that’s transmitted out-of-band; it might be what’s needed to make a domain primitive that wraps a conceptual whole.

Example: summing with different currency.

class Money {
  private Amount amout;
  private Currency currency;
 
  public Money add(Money other) {
    // checks that you're not trying to add different currencies
    isTrue(this.currency.equals(other.currency));
    // ...
  }
}