secure by design - guidance on microservices

Abstract

  • A good microservice should have an independent runtime, allow independent updates, and be designed for other services being down.
  • Treating each microservice as a bounded context helps you design more secure APIs.
  • Secure design principles such as domain primitives and context mapping are also applicable when designing microservice APIs.
  • In order to avoid common security pitfalls, only expose domain operations in APIs, use explicit context mapping between services, and pay extra attention to evolving APIs.
  • It’s important to analyze confidentiality, integrity, availability, and traceability (CIA-T) across all services.
  • Identify data that’s sensitive and possibly needs to be secured across services.
  • The integrity of log data is important from a security standpoint.
  • Normalization and categorization of log data require extensive domain knowledge and should be part of the service design.
  • A service must be uniquely identifiable by its name, version number, and instance ID.
  • A transaction must be traceable across systems.
  • Using a logger with a domain-oriented API facilitates a design that considers the confidentiality of log data.
  • Don’t intermix sensitive and nonsensitive data in the same log, because that can lead to accidental information leakage.

What’s a microservice

Microservice architecture is an architectural style of building systems that has become popular as an alternative to and a reaction against the monolithic style.

Quote

Microservices—also known as the microservice architecture—is an architectural style that structures an application as a collection of loosely coupled services, which implement business capabilities. The microservice architecture enables the continuous delivery/deployment of large, complex applications. — Chris Richardson, https://microservices.io

Warning

Beware of the distributed monolith: a former monolith that has been split into multiple separate services that still can’t function independently. In one such system, if one service went down, none of the others could do their jobs, and it was impossible to restart only one of them, because all seven services had to be started in a particular order.

Independent runtimes

  • a service should run in its own runtime, independent of the other services
    • no dependencies of the type “this one has to start before that one”
    • services shouldn’t make assumptions about the particulars of other services

Independent updates

A change in functionality should be isolated to a few services at most. Ideally, you only need to touch one single service for a functional update. But it makes sense that if you extend the functionality one service provides, you’ll most probably want to change some of the calling code in another service to make that change usable and valuable.

What you want to avoid is a change that ripples from one service to the next, then over to a third, and so on.

A huge help in this is orienting each service around a business domain.

Designed for down

A service needs to be designed so that it behaves well when a service it depends on is down, and so that it recovers to normal operation when that service is up again.

The service isn’t only designed for the happy case when every service is up, but also designed for when services it depends on are down.

A neat trick when developing is to start with implementing what the service should do in case a service it depends on is down. This is easier if each service is designed as a bounded context of a business domain.
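One way to start with the down case is to make "dependency unavailable" an explicit result the caller must handle, rather than an exception the happy path can ignore. The sketch below assumes Java 17 and uses hypothetical names (`PriceLookup`, `PriceClient`, `CheckoutService`) for a checkout service that depends on a pricing service; the remote call is modeled as a plain `Function` for illustration.

```java
import java.math.BigDecimal;
import java.util.function.Function;

// Hypothetical result type: "the dependency is down" is an explicit outcome,
// not an exception that the happy path can forget about.
sealed interface PriceLookup permits Available, Unavailable {}
record Available(BigDecimal price) implements PriceLookup {}
record Unavailable(String reason) implements PriceLookup {}

// Hypothetical client wrapping a remote pricing service.
final class PriceClient {
    private final Function<String, BigDecimal> remoteCall;

    PriceClient(Function<String, BigDecimal> remoteCall) {
        this.remoteCall = remoteCall;
    }

    PriceLookup lookup(String productId) {
        try {
            return new Available(remoteCall.apply(productId));
        } catch (RuntimeException e) {
            // Any failure to reach the pricing service becomes an explicit outcome.
            return new Unavailable("pricing service unavailable: " + e.getMessage());
        }
    }
}

final class CheckoutService {
    private final PriceClient prices;

    CheckoutService(PriceClient prices) {
        this.prices = prices;
    }

    // The down case is handled explicitly: the order is marked as pending
    // instead of failing the whole request.
    String placeOrder(String productId) {
        PriceLookup lookup = prices.lookup(productId);
        if (lookup instanceof Available a) {
            return "ORDER_CONFIRMED at " + a.price();
        }
        return "ORDER_PENDING_PRICING";
    }
}
```

Because `placeOrder` has to deal with `Unavailable` before it can use a price, the behavior for a down dependency is decided up front instead of being discovered in production.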

Another powerful approach is to design your architecture as event-driven, where the services communicate by passing messages. In that case, the services pull incoming messages from a queue or topic at their own discretion, so the sender makes no assumption about whether the receiver is up or down.
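A minimal sketch of the event-driven style, assuming an in-memory `LinkedBlockingQueue` as a stand-in for a message broker's queue or topic (a real broker would also persist messages across restarts, which this sketch doesn't). `BookingCreated`, `BookingPublisher`, and `InvoiceService` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical event; in a real system this would travel through a broker.
record BookingCreated(String bookingId) {}

// The sender only publishes; it never checks whether a receiver is running.
final class BookingPublisher {
    private final BlockingQueue<BookingCreated> topic;

    BookingPublisher(BlockingQueue<BookingCreated> topic) {
        this.topic = topic;
    }

    void bookingCreated(String bookingId) {
        topic.offer(new BookingCreated(bookingId));
    }
}

// The receiver pulls messages at its own pace, e.g. after it has restarted.
final class InvoiceService {
    private final BlockingQueue<BookingCreated> topic;
    private final List<String> invoiced = new ArrayList<>();

    InvoiceService(BlockingQueue<BookingCreated> topic) {
        this.topic = topic;
    }

    // Drains whatever has accumulated, including while the service was down.
    void processPending() {
        BookingCreated event;
        while ((event = topic.poll()) != null) {
            invoiced.add(event.bookingId());
        }
    }

    List<String> invoiced() {
        return List.copyOf(invoiced);
    }
}
```

The publisher keeps working whether or not `InvoiceService` exists yet; the two only share the queue.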

Each service is a bounded context

You can design microservices in many ways, but we believe a good design principle is to think of each service as a bounded context:

  • If you treat each service as a bounded context with an API that faces the rest of the world, you can use the various design principles and tools you’ve learned in this book to build more secure services.
  • It’ll help you decide where a certain feature belongs, because it’s easier to reason about the home of the feature when you’re thinking in terms of bounded contexts instead of technical services or APIs. This helps you with the challenge of slicing the feature set.

The importance of designing your API

Each service should be treated as a bounded context, and its public API is its interface to the rest of the world. When designing the API of your microservice, use domain primitives to harden it and avoid exposing your internal domain publicly; both make the service more secure.

Services that only expose domain operations in the API can enforce invariants and maintain a valid state.

Only exposing domain operations in the API means the service is now in full control of maintaining a valid state and upholding all applicable invariants, which is a cornerstone for building secure systems.
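As an illustration, here is a hypothetical `Booking` that exposes domain operations (`cancel`, `checkOut`) instead of a generic setter such as `setStatus(String)`. The invariant, that only a confirmed booking can be canceled or checked out, is an assumption made for the example.

```java
// Hypothetical booking aggregate. There is no setStatus(...): callers can only
// invoke domain operations, so the service decides which transitions are valid.
enum BookingStatus { CONFIRMED, CHECKED_OUT, CANCELED }

final class Booking {
    private final String id;
    private BookingStatus status = BookingStatus.CONFIRMED;

    Booking(String id) {
        this.id = id;
    }

    // Invariant: only a confirmed booking can be canceled.
    void cancel() {
        if (status != BookingStatus.CONFIRMED) {
            throw new IllegalStateException("Cannot cancel booking " + id + " in state " + status);
        }
        status = BookingStatus.CANCELED;
    }

    // Invariant: only a confirmed booking can be checked out.
    void checkOut() {
        if (status != BookingStatus.CONFIRMED) {
            throw new IllegalStateException("Cannot check out booking " + id + " in state " + status);
        }
        status = BookingStatus.CHECKED_OUT;
    }

    BookingStatus status() {
        return status;
    }
}
```

A client that tries to cancel an already checked-out booking gets an error instead of silently putting the booking into an invalid state.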

Splitting monoliths

In terms of API design, one thing to watch out for when splitting a monolith is that you must also discover and enforce the translation between the different contexts, which now live in different microservices.

Always be wary when making calls across services and make it a habit to add explicit translation to and from the context you’re talking to. A good way of doing this is by thinking carefully about the semantics and using code constructs like domain primitives.
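A sketch of explicit translation at a service boundary, assuming a hypothetical storefront context that sends the number of requested units as a string field `requested_units`. The `Quantity` domain primitive and its 1 to 500 range are invented for the example; the point is that nothing from the other context enters the ordering context without passing through a translator and a domain primitive.

```java
import java.util.Map;

// Hypothetical domain primitive in the ordering context: a quantity is
// always between 1 and 500 items, enforced at construction.
record Quantity(int value) {
    Quantity {
        if (value < 1 || value > 500) {
            throw new IllegalArgumentException("Quantity out of range: " + value);
        }
    }
}

// Hypothetical translator: raw data from the storefront context is translated
// explicitly instead of being passed through as-is.
final class StorefrontTranslator {
    Quantity toQuantity(Map<String, String> storefrontPayload) {
        String raw = storefrontPayload.get("requested_units");
        // Reject anything that isn't a short run of digits before parsing.
        if (raw == null || !raw.matches("\\d{1,3}")) {
            throw new IllegalArgumentException("Unexpected requested_units: " + raw);
        }
        return new Quantity(Integer.parseInt(raw));
    }
}
```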

Semantics and evolving services

Subtle changes in semantics can lead to security issues if corresponding changes aren’t made in the translation between the different bounded contexts. In other words, broken context mappings can cause security problems.

Context mapping, taking nearby microservices into account, and thinking carefully about how to evolve semantics in the APIs are some effective ways of handling evolving services in a safe way.

Tip

Avoid redefining existing terminology when semantics change. Introduce new terms that let you express the new semantics.

Changes in semantics usually require some degree of domain modeling and context mapping to get right. Sometimes they even shift context boundaries, and because each service is a bounded context, a change of boundaries leads to a change in the microservices you have: new services get created, or existing ones get merged, as a result of evolving semantics.

Sensitive data across services

CIA-T in a microservice architecture

Focus on the four aspects of CIA-T, the classic CIA security triad extended with traceability:

  • confidentiality: keeping things secret
  • integrity: ensuring things don’t change in a bad way
  • availability: keeping things available when needed
  • traceability: knowing who changed what

Ensuring confidentiality gets trickier because a request for data might travel from component to component. To keep track of this, you need some token to be passed with the request, and when a request reaches a service, the service needs to check whether the requester is authorized.
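A minimal sketch of a service checking authorization itself, assuming a hypothetical `AccessToken` that has already been verified (signature and expiry checks are out of scope here) and a made-up scope name `records:read`.

```java
import java.util.Set;

// Hypothetical, already-verified access token passed along with each request.
// Verifying the token's signature and expiry is out of scope for this sketch.
record AccessToken(String subject, Set<String> scopes) {}

final class NotAuthorizedException extends RuntimeException {
    NotAuthorizedException(String message) {
        super(message);
    }
}

// Each service checks authorization itself instead of trusting that an
// upstream service already did.
final class MedicalRecordService {
    String fetchRecord(String patientId, AccessToken token) {
        boolean ownRecord = token.subject().equals(patientId);
        boolean isClinician = token.scopes().contains("records:read");
        if (!ownRecord && !isClinician) {
            throw new NotAuthorizedException(token.subject() + " may not read record of " + patientId);
        }
        return "record-of-" + patientId;
    }
}
```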

When guaranteeing integrity across multiple services, two things are important: every piece of information should have an authoritative source (usually a specific service where the data lives), and you need to be able to verify that the data hasn’t been tampered with (usually with classical cryptography, by providing some sort of checksum or signature).
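A sketch of the checksum/signature idea using HMAC-SHA256 from the standard `javax.crypto` API. It assumes the authoritative service and its consumers share a secret key; how that key is distributed is not shown, and a public-key signature would avoid sharing a secret at all. `IntegrityTag` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// The authoritative service attaches an HMAC-SHA256 tag to the data it owns;
// consumers sharing the key can detect any modification in transit.
final class IntegrityTag {
    private final SecretKeySpec key;

    IntegrityTag(byte[] sharedSecret) {
        this.key = new SecretKeySpec(sharedSecret, "HmacSHA256");
    }

    String sign(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] tag = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(tag);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC not available", e);
        }
    }

    // Constant-time comparison avoids leaking how many bytes matched.
    boolean verify(String payload, String tag) {
        byte[] expected = sign(payload).getBytes(StandardCharsets.UTF_8);
        byte[] actual = tag.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }
}
```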

For availability, a service can fall back on a cached value from a previous call or on a sensible default when a service it depends on doesn’t respond; circuit breakers and similar tools help you design for this.
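A minimal circuit breaker with a cached fallback, written from scratch to show the mechanics. It is single-threaded and simplified; production code would more likely use an established library such as Resilience4j. `CachingCircuitBreaker` and its parameters are hypothetical, and the clock is injected as a `LongSupplier` (e.g. `System::currentTimeMillis`) so the behavior can be tested.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal, single-threaded circuit breaker sketch: after `failureThreshold`
// consecutive failures the circuit opens and calls are skipped for
// `openMillis`; meanwhile the last good value (or a default) is returned.
final class CachingCircuitBreaker<T> {
    private final Supplier<T> remoteCall;
    private final T defaultValue;
    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock;

    private T lastGoodValue;
    private int consecutiveFailures;
    private long openedAt = -1; // -1 means the circuit is closed

    CachingCircuitBreaker(Supplier<T> remoteCall, T defaultValue,
                          int failureThreshold, long openMillis, LongSupplier clock) {
        this.remoteCall = remoteCall;
        this.defaultValue = defaultValue;
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    T get() {
        if (isOpen()) {
            return fallback(); // open: don't even try the remote call
        }
        try {
            T value = remoteCall.get();
            lastGoodValue = value;
            consecutiveFailures = 0;
            openedAt = -1; // success closes the circuit again
            return value;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = clock.getAsLong();
            }
            return fallback();
        }
    }

    boolean isOpen() {
        return openedAt >= 0 && clock.getAsLong() - openedAt < openMillis;
    }

    private T fallback() {
        return lastGoodValue != null ? lastGoodValue : defaultValue;
    }
}
```

While the circuit is open, callers keep getting the last known value without waiting for timeouts; after `openMillis`, the next call tries the remote service again and closes the circuit if it succeeds.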

For traceability, you might need to track the original requester by correlating calls to different services, so that you can see the bigger pattern of who accessed what.
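One way to correlate calls is to propagate a correlation ID and the original requester with every downstream call. The header names `X-Correlation-Id` and `X-Original-Requester` are a common convention rather than a standard, and `TraceHeaders` and `AccessLog` are hypothetical helpers; a real system would send the headers over HTTP or in message metadata.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class TraceHeaders {
    // Hypothetical header names; a common convention, not a standard.
    static final String CORRELATION_ID = "X-Correlation-Id";
    static final String ORIGINAL_REQUESTER = "X-Original-Requester";

    private TraceHeaders() {}

    // Builds headers for a downstream call: reuses the incoming correlation ID
    // and original requester, or starts a new trace at the edge of the system.
    static Map<String, String> forDownstream(Map<String, String> incoming, String caller) {
        String correlationId = incoming.get(CORRELATION_ID);
        if (correlationId == null) {
            correlationId = UUID.randomUUID().toString();
        }
        String originalRequester = incoming.get(ORIGINAL_REQUESTER);
        if (originalRequester == null) {
            originalRequester = caller;
        }
        Map<String, String> headers = new HashMap<>();
        headers.put(CORRELATION_ID, correlationId);
        headers.put(ORIGINAL_REQUESTER, originalRequester);
        return headers;
    }
}

// Each service records access under the shared correlation ID, so entries
// from different services can later be joined into one picture.
final class AccessLog {
    private final List<String> entries = new ArrayList<>();

    void record(String service, String resource, Map<String, String> headers) {
        entries.add(headers.get(TraceHeaders.CORRELATION_ID) + " "
                + headers.get(TraceHeaders.ORIGINAL_REQUESTER) + " "
                + service + " " + resource);
    }

    List<String> entries() {
        return List.copyOf(entries);
    }
}
```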

Thinking “sensitive”

The requirement for confidentiality isn’t an absolute but something that depends on context.

To identify sensitive data, you can ask yourself the following questions:

  • should this data be confidential in another context?
  • does the data require a high degree of integrity or availability in another context? How about traceability?
  • if combined with data from other services, could this data be sensitive? (e.g. a license plate number on its own may seem harmless, but combined with a time and a geolocation it reveals where a specific car was at a specific moment)

In a microservice architecture, data is more exposed because it travels over the network between services, so you might need to protect it in transit with TLS.

Logging in microservices

Log data will be scattered throughout the system, and to get a complete picture of what has happened, you need to aggregate data, but fetching it manually quickly becomes a painful experience.

To effectively aggregate data, you need to store it in a normalized, structured format (e.g. JSON), which means it needs to be transformed somewhere in the logging process.

However, the upside to this, ironically, is also its downside. By having a normalization step, you encourage a design with great flexibility in terms of logging, but it opens up logging of unchecked strings as well, and that’s a security concern.

When normalizing data, you restructure it into a key-value format that, by definition, is a modification of its original form. Does that violate the integrity of the data? Not necessarily; you only need to ensure the data hasn’t changed in an unauthorized way. But in practice, that’s hard to do.

Another solution is to structure data in each service before passing it to the logging system.

This way, you avoid using third-party normalization software.

The downside to this approach is that every microservice needs to implement explicit normalization of its log data, which adds complexity, but avoiding third-party dependencies also reduces complexity, so it probably evens out in the long run.

Two other aspects are also interesting from a security perspective:

  • normalizing log data in each service makes it possible to digitally sign each payload, for example by signing a cryptographic hash (e.g. SHA-256) of it with a key that belongs to the service
  • normalization is often tightly coupled with categorization of data, which requires extensive domain knowledge, so the natural place for this isn’t in a common normalization step, but rather within each service
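A sketch of per-service signing with the standard `java.security.Signature` API. `SHA256withRSA` hashes the normalized payload with SHA-256 and signs the hash with the service's private key. `LogSigner` is a hypothetical class; for brevity it holds both keys, whereas in practice the log system would verify with only the service's public key.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.util.Base64;

// Each service signs its own normalized log payloads. SHA256withRSA hashes
// the payload with SHA-256 and signs the hash with the private key.
final class LogSigner {
    private final KeyPair keyPair;

    LogSigner(KeyPair keyPair) {
        this.keyPair = keyPair;
    }

    static KeyPair generateKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA not available", e);
        }
    }

    String sign(String normalizedPayload) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(keyPair.getPrivate());
            signer.update(normalizedPayload.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Signing failed", e);
        }
    }

    // Verification only needs the public key.
    boolean verify(String normalizedPayload, String signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(keyPair.getPublic());
            verifier.update(normalizedPayload.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(Base64.getDecoder().decode(signature));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Verification failed", e);
        }
    }
}
```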

Traceability in log data

  • a service must be uniquely identifiable by its name, version number and instance ID
    • make sure to include the service name, version number, and unique instance ID in the digital signature of each log statement; otherwise, you can’t tell whether the origin of the data has been tampered with
  • a transaction must be traceable across systems
    • this way, you can easily identify all services that participated in a transaction spanning several systems
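To make the origin part of what is signed, the signature can cover the service identity and transaction ID along with the payload. This sketch reuses HMAC-SHA256 for brevity; `ServiceIdentity`, `SignedLogEntry`, and the length-prefixed canonical form are assumptions for illustration, not a prescribed format.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Identifies exactly which running instance produced a log statement.
record ServiceIdentity(String name, String version, String instanceId) {}

// Hypothetical log entry: the signature covers the service identity and the
// transaction ID as well as the payload, so none of them can be altered alone.
record SignedLogEntry(ServiceIdentity origin, String transactionId, String payload, String signature) {

    static SignedLogEntry create(ServiceIdentity origin, String transactionId,
                                 String payload, byte[] key) {
        return new SignedLogEntry(origin, transactionId, payload,
                hmac(canonical(origin, transactionId, payload), key));
    }

    boolean verify(byte[] key) {
        byte[] expected = hmac(canonical(origin, transactionId, payload), key)
                .getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, signature.getBytes(StandardCharsets.UTF_8));
    }

    // Length-prefixing each field keeps different field splits
    // (e.g. "a" + "bc" vs "ab" + "c") from producing the same string.
    private static String canonical(ServiceIdentity o, String txId, String payload) {
        StringBuilder sb = new StringBuilder();
        for (String field : new String[] {o.name(), o.version(), o.instanceId(), txId, payload}) {
            sb.append(field.length()).append(':').append(field);
        }
        return sb.toString();
    }

    private static String hmac(String data, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return Base64.getEncoder().encodeToString(
                    mac.doFinal(data.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC not available", e);
        }
    }
}
```

Changing any identifying field, for example claiming the entry came from `instance-8`, makes verification fail, and the shared transaction ID lets you find every service's entries for one transaction.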

Confidentiality through a domain-oriented logger API

Logging data with different logging levels, like DEBUG, INFO, and FATAL, is a common design pattern used in many systems. However, a log level says nothing about how sensitive an entry is: any entry marked INFO could contain sensitive information, so all INFO entries must have restricted access, a confidentiality problem you don’t want.

A better solution is to treat logging as a separate view of the system that needs explicit design, similar to what you’d do for a user interface, but instead of a user, the consumer of the API is an automated analysis tool, developer or some other party interested in how the system behaves. This means that structure and categorization of data need to be considered, but so does sensitive information.

Classification depends on context and is an overall business concern. This implies that classification of data requires extensive business domain knowledge and should be part of your service design, not something you delegate to a third-party application.

Example:

public Result cancel(BookingId bookingId, User user) {
  // logs that the booking is about to be canceled
  logger.cancelBooking(bookingId, user);
 
  Result result = bookingsRepository.cancel(bookingId);
  if (result.isBookingCanceled()) {
    // logs that the booking has been canceled
    logger.bookingCanceled(bookingId, user);
  } else {
    // logs that the cancel booking operation has failed
    logger.bookingCancellationFailed(bookingId, result, user);
  }
  return result;
}

The main upside to the logger API is that it guides developers in what data they need in each step of the process. This minimizes the risk of logging incorrect data, and it also separates data in terms of confidentiality.

public void bookingCancellationFailed(
  BookingId bookingId,
  Result result,
  User user
) {
  // sends audit data to the logging system
  logger.log(auditData(bookingId, result, user));
  // sends behavior data to the logging system
  logger.log(behaviorData(result));
 
  if (result.isError()) {
    // sends error data to the logging system
    logger.log(errorData(result));
  }
}

Having the underlying logger accept only strings does indeed make sense, because how you distinguish between audit, behavior, and error data is specific to your domain.

private String auditData(...) {
  Map<String, String> data = new HashMap<>();
  data.put("category", "audit");
  data.put("message", "Failed to cancel booking");
  data.put("bookingId", bookingId.toString()); // assumes BookingId.toString() returns the raw ID
  // ...
  return asJson(data, "Failure translating audit data into JSON");
}
 
private String asJson(Map<String, String> data, String errorMessage) {
  try {
    return objectMapper.writeValueAsString(data);
  } catch (JsonProcessingException e) {
    return format("{\"failure\":\"%s\"}", errorMessage);
  }
}

You will get something like:

{
  "category": "audit",
  "message": "Failed to cancel booking",
  "bookingId": "1234",
  "username": "foobar",
  "status": "Already checked out"
}
{
  "category": "behavior",
  "message": "Failed to cancel booking",
  "status": "Already checked out"
}

Aggregation of data into a view is then a product of your access rights. For example, audit data and error data could be shown in the same view if you’re granted access to both.

But this flexibility also results in a drawback that makes the solution unviable. By storing everything in the same master log and allowing categories to be intermixed, you open up the possibility of leaking sensitive data in a view, hence violating the confidentiality requirement.

Warning

Never intermix sensitive and nonsensitive data in the same log, because it can lead to accidental information leakage in a view.

A better alternative is to store each category in a separate log stream.
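A minimal sketch of separate log streams, assuming in-memory lists as stand-ins for physically separate stores (files, indices, or topics). `LogCategory` and `CategoryLogStreams` are hypothetical; access is granted per category, so a reader who is only allowed behavior data can never see audit entries, even by accident.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum LogCategory { AUDIT, BEHAVIOR, ERROR }

// Hypothetical router: every category is written to its own stream, so access
// can be granted per stream instead of filtering one shared master log.
final class CategoryLogStreams {
    private final Map<LogCategory, List<String>> streams = new EnumMap<>(LogCategory.class);

    CategoryLogStreams() {
        for (LogCategory category : LogCategory.values()) {
            streams.put(category, new ArrayList<>());
        }
    }

    void write(LogCategory category, String jsonEntry) {
        streams.get(category).add(jsonEntry);
    }

    // A reader only ever sees the streams it has been granted.
    List<String> read(LogCategory category, Set<LogCategory> grantedCategories) {
        if (!grantedCategories.contains(category)) {
            throw new SecurityException("No access to " + category + " log stream");
        }
        return List.copyOf(streams.get(category));
    }
}
```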