Paradigm shift for micro-service architectures with Temporal

Introduction.

One of the biggest challenges in modern micro-service architectures is managing downstream errors in complicated, multi-micro-service systems.

Let me show you what I mean; consider the following illustration:

Illustration of micro-service choreography

  1. A customer places an order via a web UI; the order details are sent to the “order” service.
  2. The “order” service sends the details to the “inventory” service, which checks that we have enough product X to complete the order.
  3. If we have stock, we deplete it and send the order details to the “payment” service, which attempts to charge for the order.
  4. Once we have payment, the “fulfilment” service prints a label in the warehouse so an employee can box up the product and hand it over to the delivery company.
  5. The delivery company provides a tracking number, which an employee enters via a UI and which is then stored in the “shipment” service.

Usually, this works fine, and it’s super fast, but let’s throw a spanner in the works: say that, for some reason, a product can no longer be shipped, for example because the local government passes a law banning the product in their area (this actually happens).

How does this play out in our system?

  1. So, the delivery company spots the issue in their system (they really don’t want to break the law) and refuses the shipment.
  2. Our employee enters the shipping error in their web UI, which updates the “shipment” service.
  3. The “shipment” service must tell the “fulfilment” service that the order can’t be fulfilled.
  4. The “fulfilment” service tells the “payment” service to issue a refund.
  5. Then, the “payment” service can tell the “inventory” service to return the product to stock.
  6. Finally, the “inventory” service tells the “order” service that the order has been cancelled and a refund has been issued.

Nice, we handled that pretty well; pats on the back all round.

… But what if there is an issue refunding the order in the “payment” service?

We will keep retrying the refund and hope the failure is recoverable, because in the meantime the system is in limbo: the “inventory” and “order” services still think the product is going out to the customer. If the customer were to check the UI, they’d see that payment was collected and that the package is still waiting to be shipped!
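To make the limbo state concrete, here is a minimal in-memory sketch of the choreographed rollback chain. Every class, method and status name below is a hypothetical stand-in I made up for illustration (real services would talk over a network or message bus); the only point is that when the refund fails, nothing downstream of “payment” ever hears about the cancellation.

```python
class RefundFailed(Exception):
    """Raised by the hypothetical payment service when a refund cannot be issued."""


class OrderService:
    def __init__(self):
        self.status = "paid"

    def on_cancelled(self):
        self.status = "cancelled_and_refunded"


class InventoryService:
    def __init__(self, order_service, stock):
        self.order_service = order_service
        self.stock = stock

    def on_refunded(self):
        self.stock += 1  # return the product to stock
        self.order_service.on_cancelled()


class PaymentService:
    def __init__(self, inventory_service, refund_works):
        self.inventory_service = inventory_service
        self.refund_works = refund_works
        self.charged = True

    def on_unfulfillable(self):
        if not self.refund_works:
            # The chain stops here: inventory and order are never told.
            raise RefundFailed("payment provider rejected the refund")
        self.charged = False
        self.inventory_service.on_refunded()


class FulfilmentService:
    def __init__(self, payment_service):
        self.payment_service = payment_service
        self.status = "label_printed"

    def on_shipment_refused(self):
        self.status = "cannot_fulfil"
        self.payment_service.on_unfulfillable()


class ShipmentService:
    def __init__(self, fulfilment_service):
        self.fulfilment_service = fulfilment_service
        self.status = "awaiting_pickup"

    def report_refusal(self):
        self.status = "refused"
        self.fulfilment_service.on_shipment_refused()


def build_system(refund_works):
    """Wire the services together the way the choreography does: each knows only the next."""
    orders = OrderService()
    inventory = InventoryService(orders, stock=9)
    payments = PaymentService(inventory, refund_works)
    fulfilment = FulfilmentService(payments)
    shipment = ShipmentService(fulfilment)
    return orders, inventory, payments, shipment
```

Calling `shipment.report_refusal()` on a system built with `refund_works=False` raises `RefundFailed`, and afterwards the shipment is marked as refused while the order still reads `"paid"` and the stock count is unchanged. Someone has to notice that mismatch and chase it.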

Now, this works; plenty of implementations like this are out there. However, it pushes the burden of monitoring the state of transactions onto the development (or support) team.

Can we make this less vulnerable to faults?

Orchestration.

One way to skin this cat and make our system more fault-tolerant would be to use orchestration; let’s rework our system by adding a micro-service orchestrator:

Illustration of micro-service orchestration.

Rather than one service talking to the next, as in our previous choreographed example, all our services “talk” to a centralised service that manages the state between transactions.

Now, when something bad happens, we don’t need to retrace our steps backwards through the micro-services; the orchestrator undoes everything for us.
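One common way to build that “undo” is a saga-style compensation list: the orchestrator remembers an undo action for every step that succeeded and, on failure, runs them in reverse. The sketch below is a toy, in-memory version under that assumption; the step names, the `banned_in_region` flag and the `run_order` function are all hypothetical, and a real orchestrator would also need to persist its progress and cope with a compensation that itself fails.

```python
from typing import Callable, Dict, List, Tuple

Order = Dict[str, object]
# Each step pairs an action with the compensation that undoes it.
Step = Tuple[Callable[[Order], None], Callable[[Order], None]]


def reserve_stock(order: Order) -> None:
    order["stock_reserved"] = True


def release_stock(order: Order) -> None:
    order["stock_reserved"] = False


def charge(order: Order) -> None:
    order["charged"] = True


def refund(order: Order) -> None:
    order["charged"] = False


def print_label(order: Order) -> None:
    order["label_printed"] = True


def void_label(order: Order) -> None:
    order["label_printed"] = False


def ship(order: Order) -> None:
    if order.get("banned_in_region"):
        raise RuntimeError("delivery company refused the shipment")
    order["shipped"] = True


def no_undo(order: Order) -> None:
    pass


ORDER_STEPS: List[Step] = [
    (reserve_stock, release_stock),
    (charge, refund),
    (print_label, void_label),
    (ship, no_undo),
]


def run_order(order: Order, steps: List[Step] = ORDER_STEPS) -> str:
    """Run each step in turn; if one raises RuntimeError, undo the completed ones in reverse."""
    completed: List[Callable[[Order], None]] = []
    try:
        for action, compensate in steps:
            action(order)
            completed.append(compensate)
    except RuntimeError:
        for compensate in reversed(completed):
            compensate(order)
        return "cancelled"
    return "completed"
```

With `run_order({"banned_in_region": True})` the shipment step fails, so the label is voided, the payment refunded and the stock released, in that order, and the individual services never need to know about each other.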

Temporal.

Our orchestrator has created a workflow linking our micro-services and holding the state between each transaction.

Awesome, but that was a lot of work; wouldn’t it be nice to have a stable, open-source framework to help us?

Enter Temporal.

Temporal (https://Temporal.io) does just that: it provides a way to develop durable workflows, segmenting your business logic into “activities” and letting the developer control the state during the flow.

Plenty of other good workflow engines are out there, but we like Temporal because the workflows are defined as code, not some DSL or diagram; this makes it super easy to version control, peer review and maintain.
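To give a feel for what “workflows as code” means, here is a plain-Python stand-in. To be clear, this is not the Temporal SDK: `run_activity`, `FlakyPaymentProvider` and `cancel_order_workflow` are names I invented for the sketch, everything lives in memory, and nothing here is durable. It only shows the shape of the idea: the workflow is an ordinary function, each unit of work is an activity, and retrying a failed activity is handled around the activity rather than inside the business logic.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_activity(activity: Callable[[], T], max_attempts: int = 3,
                 backoff_seconds: float = 0.0) -> T:
    """Run one activity, retrying on any exception up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return activity()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the workflow see the failure
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
    raise ValueError("max_attempts must be at least 1")


class FlakyPaymentProvider:
    """Hypothetical provider whose refund call fails a fixed number of times."""

    def __init__(self, failures_before_success: int):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def refund(self, order_id: str) -> str:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("payment provider unavailable")
        return f"refund-for-{order_id}"


def cancel_order_workflow(order_id: str, provider: FlakyPaymentProvider) -> dict:
    """The workflow is ordinary code: each step is an activity call."""
    refund_id = run_activity(lambda: provider.refund(order_id))
    return {"order_id": order_id, "status": "cancelled", "refund_id": refund_id}
```

Because the whole thing is just code, it can be diffed, reviewed and unit-tested like anything else in the repository, which is exactly the property we like.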

Now, we can take our current micro-service architecture and model our workflow around it, like the following (super simplified) diagram:

The paradigm shift.

So far, this has all seemed quite reasonable; this is a journey architects walk all the time - we’ve traded some performance for reliability and fault tolerance, great.

But here’s the thing: now that we are using Temporal, we are no longer focused on writing micro-services; we are developing workflows, and a workflow doesn’t care where it runs.

Sure, we can still divide our workflows along domain lines; after all, we spent all that time event-storming.

But our focus has shifted to a business-process view, and that’s quite powerful because it better aligns technology with business groups. When the business evolves its processes, we can grow right along with it by altering our workflows and adding new activities or child workflows.

Temporal becomes really powerful when you place the logic inside activities or child workflows rather than just calling out to micro-services, which, given our previous example, might look something like this:

How do?

Here is the fun part: how do you get there?

My process is steeped in architectural methods, and yours might not look as formal as mine; that’s totally fine. I make a big fuss over architecture; it’s in the job title.

  1. Event storming and/or domain-storytelling sessions are great techniques; choose the one that fits your organisation and style. This is how we define our “domains” and document the business workflows.
  2. Design and document your workflows; I like to use BPMN, but UML works too. https://app.diagrams.net/ is excellent for creating BPMN diagrams, and https://cawemo.com/ is even better.
  3. Define the deployment process with DevOps; it shouldn’t be very different to your current scheme, but let’s get those Jenkins pipelines, GitHub Actions and Docker containers created.
  4. Take a moment to create some C4 model diagrams and document what the system will look like; this will come in handy when presenting your awesome new system to leadership. I love using PlantUML for my diagrams (architecture as code) where possible.
  5. Stub out the workers, workflows and activities; this is specific to my method. I give my engineers a starting point, hold their hands until MVP, and then hand the project over to them. Not all architects like to be so hands-on (arguably, that’s the correct position).
  6. Hand off the workflows/workers to the engineering teams.
  7. Profit - there is a bunch of follow-up, quarterly reviews, etc., but you get the idea.
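For step 5 in the list above, a stub might look something like the following. This is only a sketch of the hand-off idea in plain Python, not Temporal SDK code, and every name in it (`OrderActivities`, `StubOrderActivities`, `RecordingActivities`, `order_workflow`) is hypothetical. The workflow fixes the order of the steps agreed in the design sessions, the activities are empty placeholders for the engineers to fill in, and a recording fake lets them test the flow before any real logic exists.

```python
from typing import List, Protocol, Tuple


class OrderActivities(Protocol):
    """The contract the workflow relies on; one method per activity."""

    def reserve_stock(self, order_id: str) -> None: ...
    def charge_customer(self, order_id: str) -> None: ...
    def print_label(self, order_id: str) -> None: ...


class StubOrderActivities:
    """Starting point handed to the engineers: every activity is still to be written."""

    def reserve_stock(self, order_id: str) -> None:
        raise NotImplementedError("reserve_stock is not implemented yet")

    def charge_customer(self, order_id: str) -> None:
        raise NotImplementedError("charge_customer is not implemented yet")

    def print_label(self, order_id: str) -> None:
        raise NotImplementedError("print_label is not implemented yet")


class RecordingActivities:
    """Test fake that records which activities were called, and with what."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def reserve_stock(self, order_id: str) -> None:
        self.calls.append(("reserve_stock", order_id))

    def charge_customer(self, order_id: str) -> None:
        self.calls.append(("charge_customer", order_id))

    def print_label(self, order_id: str) -> None:
        self.calls.append(("print_label", order_id))


def order_workflow(order_id: str, activities: OrderActivities) -> List[str]:
    """Call the activities in the agreed order and return the names of the steps that ran."""
    completed: List[str] = []
    for name in ("reserve_stock", "charge_customer", "print_label"):
        getattr(activities, name)(order_id)
        completed.append(name)
    return completed
```

Running `order_workflow` against `StubOrderActivities` fails on the first unimplemented activity, which gives the team an obvious to-do list, while `RecordingActivities` proves the flow itself is wired up correctly.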

Conclusion.

I’m just spreading the word about Temporal and how you can use it to orchestrate your micro-services. Or not; you do you.