Writing a circuit breaker in Go
Besides retries, circuit breakers are probably one of the most commonly employed resilience patterns in distributed systems. While writing a retry routine is pretty simple, implementing a circuit breaker needs a little bit of work.
I realized that I usually just go for off-the-shelf libraries for circuit breaking and haven’t written one from scratch before. So, this is an attempt to create a sloppy one in Go. I picked Go instead of Python because I didn’t want to deal with sync-async idiosyncrasies or abstract things away under a soup of decorators.
Circuit breakers
A circuit breaker acts like an automatic switch that prevents your application from repeatedly trying to execute an operation that’s likely to fail. In a distributed system, you don’t want to bombard a remote service when it’s already failing, and circuit breakers prevent that.
It has three states: Closed, Open, and Half-Open. Here’s a diagram that shows the state transitions:
stateDiagram-v2
    [*] --> Closed: Start
    Closed --> Open: Failure threshold reached
    Open --> HalfOpen: Recovery period expired
    HalfOpen --> Closed: Success threshold reached
    HalfOpen --> Open: Request failed
    note right of Closed: All requests are allowed
    note right of Open: Requests are blocked
    note right of HalfOpen: Limited requests allowed to check recovery
- Closed: This is the healthy operating state where all requests are allowed to pass through to the service. If a certain number of consecutive requests fail (reaching a failure threshold), the circuit breaker switches to the Open state.
- Open: In this state, all requests are immediately blocked, and an error is returned to the caller without attempting to contact the failing service. This prevents overwhelming the service and gives it time to recover. After a predefined recovery period, the circuit breaker transitions to the Half-Open state.
- Half-Open: The circuit breaker allows a limited number of test requests to see if the service has recovered. If these requests succeed, it transitions back to the Closed state. If any of them fail, it goes back to the Open state.
Building one in Go
Here’s a simple circuit breaker in Go.
Defining states
First, we’ll define the constants for our states and create the `circuitBreaker` struct, which holds all the configurable knobs.
This struct includes:
- `mu`: A mutex to ensure thread-safe access to the circuit breaker.
- `state`: The current state of the circuit breaker (`Closed`, `Open`, or `HalfOpen`).
- `failureCount`: The current count of consecutive failures.
- `lastFailureTime`: The timestamp of the last failure.
- `halfOpenSuccessCount`: The number of successful requests in the `HalfOpen` state.
- `failureThreshold`: The number of consecutive failures allowed before opening the circuit.
- `recoveryTime`: The cool-down period before the circuit breaker transitions from `Open` to `HalfOpen`.
- `halfOpenMaxRequests`: The number of successful requests needed in the `HalfOpen` state to close the circuit.
- `timeout`: The maximum duration to wait for a request to complete.
Initializing the breaker
Next, we provide a constructor function to initialize a new `circuitBreaker` instance. This function sets the initial state to `Closed` and initializes the thresholds and timeout.
Implementing the Call method
The `Call` method is the primary interface for executing functions through the circuit breaker. It dispatches the appropriate state handler based on the current state.
We use a mutex to protect against concurrent access since the circuit breaker might be used by multiple goroutines. A switch statement on the current state then delegates the function call to the matching handler.
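The dispatch could be sketched as follows. The handler names and the error messages are assumptions, and the three handlers here are deliberately trivial placeholders so the snippet compiles by itself; the real logic for each state is described in the sections that follow.

```go
package main

import (
	"errors"
	"sync"
)

type State string

const (
	Closed   State = "closed"
	Open     State = "open"
	HalfOpen State = "half-open"
)

// Trimmed-down breaker: only the fields Call itself touches.
type circuitBreaker struct {
	mu    sync.Mutex
	state State
}

// Call takes the lock, then hands fn to the handler for the current state.
func (cb *circuitBreaker) Call(fn func() (any, error)) (any, error) {
	cb.mu.Lock()
	defer cb.mu.Unlock()

	switch cb.state {
	case Closed:
		return cb.handleClosedState(fn)
	case Open:
		return cb.handleOpenState()
	case HalfOpen:
		return cb.handleHalfOpenState(fn)
	default:
		return nil, errors.New("unknown circuit state")
	}
}

// Placeholder handlers (not the real logic) so this sketch stands alone.
func (cb *circuitBreaker) handleClosedState(fn func() (any, error)) (any, error) {
	return fn()
}

func (cb *circuitBreaker) handleHalfOpenState(fn func() (any, error)) (any, error) {
	return fn()
}

func (cb *circuitBreaker) handleOpenState() (any, error) {
	return nil, errors.New("circuit open, request blocked")
}
```

Holding the lock for the whole call keeps the state transitions simple, but it also means requests through one breaker run one at a time. That’s a trade-off of this sketch, not a requirement of the pattern.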
Handling closed states
In the `Closed` state, all requests are allowed to pass through. We monitor the requests for failures to decide when to trip the circuit breaker.
In this function:
- We attempt to execute the provided function `fn` using `runWithTimeout` to handle possible timeouts.
- If the function call fails, we increment the `failureCount` and update `lastFailureTime`.
- If the `failureCount` reaches the `failureThreshold`, we transition the circuit to the `Open` state.
- If the function call succeeds, we reset the circuit breaker to the `Closed` state by calling `resetCircuit`.
Resetting the breaker
When a request succeeds, we reset the failure count and keep the circuit in the `Closed` state.
Handling open states
In the `Open` state, all requests are blocked to prevent further strain on the failing service. We check if the recovery period has expired before transitioning to the `HalfOpen` state.
Here:
- We check if the recovery period (`recoveryTime`) has passed since the last failure.
- If it has, we transition to the `HalfOpen` state and reset the counters.
- If not, we block the request and return an error immediately.
Handling half-open states
In the `HalfOpen` state, we allow a limited number of requests to test if the service has recovered.
In this function:
- We attempt to execute the provided function `fn`.
- If the function call fails, we transition back to the `Open` state.
- If the function call succeeds, we increment `halfOpenSuccessCount`.
- Once the success count reaches `halfOpenMaxRequests`, we reset the circuit breaker to the `Closed` state.
Running functions with timeout
To prevent the circuit breaker from hanging on slow or unresponsive functions, we implement a timeout mechanism. You probably noticed that inside each state handler we called the wrapped functions with `runWithTimeout`.
This function:
- Creates a context with a timeout using `context.WithTimeout`.
- Executes the provided function `fn` in a separate goroutine.
- Waits for either the result or the timeout.
- Returns an error if the function takes longer than the specified timeout.
Taking it for a spin
Let’s test our circuit breaker with an unreliable service that sometimes fails.
In the `main` function, we’ll create a circuit breaker and make several calls to the unreliable service.
This loop simulates multiple service calls, using the circuit breaker to handle failures and transitions between states.
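A stripped-down sketch of such a driver is below. Everything in it is hypothetical: `newUnreliableService` alternates between success and failure with a counter so the behavior is deterministic, and `runDemo` takes the breaker’s `Call` method as a plain function value so the snippet compiles without the rest of the breaker. It only counts outcomes and does no logging.

```go
package main

import (
	"errors"
	"time"
)

// newUnreliableService returns a function that succeeds on odd-numbered
// calls and fails on even-numbered ones. Not safe for concurrent use.
func newUnreliableService() func() (any, error) {
	calls := 0
	return func() (any, error) {
		calls++
		if calls%2 == 0 {
			return nil, errors.New("service failed")
		}
		return 42, nil
	}
}

// runDemo makes n calls to the unreliable service through call, pausing
// between them, and reports how many succeeded and how many failed.
func runDemo(
	call func(fn func() (any, error)) (any, error),
	n int,
	pause time.Duration,
) (succeeded, failed int) {
	service := newUnreliableService()
	for i := 0; i < n; i++ {
		if _, err := call(service); err != nil {
			failed++
		} else {
			succeeded++
		}
		time.Sleep(pause)
	}
	return succeeded, failed
}
```

With a breaker value `cb` in hand, a `main` function would pass the method value, as in `runDemo(cb.Call, 5, time.Second)`.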
A run of the complete implementation prints:
2024/10/06 17:24:27 INFO Making a request state=closed
2024/10/06 17:24:27 INFO Request succeeded in closed state
2024/10/06 17:24:27 INFO Circuit reset to closed state
2024/10/06 17:24:27 INFO Service request succeeded result=42
2024/10/06 17:24:28 -----------------------------------------------
2024/10/06 17:24:28 INFO Making a request state=closed
2024/10/06 17:24:28 WARN Request failed in closed state failureCount=1
2024/10/06 17:24:28 ERROR Service request failed error="service failed"
2024/10/06 17:24:29 -----------------------------------------------
2024/10/06 17:24:29 INFO Making a request state=closed
2024/10/06 17:24:29 INFO Request succeeded in closed state
2024/10/06 17:24:29 INFO Circuit reset to closed state
2024/10/06 17:24:29 INFO Service request succeeded result=42
2024/10/06 17:24:30 -----------------------------------------------
2024/10/06 17:24:30 INFO Making a request state=closed
2024/10/06 17:24:30 WARN Request failed in closed state failureCount=1
2024/10/06 17:24:30 ERROR Service request failed error="service failed"
2024/10/06 17:24:31 -----------------------------------------------
2024/10/06 17:24:31 INFO Making a request state=closed
2024/10/06 17:24:31 INFO Request succeeded in closed state
2024/10/06 17:24:31 INFO Circuit reset to closed state
2024/10/06 17:24:31 INFO Service request succeeded result=42
2024/10/06 17:24:32 -----------------------------------------------
The log messages will give you a sense of what’s happening when we retry an intermittently failing function wrapped in a circuit breaker.
The API could be better
One limitation of Go generics (as of Go 1.23) is that methods can’t declare type parameters of their own; only the type parameters of the receiver type are available. This means you can’t define a method like `func (cb *circuitBreaker) Call[T any](fn func() (T, error)) (T, error)`. Making the whole type generic, as in `circuitBreaker[T]`, does compile, but it ties each breaker instance to a single return type.
To get around this, we use `any` (an alias for `interface{}`) as the return type in our function signatures. While this sacrifices some type safety, it allows us to create a flexible circuit breaker that can handle functions returning different types.
Handling incompatible function signatures
What if the function you want to wrap doesn’t match the `func() (any, error)` signature? You can easily adapt it by wrapping your function to fit the required signature.
Suppose you have a function that takes arguments or returns a concrete type instead of `any`. You can wrap the call in a closure, `wrappedFunc`, that captures the arguments and returns `(any, error)`.
Now, `wrappedFunc` matches the `func() (any, error)` signature and can be used with our circuit breaker.
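As a concrete illustration, here’s a sketch with made-up names: `fetchGreeting` stands for any function with an incompatible signature, and `wrapGreeting` builds the closure around it.

```go
package main

import (
	"errors"
	"strings"
)

// fetchGreeting is a hypothetical function: it takes an argument and
// returns a concrete type, so it can't be handed to the breaker directly.
func fetchGreeting(name string) (string, error) {
	if strings.TrimSpace(name) == "" {
		return "", errors.New("name is empty")
	}
	return "hello, " + name, nil
}

// wrapGreeting captures name in a closure that has the
// func() (any, error) shape the breaker expects.
func wrapGreeting(name string) func() (any, error) {
	return func() (any, error) {
		return fetchGreeting(name)
	}
}
```

With `wrappedFunc := wrapGreeting("gopher")`, the value can be passed to the breaker’s `Call` method, and the caller asserts the result back to a `string`.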
Here’s the complete implementation with tests.