The Circuit Breaker Pattern in Microservices - 18/11/2024

The Circuit Breaker Pattern: Stop Failing Requests From Taking Down Your System

You’ve got Service A calling Service B. Service B starts timing out. Now Service A is holding connections open, waiting. Its thread pool fills up. Requests to Service A start failing too—even the ones that don’t touch Service B. Congratulations, you’ve got a cascading failure.

The circuit breaker pattern exists to stop this.

The Core Idea

A circuit breaker wraps your external calls and tracks failures. When failures cross a threshold, it “trips” and starts failing requests immediately—no waiting for timeouts, no wasting resources on calls that won’t succeed.

It has three states:

Closed — Normal operation. Requests go through. The breaker counts failures.

Open — Too many failures. Requests fail instantly without attempting the call. After a timeout, it moves to half-open.

Half-Open — Trial mode. A few requests are allowed through. If they succeed, back to closed. If they fail, back to open.

That’s it. The rest is tuning.
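It can help to see the three states as code before reaching for a library. The sketch below is a minimal, library-free version built on a few simplifying assumptions. It counts consecutive failures rather than a failure percentage. It allows exactly one trial call while half-open. It takes an injectable clock so the transitions are easy to test. SimpleBreaker and CircuitOpenError are illustrative names and are not part of any library.

```typescript
// Illustrative sketch only: SimpleBreaker and CircuitOpenError are hypothetical names.
type BreakerState = 'closed' | 'open' | 'half-open';

// Thrown instead of calling the action while the circuit is open.
class CircuitOpenError extends Error {
  constructor() {
    super('Circuit is open');
    this.name = 'CircuitOpenError';
  }
}

// Trips after failureLimit consecutive failures.
class SimpleBreaker<T> {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private trialInFlight = false;

  constructor(
    private readonly action: () => Promise<T>,
    private readonly failureLimit = 5,             // consecutive failures before opening
    private readonly resetTimeoutMs = 30_000,      // time spent open before a trial call
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {}

  get currentState(): BreakerState {
    // Open becomes half-open once the reset timeout has elapsed.
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open';
    }
    return this.state;
  }

  async fire(): Promise<T> {
    const state = this.currentState;
    // Fail fast while open; allow only one trial call at a time while half-open.
    if (state === 'open' || (state === 'half-open' && this.trialInFlight)) {
      throw new CircuitOpenError();
    }
    const isTrial = state === 'half-open';
    if (isTrial) this.trialInFlight = true;
    try {
      const result = await this.action();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    } finally {
      if (isTrial) this.trialInFlight = false;
    }
  }

  private onSuccess(): void {
    if (this.state === 'open') return; // late result from a call started before the trip
    this.failures = 0;
    this.state = 'closed'; // a successful trial call closes the circuit
  }

  private onFailure(): void {
    if (this.state === 'open') return; // already tripped; do not extend the open period
    this.failures++;
    // A failed trial call, or too many failures while closed, opens the circuit.
    if (this.state === 'half-open' || this.failures >= this.failureLimit) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }
}
```

Libraries such as opossum use a rolling failure percentage instead of a consecutive count, but the state transitions are the same.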

Implementation

Here’s a circuit breaker in TypeScript using opossum:

import CircuitBreaker from 'opossum';

const breaker = new CircuitBreaker(
    async (url: string) => {
      const res = await fetch(url);
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return res.json();
    },
    {
      errorThresholdPercentage: 50, // % of failures to trip
      resetTimeout: 30_000,      // ms before trying half-open
      timeout: 5_000,            // ms before a call is considered failed
      volumeThreshold: 20,       // min requests before threshold applies
    }
);

breaker.on('open', () => console.log('Circuit opened'));
breaker.on('halfOpen', () => console.log('Circuit half-open, testing...'));
breaker.on('close', () => console.log('Circuit closed'));

// Use it
const data = await breaker.fire('https://api.example.com/data');

And in Python with pybreaker:

import pybreaker
import requests

breaker = pybreaker.CircuitBreaker(
    fail_max=5,                # consecutive failures before opening
    reset_timeout=30,          # seconds before half-open
    listeners=[pybreaker.CircuitBreakerListener()]  # no-op base class; subclass it to hook events
)

@breaker
def call_api(url: str) -> dict:
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    return response.json()

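# Use it
try:
    data = call_api("https://api.example.com/data")
except pybreaker.CircuitBreakerError:
    # Circuit is open (or this call just tripped it): use a fallback.
    # get_cached_response() stands in for your own cache lookup.
    # Failures that don't trip the breaker still raise requests exceptions.
    data = get_cached_response()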

Picking Thresholds

This is where most people screw up. Here’s a starting point:

Parameter	Starting Value	Adjust When…
Failure threshold	50%	Lower if failures are expensive, higher if service is flaky
Sampling window	10-30 seconds	Shorter for fast services, longer for batch operations
Minimum throughput	20 requests	Higher for high-traffic services to avoid noise
Break duration	30 seconds	Match to typical recovery time of downstream service
Half-open test calls	3-5	More if you need higher confidence before closing

The minimum throughput matters more than people think. Without it, two failures out of three requests (a 67% failure rate) would trip the breaker, even though a sample of three requests isn’t statistically meaningful.
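Here is a small sketch of the trip decision that shows why the minimum throughput matters. It uses a naive rolling window, an array of timestamped outcomes rather than the bucketed counters real libraries use, and it refuses to trip until it has seen minThroughput calls. RollingFailureRate and its parameters are illustrative and mirror the starting values in the table.

```typescript
// Illustrative sketch only: RollingFailureRate is a hypothetical name, not a library API.
interface Sample {
  at: number; // timestamp in ms
  ok: boolean;
}

class RollingFailureRate {
  private samples: Sample[] = [];

  constructor(
    private readonly windowMs = 10_000,  // sampling window
    private readonly minThroughput = 20, // minimum calls before the rate counts
    private readonly thresholdPct = 50,  // failure percentage that trips the breaker
  ) {}

  record(ok: boolean, at: number): void {
    this.samples.push({ at, ok });
  }

  shouldTrip(now: number): boolean {
    // Forget outcomes that have aged out of the window.
    this.samples = this.samples.filter((s) => now - s.at < this.windowMs);
    const total = this.samples.length;
    if (total < this.minThroughput) return false; // too few calls to judge
    const failures = this.samples.filter((s) => !s.ok).length;
    return (failures / total) * 100 >= this.thresholdPct;
  }
}

// Two failures out of three calls: a 67% failure rate.
const guarded = new RollingFailureRate(10_000, 20, 50);
const unguarded = new RollingFailureRate(10_000, 1, 50);
for (const w of [guarded, unguarded]) {
  w.record(true, 0);
  w.record(false, 100);
  w.record(false, 200);
}
console.log(guarded.shouldTrip(300));   // false: only 3 of the required 20 calls
console.log(unguarded.shouldTrip(300)); // true: 67% >= 50% with no minimum
```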

What To Do When the Circuit Opens

Failing fast is only half the job. You need a fallback strategy:

Return cached data — If you cached the last good response, serve that. Stale data beats no data for many use cases.

Degrade gracefully — Can’t reach the recommendation service? Show popular items instead of personalized ones.

Queue for retry — For non-time-sensitive operations, queue the request and process it when the circuit closes.

Return a sensible default — Sometimes a hardcoded fallback is fine. Can’t reach the feature flag service? Default to the safe option.

Just fail — Sometimes there’s no good fallback. That’s okay. Fail fast with a clear error rather than timing out.
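The queue-for-retry strategy boils down to holding work while the circuit is open and draining it afterwards. Below is a minimal in-memory sketch. DeferredQueue is a hypothetical name, and a real system would usually persist the queue (for example, in a database table or message broker) so queued work survives a restart. Draining stops at the first failure, which keeps the queue from flooding a downstream service that is still unhealthy.

```typescript
// Illustrative sketch only: DeferredQueue is a hypothetical, in-memory helper.
class DeferredQueue<T> {
  private pending: T[] = [];
  private draining = false;

  enqueue(item: T): void {
    this.pending.push(item);
  }

  get size(): number {
    return this.pending.length;
  }

  // Sends queued items one at a time, in order; stops at the first failure.
  async drain(send: (item: T) => Promise<void>): Promise<number> {
    if (this.draining) return 0; // avoid two concurrent drains sending duplicates
    this.draining = true;
    let sent = 0;
    try {
      while (this.pending.length > 0) {
        try {
          await send(this.pending[0]);
        } catch {
          break; // downstream still failing: leave the item queued
        }
        this.pending.shift(); // remove only after a successful send
        sent++;
      }
    } finally {
      this.draining = false;
    }
    return sent;
  }
}
```

With opossum, you might call drain from a breaker.on('close', ...) handler. That wiring is left out of this sketch.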

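import CircuitBreaker from 'opossum';

// fetchUserRecommendations, getPopularItems, and userId stand in for your own code
const breaker = new CircuitBreaker(fetchUserRecommendations, {
  errorThresholdPercentage: 50,
  resetTimeout: 30_000,
});

// Called when the action fails or times out, and when the circuit is open
breaker.fallback(() => getPopularItems());

const recommendations = await breaker.fire(userId);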
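One way to implement the "return cached data" strategy is to remember the last good response per key and serve it, flagged as stale, whenever the protected call fails. The helper below is a hypothetical, library-free sketch; withLastGoodFallback is not an opossum API. It keeps an unbounded in-memory map, so a real version would want a size limit or TTL.

```typescript
// Illustrative sketch only: withLastGoodFallback is a hypothetical helper.
interface MaybeStale<V> {
  value: V;
  stale: boolean; // true when the value came from the cache after a failure
}

function withLastGoodFallback<K, V>(
  fetcher: (key: K) => Promise<V>,
): (key: K) => Promise<MaybeStale<V>> {
  const lastGood = new Map<K, V>(); // unbounded: add a TTL or size cap in real use
  return async (key: K) => {
    try {
      const value = await fetcher(key);
      lastGood.set(key, value); // remember the most recent success
      return { value, stale: false };
    } catch (err) {
      if (lastGood.has(key)) {
        return { value: lastGood.get(key) as V, stale: true }; // serve stale data
      }
      throw err; // nothing cached: fail fast with the original error
    }
  };
}
```

You could wrap a breaker's fire call with it, for a breaker that has no fallback of its own. Calls rejected by an open circuit would then be served from the cache too.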

When Circuit Breakers Go Wrong

Threshold too sensitive. A brief network blip trips the breaker and you’re rejecting good requests for 30 seconds. Fix: raise minimum throughput, lengthen sampling window.

Threshold too lax. By the time the breaker trips, the damage is done—your thread pools are already exhausted. Fix: lower the failure threshold, add timeout handling.

Break duration mismatch. Your downstream service takes 5 minutes to recover but your breaker retries every 30 seconds, hammering it with test requests. Fix: use exponential backoff for break duration or monitor downstream health directly.

Testing in half-open kills recovery. You allow 3 test requests in half-open. The downstream service can handle 1 request/second during recovery. Your 3 concurrent test requests overwhelm it, it fails, circuit opens again. Fix: space out half-open test requests.

Circuit breaker per instance vs shared. If each instance of your service has its own breaker, they’ll all independently hammer the recovering downstream service. Consider sharing circuit state via Redis or a service mesh.
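Two of the fixes above are small enough to sketch: exponential backoff for the break duration, and spacing out half-open test requests. Both helpers are hypothetical and library-agnostic. nextBreakDuration computes how long to stay open after the Nth consecutive trip, with jitter so instances don’t probe in lockstep. HalfOpenGate admits at most one probe at a time, with a minimum gap between probes. How you plug them into a particular breaker library depends on the hooks it exposes.

```typescript
// Illustrative sketch only: nextBreakDuration and HalfOpenGate are hypothetical helpers.

// Break duration doubles with each consecutive trip, capped at maxMs.
// "Equal jitter": keep at least half the delay and randomize the rest.
function nextBreakDuration(
  consecutiveTrips: number,           // 1 for the first trip, 2 if the trial then failed, ...
  baseMs = 30_000,
  maxMs = 300_000,
  random: () => number = Math.random, // value in [0, 1)
): number {
  const exp = Math.min(maxMs, baseMs * 2 ** Math.max(0, consecutiveTrips - 1));
  return exp / 2 + random() * (exp / 2);
}

type ProbeVerdict = 'close' | 'reopen' | 'keep-probing';

// Admits half-open probes one at a time, at least minSpacingMs apart.
class HalfOpenGate {
  private inFlight = false;
  private lastProbeAt = -Infinity;
  private successes = 0;

  constructor(
    private readonly minSpacingMs = 1_000, // e.g. 1 probe/s for a fragile service
    private readonly successesToClose = 3, // successful probes needed to close
  ) {}

  // True if a probe may start now; the caller must then call report().
  tryAcquire(now: number): boolean {
    if (this.inFlight || now - this.lastProbeAt < this.minSpacingMs) return false;
    this.inFlight = true;
    this.lastProbeAt = now;
    return true;
  }

  report(ok: boolean): ProbeVerdict {
    this.inFlight = false;
    if (!ok) {
      this.successes = 0;
      return 'reopen';
    }
    this.successes++;
    return this.successes >= this.successesToClose ? 'close' : 'keep-probing';
  }
}
```

Equal jitter is one common scheme. Full jitter, which picks a random value between 0 and the capped delay, spreads instances out further at the cost of sometimes probing early.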

Circuit Breakers vs. Other Patterns

Retries — Try again on failure. Useful for transient errors. Dangerous without a circuit breaker, because every caller keeps retrying against a dead service and multiplies the load on it.

Timeouts — Stop waiting after X seconds. Essential, but you’re still consuming resources while waiting. An open circuit breaker skips the wait entirely.

Bulkheads — Isolate resources per dependency. If Service B is slow, it only exhausts its dedicated thread pool, not your entire application. Complementary to circuit breakers.

Rate limiting — Control how many requests you send. Protects the downstream service from you. Circuit breakers protect you from the downstream service.
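Rate limiting is the mirror image: it caps what you send rather than reacting to what fails. A token bucket is one common way to implement it. The class below is a minimal, single-process sketch with an injected clock. TokenBucket is an illustrative name and doesn’t refer to a specific library.

```typescript
// Illustrative sketch only: TokenBucket is a hypothetical, single-process limiter.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,   // maximum burst size
    private readonly ratePerSec: number, // sustained calls per second
    now: number,                         // starting timestamp in ms
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if a call may go out now, consuming one token.
  tryTake(now: number): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

A caller that gets false can wait, queue, or drop the request. Which choice is right depends on the operation.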

Use them together, layered from outermost to innermost:

Retry → Circuit Breaker → Timeout → Bulkhead → Actual Call
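The following compact, library-free sketch nests the four wrappers in exactly the order above, outermost first. Every function name is hypothetical, and each wrapper is deliberately simplified:

- The breaker counts consecutive failures and has no half-open trial limit.
- The timeout rejects without cancelling the underlying call.
- The bulkhead rejects rather than queueing when full.
- Retries happen immediately, with no backoff.

The interaction that matters is that retry stops as soon as the breaker reports an open circuit.

```typescript
// Illustrative sketch only: all wrapper names are hypothetical.
type AsyncFn<T> = () => Promise<T>;

const CIRCUIT_OPEN = 'Circuit open';

// Bulkhead: cap concurrent calls to this dependency; reject extras instead of queueing.
function bulkhead<T>(maxConcurrent: number, fn: AsyncFn<T>): AsyncFn<T> {
  let active = 0;
  return async () => {
    if (active >= maxConcurrent) throw new Error('Bulkhead full');
    active++;
    try {
      return await fn();
    } finally {
      active--;
    }
  };
}

// Timeout: stop waiting after ms (the underlying call keeps running; it is not cancelled).
function timeout<T>(ms: number, fn: AsyncFn<T>): AsyncFn<T> {
  return () =>
    new Promise<T>((resolve, reject) => {
      const timer = setTimeout(() => reject(new Error('Timed out')), ms);
      fn().then(
        (value) => {
          clearTimeout(timer);
          resolve(value);
        },
        (err) => {
          clearTimeout(timer);
          reject(err);
        },
      );
    });
}

// Circuit breaker: open after maxFailures consecutive failures; fail fast until resetMs passes.
function breaker<T>(maxFailures: number, resetMs: number, fn: AsyncFn<T>): AsyncFn<T> {
  let failures = 0;
  let openedAt: number | null = null;
  return async () => {
    if (openedAt !== null && Date.now() - openedAt < resetMs) {
      throw new Error(CIRCUIT_OPEN);
    }
    try {
      const value = await fn();
      failures = 0;
      openedAt = null;
      return value;
    } catch (err) {
      failures++;
      if (failures >= maxFailures) openedAt = Date.now();
      throw err;
    }
  };
}

// Retry: up to `attempts` tries, but give up immediately when the circuit is open.
function retry<T>(attempts: number, fn: AsyncFn<T>): AsyncFn<T> {
  return async () => {
    let lastErr: unknown;
    for (let i = 0; i < attempts; i++) {
      try {
        return await fn();
      } catch (err) {
        lastErr = err;
        if (err instanceof Error && err.message === CIRCUIT_OPEN) break;
      }
    }
    throw lastErr;
  };
}

// Retry → Circuit Breaker → Timeout → Bulkhead → Actual Call
function resilient<T>(call: AsyncFn<T>): AsyncFn<T> {
  return retry(3, breaker(5, 30_000, timeout(2_000, bulkhead(10, call))));
}
```

Because the breaker sits inside the retry, a dead dependency costs at most five real calls here. After that, every further attempt fails fast instead of being retried.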

Should You Even Use One?

Circuit breakers add complexity. You need monitoring to see their state. You need fallbacks for when they’re open. You need to tune thresholds. You need to test failure scenarios.

Use a circuit breaker when:

The downstream service is out of your control
Failures in that service could cascade to your service
You have meaningful fallback behavior
The service has enough traffic to make statistical thresholds meaningful

Skip it when:

You’re calling something local and fast
There’s no fallback—if Service B is down, you’re down anyway
Traffic is too low to get meaningful failure rates
You’re already using a service mesh that handles this (Istio, Linkerd)

Integrating Without Wrapping Every Call

Nobody wants to wrap every HTTP call manually. Here are three approaches to integrate circuit breakers cleanly.

1. Wrap Your HTTP Client

Create a wrapper around your HTTP client that applies the circuit breaker automatically:

import CircuitBreaker from 'opossum';
import axios, {AxiosRequestConfig, AxiosResponse} from 'axios';

const breakers = new Map<string, CircuitBreaker>();

function getBreaker(baseURL: string): CircuitBreaker {
  if (!breakers.has(baseURL)) {
    const breaker = new CircuitBreaker(
        (config: AxiosRequestConfig) => axios(config),
        {errorThresholdPercentage: 50, resetTimeout: 30_000, timeout: 5_000}
    );
    breakers.set(baseURL, breaker);
  }
  return breakers.get(baseURL)!;
}

export const http = {
  async get<T>(url: string, config?: AxiosRequestConfig): Promise<T> {
    const base = new URL(url).origin;
    const res = await getBreaker(base).fire({...config, method: 'GET', url});
    return (res as AxiosResponse<T>).data;
  },
  async post<T>(url: string, data?: unknown, config?: AxiosRequestConfig): Promise<T> {
    const base = new URL(url).origin;
    const res = await getBreaker(base).fire({...config, method: 'POST', url, data});
    return (res as AxiosResponse<T>).data;
  },
  // ... put, delete, etc.
};

// Usage — no wrapping needed
const user = await http.get<User>('https://api.example.com/users/1');

This gives you one circuit breaker per origin, created lazily. All calls through http are protected automatically.

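2. Wrap Each Axios Instance

If you’re already using one axios instance per service, give each instance its own breaker and expose only protected methods. Overriding client.request on the instance is not enough, because axios binds get, post, and the other convenience methods to an internal object, so those methods would skip the breaker.

import CircuitBreaker from 'opossum';
import axios, {AxiosRequestConfig, AxiosResponse} from 'axios';

export function createClient(baseURL: string) {
  // The instance keeps its defaults (baseURL, 5 s timeout) for every call
  const client = axios.create({baseURL, timeout: 5_000});

  const breaker = new CircuitBreaker(
      (config: AxiosRequestConfig) => client.request(config),
      {errorThresholdPercentage: 50, resetTimeout: 30_000}
  );

  const send = <T>(config: AxiosRequestConfig) =>
      breaker.fire(config) as Promise<AxiosResponse<T>>;

  // Only breaker-protected methods are exposed, so callers cannot bypass the circuit
  return {
    request: send,
    get: <T>(url: string, config?: AxiosRequestConfig) =>
        send<T>({...config, method: 'GET', url}),
    post: <T>(url: string, data?: unknown, config?: AxiosRequestConfig) =>
        send<T>({...config, method: 'POST', url, data}),
    // ... put, delete, etc.
  };
}

// Usage
const paymentApi = createClient('https://payments.example.com');
const userApi = createClient('https://users.example.com');

// These are now protected
await paymentApi.post('/charge', {amount: 100});
await userApi.get('/users/1');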

3. Service Classes with Dependency Injection

For larger projects, encapsulate each external service in a class:

import CircuitBreaker from 'opossum';

interface PaymentResult {
  id: string;
  status: string
}

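class PaymentService {
  // Typed so fire() resolves to PaymentResult rather than unknown
  private breaker: CircuitBreaker<[string, RequestInit?], PaymentResult>;

  constructor(private baseUrl: string) {
    this.breaker = new CircuitBreaker(
        (path: string, init?: RequestInit): Promise<PaymentResult> =>
            fetch(`${this.baseUrl}${path}`, init).then(r => {
              if (!r.ok) throw new Error(`HTTP ${r.status}`);
              return r.json();
            }),
        {errorThresholdPercentage: 50, resetTimeout: 30_000}
    );

    // Also runs on timeouts, when the charge may have gone through: reconcile before retrying
    this.breaker.fallback(() => ({id: '', status: 'pending_retry'}));
  }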

  charge(amount: number): Promise<PaymentResult> {
    return this.breaker.fire('/charge', {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({amount}),
    });
  }

  refund(paymentId: string): Promise<PaymentResult> {
    return this.breaker.fire(`/refund/${paymentId}`, {method: 'POST'});
  }
}

// Wire it up in your DI container
const paymentService = new PaymentService('https://payments.example.com');

// Usage — circuit breaker is invisible to callers
await paymentService.charge(100);

This is the cleanest approach for complex projects. Each service owns its circuit breaker config, fallbacks are defined in one place, and calling code doesn’t know or care about the circuit breaker.

Observability

A circuit breaker you can’t see is useless. At minimum, track:

Current state (closed/open/half-open)
State transition events
Failure rate over time
Requests rejected due to open circuit

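// metrics stands in for your metrics client; opossum emits one event per transition
breaker.on('open', () => metrics.recordStateChange('open'));
breaker.on('halfOpen', () => metrics.recordStateChange('halfOpen'));
breaker.on('close', () => metrics.recordStateChange('closed'));
breaker.on('reject', () => metrics.recordRejected()); // call short-circuited by an open circuit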

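Or with pybreaker:

class MetricsListener(pybreaker.CircuitBreakerListener):
    def state_change(self, cb, old_state, new_state):
        # State objects expose .name ("closed", "open", "half-open")
        metrics.record_state_change(old_state.name, new_state.name)

breaker.add_listener(MetricsListener())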

Set up alerts for state transitions. If your circuit to the payment service opens at 3am, you want to know.

Summary

Circuit breakers prevent cascading failures by failing fast when a downstream service is unhealthy. The pattern is simple—three states, threshold-based transitions—but the tuning is where the real work happens.

Start with conservative thresholds, monitor aggressively, and adjust based on real behavior. And always have a plan for what happens when the circuit opens.