Testing Strategies for Microservices Architectures

The Test That Caught a $100K Bug

Last year, we deployed a microservice that processed payments. It worked perfectly in testing. In production? It failed spectacularly when two services tried to process the same payment simultaneously.

Cost: $100K in double charges we had to refund. Plus angry customers. Plus a very angry CEO.

The problem? We tested each service individually. We never tested how they worked together. That's the microservices testing trap.

The Four Layers of Testing

In a monolith, you have unit tests and integration tests. Done.

In microservices, you need:

1. Unit Tests (Test One Function)

Same as always. Test your business logic in isolation.

def test_calculate_discount():
    # Order is the service's own domain class; no I/O is involved.
    order = Order(total=100, user_tier="premium")
    assert order.calculate_discount() == 10

Fast. Reliable. No network calls. No databases. Just pure logic.

2. Integration Tests (Test One Service)

Test a single service with its real dependencies (database, cache, etc.).

def test_create_order_endpoint():
    # client is the service's test HTTP client; db is its real test database.
    response = client.post('/orders', json={'items': [...]})
    assert response.status_code == 201
    assert db.orders.count() == 1  # The order was actually persisted

Tests that your service actually works as a cohesive unit.

3. Contract Tests (Test Service Boundaries)

This is the new one. Test that services communicate correctly.

Scenario: Order Service calls Payment Service.

Order Service expects Payment Service to accept:

POST /payments { "amount": 100, "currency": "USD" }

Contract test verifies: "Does Payment Service actually accept this format?"

If Payment Service changes its API, the contract test breaks BEFORE you deploy.

We use Pact for this. Saved us multiple times.
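To make the idea concrete, here is a library-free sketch of what a contract check does. It is not Pact's API. The names PAYMENT_CONTRACT, handle_create_payment, verify_contract, and renamed_field_handler are hypothetical, and the 201 response with a payment_id field is an assumed shape for illustration only.

```python
# Hypothetical consumer-driven contract: what Order Service sends and expects back.
PAYMENT_CONTRACT = {
    "request": {"method": "POST", "path": "/payments",
                "body": {"amount": 100, "currency": "USD"}},
    "response": {"status": 201, "required_fields": ["payment_id"]},  # assumed shape
}

def handle_create_payment(body):
    """Stand-in for the Payment Service handler (provider side)."""
    if not isinstance(body.get("amount"), (int, float)) or "currency" not in body:
        return 400, {"error": "invalid payment request"}
    return 201, {"payment_id": "pay_123", "status": "accepted"}

def verify_contract(contract, handler):
    """Replay the consumer's example request against the provider and list mismatches."""
    status, payload = handler(contract["request"]["body"])
    problems = []
    if status != contract["response"]["status"]:
        problems.append(f"expected status {contract['response']['status']}, got {status}")
    for field in contract["response"]["required_fields"]:
        if field not in payload:
            problems.append(f"missing response field: {field}")
    return problems

def renamed_field_handler(body):
    """A provider that renamed 'amount' to 'amount_cents' without telling consumers."""
    return handle_create_payment({"amount": body.get("amount_cents"),
                                  "currency": body.get("currency")})
```

Run against handle_create_payment, verify_contract returns an empty list. Run against renamed_field_handler, it reports a status mismatch, which is the kind of break you want to see in CI rather than in production.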

4. End-to-End Tests (Test Everything Together)

Simulate a real user flow across all services.

"User adds item to cart → checks out → gets charged → receives email confirmation."

Slow. Flaky. Expensive. But critical for catching the bugs that only show up when everything runs together.

What We Learned the Hard Way

Lesson 1: Don't Mock Everything

Early on, we mocked all external calls in tests. Tests were fast! And useless.

They tested that our mocks worked, not that our code worked.

Now: unit tests can mock. Integration tests use real dependencies (database, cache). Contract tests hit the real provider API, or a test double that is itself checked against it.
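Here is a small sketch of why the fully mocked version is misleading. It assumes a hypothetical orders table and two hypothetical save functions, one with a column-name bug. SQLite's in-memory mode stands in for "a real database"; your service would use its actual engine.

```python
import sqlite3
from unittest.mock import MagicMock

# Hypothetical schema, for illustration only.
SCHEMA = "CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL)"

def save_order_buggy(conn, order_id, total_cents):
    # Bug: the column is named total_cents, not total.
    conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total_cents))

def save_order(conn, order_id, total_cents):
    conn.execute("INSERT INTO orders (id, total_cents) VALUES (?, ?)", (order_id, total_cents))

def mocked_test_passes(save):
    """A fully mocked test: it only checks that execute() was called."""
    conn = MagicMock()
    save(conn, "o-1", 1000)
    return conn.execute.called

def real_db_test_passes(save):
    """The same check against a real in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    try:
        save(conn, "o-1", 1000)
    except sqlite3.OperationalError:
        return False  # the database rejected the SQL
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 1
```

The mocked test passes for the buggy function because a mock accepts any SQL. The real database rejects it immediately.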

Lesson 2: Test Failure Modes

Happy path is easy. What happens when Payment Service is down? When the database is slow? When you get malformed data?

We now test:

  • Service timeouts
  • Network failures
  • Partial failures (3 of 5 services respond)
  • Retries and circuit breakers

Once we added these tests, they reproduced that double-charge bug in staging.
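As an illustration of a failure-mode test, the sketch below simulates a timeout that happens after the charge has already been applied, then checks that a retry does not charge twice. This is one common way double charges arise, not a description of the root cause of the incident above. FakePaymentService, charge_with_retry, and the idempotency-key scheme are all hypothetical.

```python
class PaymentTimeout(Exception):
    """Raised when the caller gives up waiting for a response."""

class FakePaymentService:
    """In-memory stand-in that can time out after the charge was already applied."""
    def __init__(self, timeouts_before_success=0, idempotent=True):
        self.timeouts_left = timeouts_before_success
        self.idempotent = idempotent
        self.charges = []        # every charge actually applied
        self.seen_keys = set()

    def charge(self, idempotency_key, amount):
        # Skip the charge only if we are idempotent and have seen this key before.
        if not (self.idempotent and idempotency_key in self.seen_keys):
            self.seen_keys.add(idempotency_key)
            self.charges.append(amount)
        if self.timeouts_left > 0:
            self.timeouts_left -= 1
            raise PaymentTimeout()  # the charge went through, but the reply was lost
        return "ok"

def charge_with_retry(service, idempotency_key, amount, max_attempts=3):
    """Retry on timeout, reusing the same idempotency key on every attempt."""
    for attempt in range(max_attempts):
        try:
            return service.charge(idempotency_key, amount)
        except PaymentTimeout:
            if attempt == max_attempts - 1:
                raise
```

With idempotent=True, one timeout followed by a retry leaves exactly one charge. With idempotent=False, the same sequence leaves two, and the test fails where it should: before production.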

Lesson 3: Test in Production (Yes, Really)

We deploy with feature flags. New code runs for 1% of traffic. We monitor error rates. If they spike, we roll back.

This IS testing. Just in production, with real data, real load.
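A percentage rollout like this can be sketched in a few lines. The function names, the flag name, and the thresholds (max_ratio, min_requests, the 0.1% floor) are assumptions for illustration, not values from our setup. Most teams would use a feature-flag service rather than rolling their own.

```python
import hashlib

def in_rollout(user_id: str, flag: str, percent: float) -> bool:
    """Deterministically place a user in one of 10,000 buckets for this flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000
    return bucket < percent * 100  # percent=1 selects buckets 0..99

def should_roll_back(new_errors, new_requests, base_errors, base_requests,
                     max_ratio=2.0, min_requests=500):
    """Signal a rollback when the new code's error rate is well above the baseline."""
    if new_requests < min_requests or base_requests == 0:
        return False  # not enough data to judge yet
    new_rate = new_errors / new_requests
    base_rate = base_errors / base_requests
    # Assumed floor of 0.1% so a near-zero baseline does not trigger on noise.
    return new_rate > max(base_rate * max_ratio, 0.001)
```

Hashing the flag name together with the user ID keeps each user's experience stable across requests while giving different flags independent 1% slices.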

Start Here

  1. Add contract tests between your two most-coupled services
  2. Add one chaos test (kill a service, see if the system survives)
  3. Monitor error rates in production as your real E2E test
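For step 2, a chaos test can start very small. The sketch below uses an in-process stand-in for a cluster, so everything in it (FakeCluster, checkout, the service names) is hypothetical. A real chaos test would stop an actual container or process, but the assertion has the same shape: kill something, then check what the user would see.

```python
class ServiceDown(Exception):
    """Raised when a call is made to a service that has been killed."""

class FakeCluster:
    """Tiny in-process stand-in for a set of services that can be 'killed'."""
    def __init__(self):
        self.down = set()

    def kill(self, name):
        self.down.add(name)

    def call(self, name, fn, *args):
        if name in self.down:
            raise ServiceDown(name)
        return fn(*args)

def checkout(cluster, cart_total):
    """Payment is required; the confirmation email is best-effort."""
    try:
        cluster.call("payment", lambda amount: None, cart_total)
    except ServiceDown:
        return {"status": "failed", "charged": False, "email_sent": False}
    try:
        cluster.call("email", lambda: None)
        email_sent = True
    except ServiceDown:
        email_sent = False  # degrade gracefully; a real system might queue a retry
    return {"status": "ok", "charged": True, "email_sent": email_sent}
```

Killing "email" should leave checkout working with email_sent set to False. Killing "payment" should fail the checkout cleanly without charging anyone.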

Testing microservices is harder. But catching bugs before customers do? Worth it.