Migrating to Microservices: A Strategic Engineering Approach

The Migration That Almost Killed Our Team

In 2023, my CTO announced: "We're migrating to microservices. It's the future." He'd just come back from a conference where every talk was about the microservice architectures at Netflix and Uber.

Six months later, we had 15 microservices, 3 databases, a message queue we barely understood, and deployments that took 4 hours instead of 20 minutes. Our velocity crashed. Engineers were frustrated. Production was unstable.

We almost gave up and went back to the monolith. But we didn't. We fixed our approach. And we learned that microservices aren't about following trends—they're about solving specific problems.

Let me save you from our mistakes.

Why We Decided to Migrate (And Why You Might Not Need To)

Here were our actual problems:

  • Scaling bottleneck: Our image processing feature needed 10x more CPU than the rest of the app. We had to scale the entire monolith just for one feature.
  • Deployment risk: Every deploy was scary because one bug in reporting could break checkout (they shared the same codebase).
  • Team conflicts: 12 engineers pushing to one repo. Merge conflicts daily. PR reviews took forever.
  • Technology constraints: We wanted to use Python for ML features, but our monolith was Ruby.

Here's what we DIDN'T have:

  • "Microservices are best practice" (not a reason)
  • "Our competitors use microservices" (not a reason)
  • "It'll look good on my resume" (definitely not a reason)

Real talk: If you don't have the problems we had, stick with your monolith. Microservices add complexity. If that complexity doesn't solve a real problem, you just made your life harder.

How We Should Have Started (And Eventually Did)

Phase 1: Make the Monolith Modular (3 months)

Before extracting anything, we organized the monolith into clear modules:

/app
  /users          # Authentication, profiles
  /products       # Catalog, inventory
  /orders         # Checkout, order management
  /images         # Image processing (our scaling bottleneck)
  /analytics      # Reporting

We added linting rules: the users module can't import from orders; analytics can import from anything, but nothing imports from analytics.
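
A tool like import-linter can enforce exactly this kind of contract in CI. A minimal sketch of such a config, matching the module tree above (the tool choice and contract names are illustrative, not necessarily what we ran):

[importlinter]
root_package = app

[importlinter:contract:1]
name = users must not depend on orders
type = forbidden
source_modules = app.users
forbidden_modules = app.orders

[importlinter:contract:2]
name = nothing depends on analytics
type = forbidden
source_modules =
    app.users
    app.products
    app.orders
    app.images
forbidden_modules = app.analytics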

This took 3 months. Zero features shipped. Just organization.

The CEO hated it. "Why aren't we shipping features?"

But it was necessary. If you can't modularize your monolith, you won't succeed with microservices. You'll just create a distributed mess.

Phase 2: Extract the Obvious Win (1 month)

Our first extraction: the image processing service.

Why this one first?

  • Clear boundaries (input: image URL, output: processed image)
  • Independent scaling needs (CPU-heavy, rest of app was I/O-heavy)
  • Stable interface (image processing logic doesn't change often)
  • Non-critical (if it went down, main app still worked, images just weren't processed)

We built it in Python (better image libraries). Deployed it separately. The monolith called it via HTTP.
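
The service's whole surface was essentially one endpoint. A minimal sketch of that interface, using Flask as a stand-in (the framework, route, and pipeline function here are illustrative, not our exact code):

from flask import Flask, jsonify, request

app = Flask(__name__)

def run_pipeline(image_url):
    # Stand-in for the CPU-heavy work (resize, compress, transform).
    return image_url.replace('/raw/', '/processed/')

@app.route('/process', methods=['POST'])
def process_image():
    # Input: an image URL. Output: where the processed image landed.
    image_url = request.json['image_url']
    return jsonify({'processed_url': run_pipeline(image_url)})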

Result: Image processing got 10x faster. We scaled it independently. Main app stayed fast. Everyone was happy.

This success got us buy-in for more migrations.

Phase 3: Extract One Service Per Quarter (18 months)

After the image service success, we extracted:

  1. Email service (Q2): Sending 1M emails/day, needed different scaling than main app
  2. Search service (Q3): Elasticsearch-based, completely independent from main database
  3. Notifications service (Q4): Push notifications, SMS, in-app—owned by separate team
  4. Analytics service (Q1 next year): Read-only, could use database replicas

One service per quarter. Slow, deliberate, measured.

Phase 4: Stop Extracting (Month 18 - Present)

After 6 services, we stopped. The monolith still handles:

  • User authentication
  • Product catalog
  • Order management
  • Payment processing

Why not extract these?

Because they're tightly coupled, change together, and benefit from being in one codebase. Splitting them would add network calls, distributed transactions, and complexity for zero benefit.

We're not "done migrating." We're done extracting what made sense to extract.

The Extraction Process (Step by Step)

Here's exactly how we extracted each service:

Step 1: Create a Module Boundary in the Monolith

Before extracting the email service, we created an EmailService class in the monolith:

class EmailService:
    @staticmethod
    def send_email(to, subject, body):
        # All email logic lives behind this single entry point
        pass

# Everywhere in codebase:
EmailService.send_email("user@example.com", "Welcome", "...")

Took 2 weeks. Changed zero behavior. Just created a clear API.

Step 2: Build the External Service (With Feature Flag)

Built a new service that implemented the same interface:

# In monolith:
import requests  # the smtp client and feature_flag helper already exist here

class EmailService:
    @staticmethod
    def send_email(to, subject, body):
        if feature_flag('external_email_service'):
            # New path: call the external service via HTTP
            requests.post('http://email-service/send',
                          json={'to': to, 'subject': subject, 'body': body})
        else:
            # Old path: the original SMTP implementation
            smtp.sendmail(...)

Now we could:

  • Route 1% of traffic to new service (test in production)
  • Roll back instantly if issues (just toggle flag)
  • Gradually increase the percentage over weeks (the flag logic is sketched below)
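
The routing itself is simple. A minimal sketch of a deterministic percentage flag (the feature_flag helper is hypothetical; ours read its percentages from a live config store):

import hashlib

# Hypothetical rollout table; ours lived in a config store we could
# change without deploying.
ROLLOUT = {'external_email_service': 1}  # percent of traffic

def feature_flag(name, key=''):
    # Hash a stable key (e.g. the recipient address) into a 0-99
    # bucket, so the same user stays on one path as the % ramps up.
    bucket = int(hashlib.sha256(f'{name}:{key}'.encode()).hexdigest(), 16) % 100
    return bucket < ROLLOUT.get(name, 0)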

Step 3: Monitor Everything

We tracked:

  • Latency (old implementation vs. new service)
  • Error rates
  • Success rates (did emails actually send?)
  • Cost (was the new service more expensive?)

Caught 4 bugs during the 0-100% rollout. Because we could instantly switch back, users barely noticed.
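
Most of that comparison came from one tagged log line per send. A sketch of the pattern (the helper and field names are illustrative):

import logging
import time

logger = logging.getLogger('email_rollout')

def timed(path, fn, *args, **kwargs):
    # Wrap either implementation; each send emits one log line tagged
    # with the path that handled it, so dashboards can compare the old
    # and new implementations side by side.
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        logger.info('email_sent path=%s latency_ms=%.1f',
                    path, (time.perf_counter() - start) * 1000)
        return result
    except Exception:
        logger.exception('email_failed path=%s', path)
        raise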

Step 4: Delete the Old Code

After 1 month at 100%, we deleted the old implementation from the monolith.

This is the step everyone forgets. If you don't delete old code, you now maintain two implementations forever.

Mistakes That Cost Us Weeks

Mistake 1: Shared Database

Our first microservice (image processing) directly queried the main database for user data.

Problem: When we changed the users table schema, the microservice broke. We had tight coupling through the database.

Fix: Microservices communicate via APIs, not direct database access. Each service owns its data.
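
For the image service, that meant replacing its SQL queries against the users table with a call to an API the monolith exposed. A sketch (the endpoint is illustrative):

import requests

def get_user(user_id):
    # The monolith owns user data; we ask it, not its database.
    # A schema change behind this endpoint no longer breaks us.
    resp = requests.get(f'http://monolith/internal/users/{user_id}', timeout=2)
    resp.raise_for_status()
    return resp.json()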

Mistake 2: Too Chatty

We extracted a "Product Service." To show one product page, the frontend made:

  • 1 call to Product Service (get product details)
  • 1 call to Inventory Service (check stock)
  • 1 call to Reviews Service (get reviews)
  • 1 call to Recommendations Service (related products)

4 network calls instead of 1 database query. Page load went from 50ms to 300ms.

Fix: Created a BFF (Backend For Frontend) that aggregated calls. Frontend makes 1 call, BFF makes 4 calls in parallel, returns combined response.
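
A minimal sketch of that BFF endpoint, fanning out with a thread pool (the service URLs are illustrative):

from concurrent.futures import ThreadPoolExecutor

import requests

SERVICES = {
    'product':         'http://product-service/products/{id}',
    'inventory':       'http://inventory-service/stock/{id}',
    'reviews':         'http://reviews-service/reviews/{id}',
    'recommendations': 'http://recommendations-service/related/{id}',
}

def product_page(product_id):
    # One frontend call in; four parallel service calls out.
    def fetch(item):
        name, url = item
        resp = requests.get(url.format(id=product_id), timeout=1)
        resp.raise_for_status()
        return name, resp.json()

    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(fetch, SERVICES.items()))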

Mistake 3: Wrong Boundaries

We split "Users" from "Authentication." Made sense logically.

Problem: Every auth change required a user change. We were deploying both services together anyway.

Fix: Merged them back. Boundaries should be based on how things change together, not logical categorization.

Mistake 4: No Distributed Tracing

A request was slow. Where? No idea. It hit 6 services. Which one was slow?

Fix: Added distributed tracing (OpenTelemetry). Now we can see exactly where time is spent across services.
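
Setup with the OpenTelemetry Python SDK is a few lines per service. A minimal sketch (in production we export to a collector; the console exporter just keeps this self-contained):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# One-time setup per service.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter()))

tracer = trace.get_tracer(__name__)

def process_order(order_id):
    # Each service adds its own span; the trace ID propagates via HTTP
    # headers, so one slow request shows up as one waterfall.
    with tracer.start_as_current_span('process_order') as span:
        span.set_attribute('order.id', order_id)
        ...  # downstream calls happen here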

Infrastructure Changes You Need

Microservices forced us to level up our infrastructure:

1. Service Discovery

How does Service A find Service B? Can't hardcode IPs.

We use Kubernetes' built-in service discovery. Service A just calls http://service-b, and Kubernetes routes the request.
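
A minimal Service manifest is all it takes (the names are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: email-service
spec:
  selector:
    app: email-service
  ports:
    - port: 80
      targetPort: 8080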

2. API Gateway

Instead of clients calling 10 services, they call 1 API gateway. It routes to the right service.

Also handles: auth, rate limiting, logging.

3. Centralized Logging

Logs from 15 services go to one place (we use Elasticsearch). Otherwise, debugging is impossible.

4. Monitoring & Alerting

15 services = 15 things that can break. We monitor:

  • Health checks (is service up?)
  • Latency (p50, p95, p99)
  • Error rates
  • Resource usage (CPU, memory)

Alert if any service crosses thresholds.
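
As an example of a threshold, here's what an error-rate alert can look like expressed as a Prometheus alerting rule (the metric names and thresholds are illustrative, not our production values):

groups:
  - name: service-health
    rules:
      - alert: HighErrorRate
        # Page if more than 1% of a service's requests fail for 5 minutes.
        expr: |
          rate(http_requests_total{status=~"5.."}[5m])
            / rate(http_requests_total[5m]) > 0.01
        for: 5m
        labels:
          severity: page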

5. CI/CD Per Service

Each service has its own deployment pipeline. Email service can deploy without touching Product service.
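
If services share a repo, path filters give each pipeline its own trigger. A sketch in GitHub Actions terms (the paths and deploy step are illustrative):

# email-service pipeline: runs only when the email service changes.
on:
  push:
    paths:
      - 'services/email/**'
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test deploy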

Team Organization Changed

With microservices, team structure had to change:

Before (Monolith):

  • Frontend team
  • Backend team
  • DevOps team

After (Microservices):

  • Checkout team (owns order service, cart service)
  • Search team (owns search service, recommendations service)
  • Platform team (owns auth, users, infra)

Each team owns multiple services end-to-end. Full responsibility: code, deploy, monitor, fix.

This is Amazon's "two-pizza team" model. If a team can't be fed with two pizzas, it's too big.

When NOT to Migrate

After doing this, here's when I'd say "don't migrate":

  • Team < 5 engineers: Overhead of microservices isn't worth it
  • Low traffic: If your monolith handles current load fine, why add complexity?
  • Tight deadlines: Migration slows feature development for months
  • Immature DevOps: If you can't deploy the monolith reliably, microservices will be worse
  • Unclear boundaries: If you don't know how to split your app, don't guess

Real Numbers from Our Migration

Here's what actually happened:

Costs went up:

  • Infrastructure: +40% (more servers, more services)
  • Operational complexity: +200% (15 things to monitor vs. 1)
  • Development time: +30% (network calls, distributed debugging)

Benefits we got:

  • Deployment frequency: 3x (services deploy independently)
  • Scaling efficiency: 60% cost reduction (scale only what needs scaling)
  • Team autonomy: 5/5 (teams own their services fully)
  • Technology flexibility: Can use Python, Go, Node in different services
  • Blast radius: Small (one service breaks, others stay up)

Was it worth it? For us, yes. For many teams, probably not.

What to Do This Week

  1. Identify your actual problems. Write them down. Are they solved by microservices, or something else?
  2. Modularize your monolith first. Create clear boundaries. Enforce them with linting.
  3. Identify one extraction candidate. Something with clear boundaries, independent scaling needs, and low risk.
  4. Estimate the cost. Infrastructure, team time, complexity. Is the benefit worth it?
  5. If yes, start small. One service. Prove the value. Then decide if you want more.

The Honest Truth

Microservices are not a "better" architecture. They're a different set of tradeoffs.

You trade:

  • Simplicity → Complexity
  • Monolith deployment → Independent service deployment
  • In-process calls → Network calls
  • One thing to monitor → Many things to monitor

For some teams, this trade is worth it. For others, it's not.

We're happy with our hybrid: 6 microservices + 1 modular monolith. It solves our problems without going full distributed.

Don't migrate because it's trendy. Migrate because it solves a problem you actually have.

And if you're not sure? Wait. The monolith isn't going anywhere.