
From Monolith to Microservices: A Pragmatic Approach

Engineering Team
Nov 5, 2024
15 min read



The decision to break up a monolith is one of the most consequential architectural choices a team can make. Done right, it can unlock scalability and team velocity. Done wrong, it can cripple your organization with complexity. Here's our pragmatic approach based on real-world migrations.


The Monolith Isn't Your Enemy


Let's start with an unpopular opinion: **monoliths aren't inherently bad**. Many successful companies run monoliths that serve millions of users. Shopify, GitHub, and Stack Overflow all started (and in some cases continue) as monoliths.


Consider microservices when you experience these specific pain points:


1. **Team Scaling Issues:** Multiple teams can't work independently; deployments require coordination

2. **Performance Bottlenecks:** Can't scale different parts of the system independently

3. **Technology Lock-in:** Want to use different tech stacks for different components

4. **Deployment Risk:** Changes to one part of the system risk the entire application


If you don't have these problems, you probably don't need microservices.


Our Migration Philosophy


When we do migrate, we follow these principles:


1. Incremental, Not Big Bang


Never attempt a complete rewrite. We extract services incrementally, keeping the monolith functional throughout. This allows continuous delivery and reduces risk.


2. Business Logic First, Infrastructure Later


Focus on logical boundaries (e.g., "user service," "payment service") before worrying about infrastructure (Kubernetes, service mesh, etc.). You can run microservices on a single server initially.


3. Start with the Edges


Extract services at the periphery first—components with fewer dependencies. Authentication, notifications, and file storage are often good starting points.
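This edge-first extraction is essentially the strangler-fig pattern: a thin routing layer sends traffic for extracted components to their new services, and everything else still hits the monolith. A minimal sketch in Python (the route prefixes and service hosts are invented for illustration):

```python
# Strangler-fig routing sketch: path prefixes that have been extracted
# are sent to their new service; everything else falls through to the
# monolith. Prefixes and hosts here are illustrative, not real.
EXTRACTED_ROUTES = {
    "/notifications": "http://notification-service:8080",
    "/auth": "http://user-service:8080",
}

MONOLITH = "http://monolith:3000"

def upstream_for(path: str) -> str:
    """Return the backend that should serve this request path."""
    for prefix, service in EXTRACTED_ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return service
    return MONOLITH
```

As more services are extracted, entries are added to the table; the monolith stays the default backend until nothing routes to it anymore.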


4. Data is the Hard Part


The hardest aspect of microservices is data management. Each service should own its data, but achieving this without violating consistency requirements is challenging.


A Real Migration: E-commerce Platform


Let's walk through a real migration we executed for an e-commerce client.


The Starting Point


A Rails monolith with:

  • 350K lines of code
  • 6 teams sharing the codebase
  • 20-minute test suite
  • Weekly deployments (risky and stressful)
  • 12GB PostgreSQL database

Step 1: Identify Service Boundaries


We used Domain-Driven Design to identify bounded contexts:

  • **User Service:** Authentication, profiles, permissions
  • **Catalog Service:** Products, categories, search
  • **Cart Service:** Shopping cart, wishlists
  • **Order Service:** Checkout, payments, order management
  • **Fulfillment Service:** Inventory, shipping, tracking
  • **Notification Service:** Email, SMS, push notifications

Step 2: Measure Everything


Before making any changes, we established baseline metrics:

  • Request latency (p50, p95, p99)
  • Error rates
  • Deployment frequency
  • Lead time for changes
  • Mean time to recovery
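For the latency side of that baseline, a simple nearest-rank percentile over raw request samples is enough before any extraction work starts. A sketch (the sample latencies are made up):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Invented request latencies in milliseconds.
latencies_ms = [12, 15, 18, 22, 30, 45, 80, 120, 250, 900]

# Baseline p50/p95/p99 to compare against after each extraction.
baseline = {p: percentile(latencies_ms, p) for p in (50, 95, 99)}
```

Recording these numbers before touching anything is what makes "did the migration help?" answerable later.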

Step 3: Extract Notification Service


We started with notifications—relatively isolated with few dependencies.


Strategy:

1. Create new Node.js service with API for sending notifications

2. Database: Separate PostgreSQL database for notification logs

3. Routing: Monolith calls notification service via HTTP initially

4. Async: Later moved to event-driven (RabbitMQ) for better decoupling
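The move from HTTP (step 3) to events (step 4) is cheapest when the monolith's call sites never see the transport. One way to arrange that is a small notifier interface with one implementation per transport. This sketch is illustrative: class names are invented, and an in-memory deque stands in for RabbitMQ.

```python
from abc import ABC, abstractmethod
from collections import deque

class Notifier(ABC):
    """Transport-agnostic interface the monolith's call sites depend on."""
    @abstractmethod
    def send(self, user_id: str, message: str) -> None: ...

class HttpNotifier(Notifier):
    """Step 3: synchronous call to the notification service's HTTP API."""
    def __init__(self, post):
        self.post = post  # post(url, payload) injected for testability
    def send(self, user_id, message):
        self.post("http://notification-service/send",
                  {"user_id": user_id, "message": message})

class QueueNotifier(Notifier):
    """Step 4: fire-and-forget publish; the deque stands in for RabbitMQ."""
    def __init__(self, queue):
        self.queue = queue
    def send(self, user_id, message):
        self.queue.append({"user_id": user_id, "message": message})
```

Swapping `HttpNotifier` for `QueueNotifier` at wiring time is then a one-line change; no call site in the monolith is touched.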


Results:

  • No user-facing issues during migration
  • Notifications team could now deploy independently
  • Reduced monolith complexity by ~8K lines of code

Step 4: Extract User Service


More complex due to pervasive dependencies (many parts of the app reference user data).


Strategy:

1. Create user service with API for authentication and profile operations

2. Database: Initially kept user table in shared database (pragmatic compromise)

3. API Gateway: Introduced Kong for routing and JWT validation

4. Gradual cutover: Started with new user registrations, then migrated existing users
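The gradual cutover in step 4 can be driven by a deterministic percentage rollout, so a given email always takes the same path while the percentage is raised from 0 to 100. A hedged sketch (the function names and bucketing scheme are our own illustration, not from the migration itself):

```python
import hashlib

def hash_bucket(key: str) -> int:
    """Stable 0-99 bucket so a given user always takes the same path."""
    return hashlib.sha256(key.encode()).digest()[0] % 100

def registration_backend(email: str, rollout_pct: int) -> str:
    """Route a deterministic slice of new registrations to the new
    user service; rollout_pct is raised from 0 to 100 over the cutover."""
    return "user-service" if hash_bucket(email) < rollout_pct else "monolith"
```

Because the bucket is derived from the email rather than chosen randomly per request, a user never flip-flops between backends mid-rollout.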


The Data Problem:

Other services still needed user data (name, email, etc.). Options:


Option A: Service-to-Service Calls

  • Pro: Single source of truth
  • Con: Increased latency, tighter coupling

Option B: Data Replication

  • Pro: Fast reads, loose coupling
  • Con: Eventual consistency, complexity

We chose a hybrid: critical operations use synchronous calls; read-heavy displays use replicated data updated via events.
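The replicated-read half of that hybrid can be sketched as an event consumer that maintains a local copy of user data inside another service, using a version number to discard stale or duplicated deliveries. The event shape and field names here are assumptions:

```python
# Local read replica of user data inside a consuming service (say,
# orders). Events carry a monotonically increasing version so stale
# or out-of-order messages can be ignored safely.
local_users = {}

def handle_user_updated(event: dict) -> bool:
    """Apply a user.updated event; return True if the replica changed."""
    current = local_users.get(event["user_id"])
    if current and current["version"] >= event["version"]:
        return False  # stale or duplicate delivery: drop it
    local_users[event["user_id"]] = {
        "name": event["name"],
        "email": event["email"],
        "version": event["version"],
    }
    return True
```

The version guard is what makes the consumer idempotent, which matters because message brokers typically guarantee at-least-once (not exactly-once) delivery.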


Results:

  • User team could deploy 3× more frequently
  • Authentication performance improved 40% (dedicated caching)
  • Some complexity added (event handling, data sync logic)

Step 5: Extract Catalog Service


Product catalog had massive read traffic but infrequent writes.


Strategy:

1. New service built with Elixir (better performance for read-heavy workloads)

2. PostgreSQL database, replicated from monolith initially

3. GraphQL API for flexible queries

4. Aggressive caching with Redis
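The caching in step 4 follows the standard cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache for subsequent reads. A sketch in Python, with a plain dict standing in for Redis and an injected loader standing in for the database (in production this would be a GET and a SET with a TTL):

```python
cache = {}  # stands in for Redis in this sketch

def get_product(product_id: str, load_from_db) -> dict:
    """Cache-aside read: serve from cache, else load and populate.
    load_from_db is injected so the storage layer stays swappable."""
    key = f"product:{product_id}"
    if key in cache:
        return cache[key]          # cache hit: no database touched
    product = load_from_db(product_id)
    cache[key] = product           # populate for subsequent readers
    return product
```

For a read-heavy, write-light catalog this means the database sees roughly one read per product per TTL window instead of one per request.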


Results:

  • Catalog API latency reduced 65%
  • Could scale reads independently of writes
  • Enabled product search improvements (Algolia integration)

Steps 6-8: Cart, Orders, Fulfillment


Similar pattern: identify boundaries, extract incrementally, establish communication patterns.


The Good, The Bad, The Ugly


The Good

✅ **Deployment Independence:** Teams deploy multiple times per day without coordination

✅ **Technology Diversity:** Using the right tool for each job (Go for performance, Python for ML)

✅ **Scalability:** Can scale services independently based on traffic patterns

✅ **Team Autonomy:** Teams own their services end-to-end


The Bad

⚠️ **Operational Complexity:** More monitoring, logging, tracing, and debugging surface area

⚠️ **Network Calls:** Latency and failure modes from inter-service communication

⚠️ **Data Consistency:** Distributed transactions are hard; eventual consistency requires careful design

⚠️ **Testing:** Integration testing across services is more complex


The Ugly

❌ **Initial Slowdown:** For the first 6 months, velocity decreased as the team adapted to the new architecture

❌ **Organizational Change:** Required changes to team structure, on-call, and ownership models

❌ **Tooling Investment:** Had to build/buy better observability, service discovery, and deployment tools


Lessons Learned


1. Conway's Law is Real

Your architecture will mirror your organizational structure. Align team boundaries with service boundaries.


2. Observability is Not Optional

Distributed tracing (we use Datadog), centralized logging (Elasticsearch), and comprehensive metrics are essential. Build this before you need it.


3. Service Mesh Isn't Always Needed

We used Istio for a while, then removed it. For many teams, a good API gateway plus library-based patterns is simpler and sufficient.


4. Database per Service is the Goal, Not Day 1 Reality

Start with logical separation, move to physical separation over time. Pragmatism > dogma.


5. Events > HTTP Calls

For non-critical flows, event-driven architecture reduces coupling and improves resilience. We use RabbitMQ for async workflows.


6. Start with a Monolith, Extract Services When Needed

If we were starting from scratch today, we'd build a modular monolith with clear boundaries, then extract services only when necessary.


When to Stay Monolithic


Stay with a modular monolith if:

  • Your team is <20 engineers
  • Deployment frequency is acceptable
  • Performance meets requirements
  • You don't have deep expertise in distributed systems

A well-structured monolith with good modularity can take you very far.


Conclusion


Microservices aren't a silver bullet. They solve specific problems at the cost of increased complexity. Our advice:


1. Start with a monolith

2. Keep it modular with clear boundaries

3. Extract services only when you hit specific scaling pain points

4. Be incremental and measure everything

5. Invest heavily in tooling and observability


The best architecture is the one that solves your actual problems without introducing unnecessary complexity.



Engineering Team

The Engineering Team at Senpai Software shares insights and best practices from real-world software development projects.