System Design Fundamentals 2026: A Developer's Guide to Scalable Applications

System design interviews get all the attention, but the real value is in day-to-day decisions: should you extract that service? Add a cache? Reach for a message queue? This guide covers the fundamental patterns, their trade-offs, and the concrete decisions you'll face building production systems in 2026.
---
1. Microservices vs Monolith vs Modular Monolith
The "monolith vs microservices" debate has matured. In 2026, the winner is often somewhere in between.
| Architecture | Team Size | Deploy Frequency | Best For |
|---|---|---|---|
| Monolith | 1–5 | Low | Prototypes, internal tools, MVPs |
| Modular Monolith | 3–15 | Medium | Most business apps, teams that aren't Spotify-sized |
| Microservices | 10+ per service | High | Large orgs with clear domain boundaries |
The Modular Monolith Sweet Spot
A modular monolith is a single deployable unit with strict module boundaries. Modules communicate through well-defined interfaces but share the same process and database.
┌─────────────────────────────────────┐
│          Modular Monolith           │
│     ┌──────────┐   ┌──────────┐     │
│     │  Orders  │   │ Billing  │     │
│     │  Module  │───│  Module  │     │
│     └────┬─────┘   └────┬─────┘     │
│          │              │           │
│     ┌────▼──────────────▼─────┐     │
│     │      Shared Kernel      │     │
│     │  (DB, messaging, auth)  │     │
│     └─────────────────────────┘     │
└─────────────────────────────────────┘
When to extract a service: when two conditions are met. The module has a clear bounded context (DDD), and you need independent scaling or deploy velocity that the monolith can't provide.
Rule of thumb: Don't break your monolith until it hurts. Premature microservices add distributed transaction complexity, network latency, and operational overhead. Start modular, extract surgically.
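One way to keep module boundaries strict inside a single process is to have modules depend on narrow interfaces rather than on each other's internals. The sketch below is a minimal illustration, not a prescribed layout; `BillingAPI`, `BillingModule`, and `OrdersModule` are hypothetical names introduced here.

```python
from typing import Protocol


class BillingAPI(Protocol):
    """The only surface the Orders module is allowed to see (hypothetical)."""

    def charge(self, user_id: str, amount_cents: int) -> str: ...


class BillingModule:
    """Billing internals stay private to this module."""

    def __init__(self) -> None:
        self._charges: list[tuple[str, int]] = []

    def charge(self, user_id: str, amount_cents: int) -> str:
        self._charges.append((user_id, amount_cents))
        return f"charge-{len(self._charges)}"


class OrdersModule:
    """Depends on the BillingAPI interface, never on BillingModule internals."""

    def __init__(self, billing: BillingAPI) -> None:
        self._billing = billing

    def place_order(self, user_id: str, total_cents: int) -> dict:
        charge_id = self._billing.charge(user_id, total_cents)
        return {"user_id": user_id, "total_cents": total_cents, "charge_id": charge_id}


# Same process, plain function calls, but the dependency is on the interface.
orders = OrdersModule(billing=BillingModule())
receipt = orders.place_order("u1", 4200)
```

If billing is later extracted into its own service, only the object passed as `billing` should need to change, for example to an HTTP client that implements the same interface.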
Real-World Decision Tree
Monolith → Modular Monolith → Selective Extraction → Full Microservices
MVP Phase: Monolith
10k users/5 devs: Modular monolith
100k users: Extract payments (PCI scope)
1M users: Extract search (separate scale)
10M users: Extract recommendations (different stack)
---
2. CQRS: Command Query Responsibility Segregation
CQRS separates reads from writes — different models, sometimes different databases.
When CQRS Makes Sense
- Your read queries are complex and don't map well to your write model (e.g., reporting dashboards)
- Your read and write workloads have different scaling requirements (10:1 read-to-write ratio)
- You need different data shapes for reading vs writing (e.g., write normalized, read denormalized)
A Simple CQRS Implementation
# --- Command Side (Writes) ---
# Order, event_bus, and read_db are assumed to be provided by the application.
class CreateOrderCommand:
    def __init__(self, user_id: str, items: list):
        self.user_id = user_id
        self.items = items


class OrderCommandHandler:
    def handle(self, cmd: CreateOrderCommand) -> str:
        # Validate business rules
        order = Order.create(cmd.user_id, cmd.items)
        order.save()  # Write to transactional DB (PostgreSQL)
        event_bus.publish("order.created", {"order_id": order.id})
        return order.id


# --- Query Side (Reads) ---
class OrderQueryHandler:
    def get_order_summary(self, user_id: str) -> dict:
        # Read from denormalized read model (could be a different DB)
        return read_db.query(
            "SELECT * FROM order_summaries WHERE user_id = :uid",
            {"uid": user_id}
        )
CQRS Without Event Sourcing
You don't need event sourcing to use CQRS. The most common pattern is:
1. Write to a normalized PostgreSQL table
2. Sync (or async via CDC) to a read-optimized table
3. Read from the read table
-- Write model: normalized
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    status VARCHAR(20) NOT NULL,
    total_cents BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE order_items (
    id UUID PRIMARY KEY,
    order_id UUID REFERENCES orders(id),
    product_id UUID NOT NULL,
    quantity INT NOT NULL,
    unit_price_cents BIGINT NOT NULL
);

-- Read model: denormalized for fast queries
CREATE TABLE order_summaries (
    order_id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    status VARCHAR(20) NOT NULL,
    item_count INT NOT NULL,
    total_cents BIGINT NOT NULL,
    product_names TEXT[] NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
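The sync step can be as small as a function that folds one order and its items into a single `order_summaries` row. The sketch below assumes rows arrive as plain dicts and that a `product_names` lookup from product ID to name is available; both are illustrative assumptions, and the trigger itself (application code, a background job, or CDC) is left out.

```python
def build_order_summary(order: dict, items: list[dict], product_names: dict[str, str]) -> dict:
    """Project one normalized order plus its items into an order_summaries row."""
    return {
        "order_id": order["id"],
        "user_id": order["user_id"],
        "status": order["status"],
        # item_count sums quantities, so 2 units of one product count as 2 items
        "item_count": sum(item["quantity"] for item in items),
        "total_cents": order["total_cents"],
        "product_names": [product_names[item["product_id"]] for item in items],
        "created_at": order["created_at"],
    }


order = {
    "id": "o1",
    "user_id": "u1",
    "status": "paid",
    "total_cents": 3999,
    "created_at": "2026-05-01T12:00:00",
}
items = [
    {"product_id": "p1", "quantity": 2, "unit_price_cents": 1500},
    {"product_id": "p2", "quantity": 1, "unit_price_cents": 999},
]
summary = build_order_summary(order, items, {"p1": "Keyboard", "p2": "Mouse"})
```

Because the function is pure, the same projection can be reused to rebuild the read table from scratch if it ever drifts from the write model.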
When NOT to Use CQRS
- Your app is a simple CRUD interface with no complex queries
- You don't need separate read/write scaling
- Your team is small and you can't justify the infrastructure overhead
---
3. Event-Driven Architecture
Event-driven systems decouple producers from consumers. When an event happens, interested services react.
Core Concepts
┌──────────┐      Event Bus      ┌──────────────┐
│ Producer │─────(Kafka/RMQ)────▶│ Consumer 1   │
│ (Orders) │                     │ (Analytics)  │
└──────────┘                     └──────────────┘
                 ───────────────▶┌──────────────┐
                                 │ Consumer 2   │
                                 │ (Email)      │
                                 └──────────────┘
Message Queue Comparison
| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| Delivery | At-least-once by default; exactly-once with idempotent producers and transactions | At-most-once, at-least-once | At-least-once |
| Ordering | Per-partition guaranteed | Not guaranteed (unless single queue) | FIFO queue (limited throughput) |
| Persistence | Disk-based, configurable retention | Memory + disk (lazy queues) | Automatic (up to 14 days) |
| Throughput | Millions/sec | Thousands/sec | Nearly unlimited for standard queues; FIFO defaults to 300 API calls/s per action without batching |
| Consumer model | Pull-based (offset tracking) | Push or pull | Pull-based (long polling) |
| Use case | Event sourcing, stream processing, logs | Task queues, RPC, work queues | Serverless workloads, simple decoupling |
| Operational cost | High (self-managed cluster; KRaft, or ZooKeeper before Kafka 4.0) | Medium | Minimal (fully managed) |
Kafka in Practice: The URL Shortener Click Stream
import time

# Assumed to exist: producer (a Kafka producer with a JSON value serializer),
# analytics_consumer and storage_consumer (consumers in separate consumer groups,
# so each group receives every click), plus redis and warehouse clients.

# Producer: emit click events
def record_click(short_code: str, ip: str, user_agent: str):
    producer.send(
        topic="url_clicks",
        key=short_code.encode(),  # Same key → same partition → ordered
        value={
            "short_code": short_code,
            "ip": ip,
            "user_agent": user_agent,
            "timestamp": int(time.time()),
        }
    )

# Consumer 1: real-time analytics (e.g., update Redis counters)
def consume_clicks_for_analytics():
    for message in analytics_consumer:
        click = message.value
        redis.zincrby("popular_urls:today", 1, click["short_code"])
        redis.incr(f"url:{click['short_code']}:clicks")

# Consumer 2: store raw clicks in data warehouse
def consume_clicks_for_storage():
    for message in storage_consumer:
        warehouse.insert_one(message.value)
Event Sourcing: Storing State as Events
Instead of storing the current state, event sourcing stores a sequence of state-changing events. The current state is derived by replaying them.
# Events (immutable facts)
events = [
    {"type": "AccountCreated", "data": {"user_id": "u1", "email": "a@b.com"}},
    {"type": "EmailVerified", "data": {"user_id": "u1", "verified_at": "2026-05-01"}},
    {"type": "PasswordChanged", "data": {"user_id": "u1", "changed_at": "2026-05-10"}},
]

# Derive current state by replaying events
def get_account_state(events):
    state = {"email": None, "email_verified": False, "password_hash": None}
    for event in events:
        if event["type"] == "AccountCreated":
            state["email"] = event["data"]["email"]
        elif event["type"] == "EmailVerified":
            state["email_verified"] = True
        # PasswordChanged carries no hash in this example, so it leaves the state unchanged
    return state
Trade-offs: Event sourcing gives you a complete audit trail and time travel, but makes querying awkward (you need projections) and schema evolution painful.
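The "time travel" benefit falls out of replaying only a prefix of the event log. The sketch below is a self-contained variant of the replay idea; the `at` field on each event and the `apply_event` and `state_as_of` helpers are assumptions added for illustration.

```python
def apply_event(state: dict, event: dict) -> dict:
    """Return the next state without mutating the previous one."""
    if event["type"] == "AccountCreated":
        return {**state, "email": event["data"]["email"]}
    if event["type"] == "EmailVerified":
        return {**state, "email_verified": True}
    return state  # unknown event types are ignored


def state_as_of(events: list[dict], cutoff: str) -> dict:
    """Replay only events recorded at or before cutoff (ISO dates compare as strings)."""
    state = {"email": None, "email_verified": False}
    for event in events:
        if event["at"] <= cutoff:
            state = apply_event(state, event)
    return state


log = [
    {"type": "AccountCreated", "at": "2026-04-28", "data": {"email": "a@b.com"}},
    {"type": "EmailVerified", "at": "2026-05-01", "data": {}},
]
before = state_as_of(log, "2026-04-30")
after = state_as_of(log, "2026-05-02")
```

Here `before` reflects the account while it was still unverified and `after` reflects the verified account, both derived from the same immutable log.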
---
4. Database Scaling Strategies
Read Replicas
The simplest scaling strategy: one primary handles writes, replicas handle reads.
               ┌─────────────┐
               │ Primary DB  │◀── Writes
               └──────┬──────┘
                      │
     ┌────────────────┼────────────────┐
     │                │                │
┌────▼─────┐    ┌─────▼────┐   ┌───────▼──┐
│ Replica 1│    │ Replica 2│   │ Replica 3│
│ (Reads)  │    │ (Reads)  │   │ (Reads)  │
└──────────┘    └──────────┘   └──────────┘
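# Using read/write separation in code (assumes SQLAlchemy)
from sqlalchemy import create_engine, text

class DatabaseRouter:
    def __init__(self):
        # PRIMARY_URL and REPLICA_URLS are assumed to come from configuration
        self.primary = create_engine(PRIMARY_URL)
        self.replicas = [create_engine(url) for url in REPLICA_URLS]
        self.replica_index = 0

    def _run(self, engine, query, params=None):
        # begin() commits on success; rows are fetched before the connection closes
        with engine.begin() as conn:
            result = conn.execute(text(query), params or {})
            return result.fetchall() if result.returns_rows else None

    def write(self, query, params=None):
        return self._run(self.primary, query, params)

    def read(self, query, params=None):
        # Round-robin across replicas
        replica = self.replicas[self.replica_index % len(self.replicas)]
        self.replica_index += 1
        return self._run(replica, query, params)

    def read_from_primary(self, query, params=None):
        # For read-after-write consistency: bypass the replicas
        return self._run(self.primary, query, params)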
Replication lag is the #1 problem. If your app reads immediately after a write (e.g., "you just placed an order" page), route that read to the primary. This is called read-after-write consistency.
async def create_order_and_redirect(user_id: str, items: list):
    order_id = db.write("INSERT INTO orders ... RETURNING id")
    # Read-after-write: force this read to the primary
    order = db.read_from_primary(
        "SELECT * FROM orders WHERE id = :oid", {"oid": order_id}
    )
    return redirect(f"/orders/{order_id}")
Sharding (Horizontal Partitioning)
Split data across databases by a shard key.
| Strategy | Shard Key | Pros | Cons |
|---|---|---|---|
| Hash-based | hash(user_id) % N | Even distribution | Resharding is painful (need consistent hashing) |
| Range-based | user_id 1–10000 → shard 1 | Range queries work | Hot spots possible |
| Directory-based | Lookup table maps key → shard | Flexible, re-shardable | Extra lookup, single point of failure |
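To see why resharding hurts with plain modulo hashing (the hash-based row in the table above), the sketch below counts how many keys change shards when a fifth shard is added to four. The key names and the `shard_for` helper are made up for the demonstration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Hash-based sharding: stable hash of the key, modulo the shard count."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards


keys = [f"user-{i}" for i in range(10_000)]
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 5))
moved_fraction = moved / len(keys)
print(f"{moved_fraction:.0%} of keys change shards when going from 4 to 5 shards")
```

In expectation about 80% of keys move, because a key stays put only when its hash leaves the same remainder modulo 4 and modulo 5. A consistent hash ring would be expected to move roughly a fifth of the keys for the same change.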
# Consistent hashing: minimizes re-sharding
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes: list, replicas: int = 150):
        # replicas = virtual nodes per physical node, for a smoother distribution
        self.ring = {}
        for node in nodes:
            for i in range(replicas):
                key = self._hash(f"{node}:{i}")
                self.ring[key] = node
        self.sorted_keys = sorted(self.ring.keys())

    def get_node(self, key: str) -> str | None:
        if not self.ring:
            return None
        hash_val = self._hash(key)
        # First ring position at or after the key's hash, wrapping to the start
        idx = bisect.bisect_left(self.sorted_keys, hash_val)
        return self.ring[self.sorted_keys[idx % len(self.sorted_keys)]]

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
Partitioning (Within a Database)
Split a table into smaller physical chunks. PostgreSQL declarative partitioning:
CREATE TABLE events (
    event_id UUID NOT NULL,
    occurred_at TIMESTAMP NOT NULL,
    payload JSONB
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2026_q1
    PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');

CREATE TABLE events_2026_q2
    PARTITION OF events
    FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
Partition pruning means queries with WHERE occurred_at >= '2026-04-01' only scan relevant partitions.
---
5. Caching Layers
The Three Cache Levels
CDN ─── Application Cache (Redis) ─── In-Memory Cache (Local)
 │                  │                            │
 │      Expensive to fill             Fastest access
 │      Shared across servers         1-5μs per get
 │      50-500μs per get              Lost on restart
Cache Strategies
| Strategy | Read Behavior | Write Behavior | Best For |
|---|---|---|---|
| Cache Aside | Check cache → miss → load from DB → populate cache | Write to DB, invalidate cache key | Most general-purpose apps |
| Read Through | App reads only from the cache; the cache itself loads from DB on a miss | Usually paired with write through or write around | When the cache layer can own data loading |
| Write Through | n/a | Write to cache first, then DB synchronously | Apps needing strong consistency |
| Write Behind | n/a | Write to cache, async flush to DB | High-write-throughput apps |
| Write Around | n/a | Write to DB only; cache populated on subsequent read | Write-once, read-rarely data |
Cache Aside — The Default Choice
import json

# redis and db are assumed to be async clients
async def get_user_profile(user_id: str) -> dict:
    cache_key = f"user:profile:{user_id}"
    # 1. Try cache
    cached = await redis.get(cache_key)
    if cached:
        return json.loads(cached)
    # 2. Cache miss: load from database
    profile = await db.query(
        "SELECT * FROM user_profiles WHERE user_id = :uid",
        {"uid": user_id}
    )
    if profile:
        # 3. Populate cache with TTL (300 seconds)
        await redis.setex(cache_key, 300, json.dumps(profile))
    return profile

async def update_user_profile(user_id: str, data: dict):
    # 1. Write to database
    await db.execute(
        "UPDATE user_profiles SET name = :name WHERE user_id = :uid",
        {"uid": user_id, "name": data["name"]}
    )
    # 2. Invalidate cache (don't update it; let the next read re-populate)
    await redis.delete(f"user:profile:{user_id}")
Write Behind — For High-Volume Writes
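import asyncio
import json

# Batch writer: flushes at 100 buffered writes or every 5 seconds
write_buffer = []
flush_lock = asyncio.Lock()

async def write_to_cache(key: str, value: dict):
    write_buffer.append((key, value))
    if len(write_buffer) >= 100:
        await flush_buffer()

async def flush_buffer():
    async with flush_lock:  # one flush at a time
        batch = write_buffer[:]  # snapshot: writes arriving mid-flush stay buffered
        async with db.transaction():
            for key, value in batch:
                # Placeholder statement; PostgreSQL spells this INSERT ... ON CONFLICT DO UPDATE
                await db.execute(
                    "UPSERT INTO ... VALUES (:k, :v)",
                    {"k": key, "v": json.dumps(value)}
                )
        del write_buffer[:len(batch)]  # drop only what was committed

# Start background flusher
async def periodic_flush():
    while True:
        await asyncio.sleep(5)
        if write_buffer:
            await flush_buffer()

Write-behind risk: if the process crashes before the flush, the buffered data is lost. Use a persistent queue (Kafka) for critical writes.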
---
6. CAP Theorem Explained Practically
CAP says a distributed data store can provide at most two of three guarantees: Consistency, Availability, and Partition Tolerance.
What CAP Actually Means
- C (Consistency): Every read sees the most recent write (or an error)
- A (Availability): Every request gets a non-error response (not necessarily the latest data)
- P (Partition Tolerance): System continues working despite network failures
The Key Insight
When a network partition happens, you must choose CP or AP. Partition tolerance is non-negotiable in distributed systems because networks WILL fail.
| System | Choice | Real-World |
|---|---|---|
| PostgreSQL (single node) | CA | No distribution, no partition |
| PostgreSQL + synchronous replication | CP | Writes wait for replicas |
| Cassandra | AP | Writes always succeed, reads may be stale |
| DynamoDB (eventual consistency) | AP | Default read is eventually consistent |
| DynamoDB (strongly consistent) | CP | Higher latency, lower availability |
| MongoDB (replica set) | CP | Writes acknowledged by majority |
Practical CAP Decisions
# AP choice: accept stale reads for availability
async def get_product_stock(product_id: str) -> int:
    # Read from nearest replica, may be stale
    return await replica.query(
        "SELECT stock FROM products WHERE id = :pid",
        {"pid": product_id}
    )

# CP choice: accept slower reads for consistency
async def get_product_stock_cp(product_id: str) -> int:
    # Read from primary, always latest
    return await primary.query(
        "SELECT stock FROM products WHERE id = :pid",
        {"pid": product_id}
    )
Rule of thumb: Use eventual consistency for read-heavy, non-critical data (product descriptions, view counts). Use strong consistency for financial data, inventory, and auth tokens.
---
7. Load Balancing Strategies
Layer 4 vs Layer 7
| Aspect | Layer 4 (TCP) | Layer 7 (HTTP) |
|---|---|---|
| Routing based on | IP + port | URL, headers, cookies, body |
| Performance | Very fast | Slower (inspects payload) |
| Features | Simple forwarding | Content-based routing, rate limiting |
| Examples | HAProxy (TCP mode), AWS NLB | NGINX, Envoy, AWS ALB |
Algorithms
import hashlib

# Round Robin: predictable, but ignores how expensive each request is
servers = ["app-01", "app-02", "app-03"]
current_index = 0

def round_robin() -> str:
    global current_index
    server = servers[current_index % len(servers)]
    current_index += 1
    return server

# Least Connections: better for variable request durations
# (each server object is assumed to expose an active_connections count)
def least_connections(servers: list) -> str:
    return min(servers, key=lambda s: s.active_connections)

# IP Hash: session persistence without cookies
def ip_hash(client_ip: str, servers: list) -> str:
    hash_val = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[hash_val % len(servers)]
Health Checks: The Bare Minimum
┌──────────┐    /healthz    ┌──────────┐
│    LB    │───────────────▶│  App-01  │──▶ Returns 200
│          │                ├──────────┤
│          │───────────────▶│  App-02  │──▶ Returns 500 (removed from pool)
│          │                ├──────────┤
│          │───────────────▶│  App-03  │──▶ Returns 200
└──────────┘                └──────────┘
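# /healthz endpoint (FastAPI-style; check_database and check_redis are assumed helpers)
from fastapi.responses import JSONResponse

@app.get("/healthz")
async def health_check():
    # Check critical dependencies
    db_ok = await check_database()
    cache_ok = await check_redis()
    if db_ok and cache_ok:
        return {"status": "ok"}
    # Send a real 503 so the load balancer takes this instance out of the pool
    return JSONResponse(status_code=503, content={"status": "degraded"})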
---
8. API Gateway Patterns
An API gateway sits between clients and your services, handling cross-cutting concerns.
                  ┌─────────────────┐
                  │   API Gateway   │
                  │ ┌─────────────┐ │
Client ──────────▶│ │    Auth     │ │
                  │ └─────────────┘ │
                  │ ┌─────────────┐ │
                  │ │ Rate Limit  │ │──▶ Service A
                  │ └─────────────┘ │
                  │ ┌─────────────┐ │──▶ Service B
                  │ │   Routing   │ │
                  │ └─────────────┘ │──▶ Service C
                  │ ┌─────────────┐ │
                  │ │   Logging   │ │
                  │ └─────────────┘ │
                  └─────────────────┘
What the Gateway Handles
# Before gateway: each service handles auth
@app.route("/api/orders")
def list_orders():
    token = request.headers["Authorization"]
    user = verify_token(token)  # Duplicated in EVERY service
    return get_orders(user["id"])

# After gateway: auth is centralized, so service code is simpler.
# Only trust this header if the service is reachable solely through the gateway.
@app.route("/api/orders")
def list_orders():
    user_id = request.headers["X-Authenticated-User"]  # Set by gateway
    return get_orders(user_id)
Gateway vs Service Mesh
| Concern | API Gateway | Service Mesh (e.g., Istio) |
|---|---|---|
| Client-facing | Yes (edge) | No (internal) |
| Auth | Token verification, API keys | mTLS between services |
| Rate limiting | Per-client, per-endpoint | Per-service |
| Routing | URL-based | Traffic splitting, canary |
| Location | Edge proxy | Sidecar per pod |
Recommendation: Start with an API gateway. Add a service mesh only when you have dozens of services and need advanced traffic management.
---
9. Circuit Breaker and Resilience Patterns
The Circuit Breaker Pattern
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # seconds
        self.state = "CLOSED"  # CLOSED → OPEN → HALF_OPEN → CLOSED
        self.last_failure_time = None

    async def call(self, func, fallback=None):
        if self.state == "OPEN":
            if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                self.state = "HALF_OPEN"  # let one trial request through
            else:
                return fallback() if fallback else None
        try:
            result = await func()
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.monotonic()
            # A failed trial request, or too many failures, opens the circuit
            if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
            return fallback() if fallback else None
        # Any success closes the circuit and resets the failure count
        self.state = "CLOSED"
        self.failure_count = 0
        return result

# Usage (func returns an awaitable; fallback is a plain callable)
cb = CircuitBreaker(failure_threshold=3, recovery_timeout=60)

async def get_recommendations(user_id: str):
    return await cb.call(
        func=lambda: recommendations_service.fetch(user_id),
        fallback=lambda: {"recommendations": [], "source": "fallback"}
    )
Other Resilience Patterns
| Pattern | What It Does |
|---|---|
| Retry with backoff | Exponential backoff + jitter to avoid thundering herd |
| Timeout | Hard timeout per request (e.g., 5s) to prevent cascading |
| Bulkhead | Isolate resources: limit connections per service |
| Rate limiting | Token bucket or leaky bucket per client |
| Dead letter queue | Failed messages go to a DLQ for manual inspection |
# Retry with exponential backoff and jitter
import asyncio
import random

async def retry_with_backoff(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await func()
        except (ConnectionError, TimeoutError):
            if attempt == max_retries - 1:
                raise
            sleep_time = (2 ** attempt) + random.random()  # exponential delay + up to 1s of jitter
            await asyncio.sleep(sleep_time)
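The rate limiting row in the table above mentions token buckets. A minimal single-process sketch follows; the class name, its parameters, and the injectable clock are assumptions made for illustration, and a limiter shared across servers would typically keep this state in a store such as Redis instead.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the bucket can be exercised without sleeping
        self.tokens = float(capacity)
        self.updated_at = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated_at) * self.rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Deterministic demo with a fake clock: 2 tokens/sec, burst of 3
fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]       # the fourth request exceeds the burst
fake_now[0] = 1.0                                # one second later, 2 tokens are back
after_wait = [bucket.allow() for _ in range(3)]
```

The first three requests pass and the fourth is rejected; after one simulated second, two more requests pass before the bucket is empty again.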
---
10. Real Example: Design a URL Shortener
Let's design bit.ly/tinyurl step by step.
Requirements
- Generate a short, unique code for any URL
- Redirect to the original URL when the short code is accessed
- Track click analytics (count, referrer, timestamp)
- Handle 10M URLs, 100M redirects/day
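Before picking components, it helps to turn the traffic requirement into per-second numbers. The arithmetic below uses the two figures from the requirements; the 10x peak factor and the 500-byte average row size are assumptions chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

redirects_per_day = 100_000_000
total_urls = 10_000_000

avg_redirects_per_sec = redirects_per_day / SECONDS_PER_DAY
peak_redirects_per_sec = avg_redirects_per_sec * 10   # assumed peak factor

bytes_per_row = 500                                    # assumed average row size
storage_gb = total_urls * bytes_per_row / 1e9

print(f"average: {avg_redirects_per_sec:,.0f} redirects/s")
print(f"peak (assumed 10x): {peak_redirects_per_sec:,.0f} redirects/s")
print(f"URL table: ~{storage_gb:.0f} GB")
```

The average works out to a little under 1,200 redirects per second, and under these assumptions the URL table is only a few gigabytes, which suggests the read path rather than storage is the part that needs care.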
Step 1: URL Encoding
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(num: int) -> str:
    if num == 0:
        return BASE62[0]
    result = []
    while num > 0:
        result.append(BASE62[num % 62])
        num //= 62
    return ''.join(reversed(result))

def decode_base62(code: str) -> int:
    result = 0
    for char in code:
        result = result * 62 + BASE62.index(char)
    return result

# Example: 7 chars of base62 = 62^7 ≈ 3.5 trillion unique codes
encode_base62(123456789)  # "8m0Kx"
Step 2: Architecture
                                        ┌─────────────┐
                                        │  Analytics  │
                                        │  (Kafka →   │
                                        │ ClickHouse) │
                                        └─────────────┘
                                               ▲
                                               │ (async)
┌──────────┐    POST /shorten    ┌──────────────────────────┐
│  Client  │────────────────────▶│       API Gateway        │
│          │                     │  ┌────────────────────┐  │
│          │     GET /abc123     │  │ Write Service      │──┼──▶ PostgreSQL (URLs)
│          │────────────────────▶│  │ (generate code)    │  │
│          │                     │  └────────────────────┘  │
│          │    301 Redirect     │  ┌────────────────────┐  │
│          │◀────────────────────│  │ Read Service       │  │
│          │                     │  │ (resolve + cache)  │──┼──▶ Redis (cache)
│          │                     │  └────────────────────┘  │
│          │                     │  ┌────────────────────┐  │
│          │                     │  │ Click Logger       │──┼──▶ Kafka
│          │                     │  └────────────────────┘  │
└──────────┘                     └──────────────────────────┘
Step 3: Data Model
-- PostgreSQL
CREATE TABLE urls (
    id BIGSERIAL PRIMARY KEY,
    short_code VARCHAR(10) UNIQUE NOT NULL,  -- UNIQUE already creates an index, so no separate CREATE INDEX is needed
    original_url TEXT NOT NULL,
    user_id UUID,                            -- nullable for anonymous users
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP                     -- nullable
);

-- Redis cache
-- Key: "url:abc123" → Value: "https://example.com/long-url"
-- TTL: 24 hours
Step 4: Write Path
import random

@app.post("/shorten")
async def shorten_url(url: str, user_id: str = None):
    # 1. Check if URL already shortened (optimization)
    # IS NOT DISTINCT FROM also matches anonymous rows, where user_id is NULL
    existing = await db.query(
        "SELECT short_code FROM urls WHERE original_url = :url AND user_id IS NOT DISTINCT FROM :uid",
        {"url": url, "uid": user_id}
    )
    if existing:
        return {"short_url": f"https://short.domain/{existing['short_code']}"}
    # 2. Generate unique code
    short_code = await generate_unique_code()
    # 3. Store in DB (the UNIQUE constraint is the final guard against a concurrent duplicate)
    await db.execute(
        "INSERT INTO urls (short_code, original_url, user_id) VALUES (:c, :u, :uid)",
        {"c": short_code, "u": url, "uid": user_id}
    )
    # 4. Warm the cache
    await redis.setex(f"url:{short_code}", 86400, url)
    return {"short_url": f"https://short.domain/{short_code}"}

async def generate_unique_code() -> str:
    for _ in range(3):  # Retry on collision
        code = encode_base62(random.randint(0, 62**7 - 1))
        exists = await db.query(
            "SELECT 1 FROM urls WHERE short_code = :c", {"c": code}
        )
        if not exists:
            return code
    raise RuntimeError("Collision rate too high; increase code length")
Step 5: Read Path (The Hot Path — Handles 100M req/day)
import time

@app.get("/{short_code}")
async def redirect(short_code: str, request: Request):
    # 1. Try cache (hot links should mostly hit with a 24h TTL)
    original_url = await redis.get(f"url:{short_code}")
    if not original_url:
        # 2. Cache miss: hit DB
        row = await db.query(
            "SELECT original_url FROM urls WHERE short_code = :c",
            {"c": short_code}
        )
        if not row:
            raise HTTPException(status_code=404)
        original_url = row["original_url"]
        # 3. Populate cache with TTL
        await redis.setex(f"url:{short_code}", 86400, original_url)
    # 4. Log click asynchronously (don't block the redirect)
    click_event = {
        "short_code": short_code,
        "ip": request.client.host,
        "user_agent": request.headers.get("user-agent"),
        "referer": request.headers.get("referer"),
        "timestamp": int(time.time()),
    }
    # Queue to Kafka; delivery to the broker happens in the background
    await click_producer.send("url_clicks", click_event)
    # 5. Redirect. Browsers cache a 301, so repeat clicks may never reach this
    #    handler; use 302 if every click must be counted.
    return RedirectResponse(url=original_url, status_code=301)
Step 6: Scale Considerations
- Read replicas for URL resolution (read-heavy: 10:1 read-to-write ratio)
- Redis Cluster for cache (keys are spread across nodes by hash slot)
- Kafka partitions by short_code, so clicks for each short code stay ordered
- Batch write click analytics to ClickHouse every 30 seconds
- CDN for the redirect page itself (not the API; API calls are cheap)
---
11. Async Processing Patterns
The Problem: Synchronous Chains
Client ──▶ Service A ──▶ Service B ──▶ Service C ──▶ Response
             500ms         800ms         200ms    = 1.5s total
The client waits 1.5 seconds for work whose result it doesn't need right away.
Solution: Decouple with Async
Client ──▶ Service A ──▶ Response (immediate: "Accepted")
               │
               ▼
       Queue (Kafka/SQS)
               │
        ┌──────┴──────┐
        ▼             ▼
    Service B     Service C
     (email)   (generate PDF)
Pattern 1: Fire and Forget
import uuid

@app.post("/api/send-email")
async def send_email(request: EmailRequest):
    # Validate request
    if not request.valid:
        raise HTTPException(400)
    # Queue the work; don't wait for the email to be sent
    message_id = str(uuid.uuid4())
    await email_queue.send({
        "message_id": message_id,  # so the worker and the client refer to the same message
        "to": request.to,
        "template": request.template,
        "data": request.data,
    })
    # Return immediately
    return {"status": "queued", "message_id": message_id}
Pattern 2: Polling with Status
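@app.post("/api/report/generate")
async def generate_report(params: ReportParams):
    report_id = str(uuid.uuid4())
    # Record the initial status so the status endpoint can tell
    # "still processing" apart from "unknown report ID"
    await redis.set(f"report:{report_id}:status", "processing")
    await report_queue.send({"report_id": report_id, "params": params})
    return {"report_id": report_id, "status_url": f"/api/report/{report_id}/status"}


@app.get("/api/report/{report_id}/status")
async def check_status(report_id: str):
    # Assumes the Redis client was created with decode_responses=True,
    # so get() returns str rather than bytes
    status = await redis.get(f"report:{report_id}:status")
    if status is None:
        raise HTTPException(status_code=404, detail="Unknown report ID")
    if status == "ready":
        return {"status": "ready", "url": f"/api/report/{report_id}/download"}
    return {"status": "processing"}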
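On the client side, polling usually means a loop with a fixed interval and an overall deadline. The sketch below assumes a hypothetical `fetch_status` coroutine that performs the GET against the status URL and returns the parsed JSON body; it is injected so the loop itself stays independent of any HTTP library.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_until_ready(
    fetch_status: Callable[[], Awaitable[dict]],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> dict:
    """Poll `fetch_status` until it reports "ready" or the timeout elapses."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        body = await fetch_status()
        if body.get("status") == "ready":
            return body
        # Give up if the next poll would land past the deadline.
        if loop.time() + interval > deadline:
            raise TimeoutError("report was not ready before the timeout")
        await asyncio.sleep(interval)
```

A fixed interval keeps the sketch short. If many clients poll at once, a growing interval or jitter reduces load on the status endpoint.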
Pattern 3: Webhook Callback
Instead of polling, have the worker call a URL when done:
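import httpx

async def process_report(report_id: str, params: dict, callback_url: str | None = None):
    # ... generate report (produces `result`) ...
    await save_report(report_id, result)

    # Notify the caller, if a callback URL was registered
    if callback_url:
        # httpx.post() is synchronous; async code needs httpx.AsyncClient
        async with httpx.AsyncClient(timeout=10.0) as client:
            await client.post(callback_url, json={
                "report_id": report_id,
                "status": "completed",
                "download_url": f"/api/report/{report_id}/download",
            })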
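A single callback attempt is fragile, because the receiver may be briefly down when the report finishes. One common mitigation is to retry with exponential backoff, sketched below. `deliver_with_retry` and `send` are hypothetical: `send` stands for a coroutine that POSTs the payload and returns the HTTP status code, and it is assumed to raise `ConnectionError` or `TimeoutError` on network failure, so a wrapper around a real HTTP client would need to map that client's exceptions accordingly.

```python
import asyncio
from typing import Awaitable, Callable


async def deliver_with_retry(
    send: Callable[[dict], Awaitable[int]],
    payload: dict,
    max_attempts: int = 5,
    base_delay: float = 1.0,
) -> bool:
    """Call `send` until it returns a 2xx status or attempts run out."""
    for attempt in range(max_attempts):
        try:
            status = await send(payload)
            if 200 <= status < 300:
                return True
        except (ConnectionError, TimeoutError):
            pass  # Treat network failures like a non-2xx response and retry.
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, then 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    return False
```

Retries mean the receiver can see the same notification more than once, so callback handlers should be idempotent, for example by keying on `report_id`.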
---
12. Common Anti-Patterns
1. The Distributed Monolith
You split into microservices but deploy them together and fail to maintain boundaries. Every service calls every other service directly. Schema changes ripple across the system.
Signs: A "simple" feature touches 5+ services. You need to coordinate deploys across teams. Services share a database or, god forbid, tables.
Fix: Enforce bounded contexts. Each service owns its data. Communication is via APIs or events, not shared databases.
2. Over-Engineering from Day One
"Let's use Kafka, Cassandra, Kubernetes, and event sourcing" — for a blog with 10 visitors/day.
Fix: Start with the simplest thing that works. A monolith with PostgreSQL and Redis handles the vast majority of applications. Extract services when there's a proven need.
3. Synchronous Coupling via HTTP
Service A ──HTTP──▶ Service B ──HTTP──▶ Service C ──HTTP──▶ Service D
If one service is slow, the whole chain slows. Latency adds up. Failures cascade.
Fix: Use async communication for non-critical paths. Use circuit breakers for critical sync calls. Prefer eventual consistency over synchronous coordination.
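A circuit breaker for the critical sync calls can be sketched in a few lines. This is a simplified, single-threaded illustration with hypothetical names (`CircuitBreaker`, `CircuitOpenError`) and arbitrary default thresholds, not a replacement for a maintained resilience library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                # Open: fail fast instead of waiting on a struggling dependency.
                raise CircuitOpenError("circuit is open")
            # Half-open: the timeout has passed, so let one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # Open (or re-open) the circuit.
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping a downstream call as `breaker.call(lambda: client.get_user(user_id))` (where `client.get_user` is a placeholder) lets callers fail within microseconds while the dependency recovers, which is what stops one slow service from stalling the whole chain.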
4. The Shared Database
Two services reading/writing the same database table. Schema changes require coordination. One service's transactions can block or deadlock the other's.
Fix: Each service owns its data. Share via APIs or events, not databases.
5. Ignoring Caching
Every request hits the database. Database CPU is 90%. Response times are 200ms for data that changes hourly.
Fix: Add Redis. Cache the most frequently accessed data. Even a 60-second cache TTL can remove most database reads for hot keys: a key requested 20 times per second reaches the database about once a minute instead of 1,200 times.
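The cache-aside read path behind that fix looks roughly like this. To keep the sketch self-contained it uses an in-process dictionary as a stand-in for Redis; `TTLCache`, `get_or_load`, and the `load` callback are hypothetical names, and with a real Redis client the same flow would be a `GET`, then on a miss a database read followed by a `SET` with an expiry.

```python
import time
from typing import Any, Callable


class TTLCache:
    """A tiny in-process stand-in for a cache such as Redis."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, load: Callable[[], Any]) -> Any:
        """Cache-aside read: return a fresh cached value, else load and store it."""
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # Cache hit: the database is not touched.
        value = load()  # Cache miss or expired entry: go to the source of truth.
        self._entries[key] = (self._clock() + self._ttl, value)
        return value
```

With a 60-second TTL, a hundred reads of the same key inside one minute trigger a single call to `load`. The cost is that readers may see data up to 60 seconds stale.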
6. The N+1 Query Problem
# Anti-pattern: N+1 queries
def get_orders_with_items(user_id: str):
    orders = db.query("SELECT * FROM orders WHERE user_id = :uid", {"uid": user_id})
    for order in orders:
        # One query PER order: 1 + N round trips to the database
        order["items"] = db.query(
            "SELECT * FROM order_items WHERE order_id = :oid",
            {"oid": order["id"]}
        )
    return orders


# Fix: single query with JOIN
def get_orders_with_items_fixed(user_id: str):
    # Returns one row per (order, item) pair; group rows by o.id in application
    # code if callers need the nested shape produced by the version above
    return db.query("""
        SELECT o.id, o.total, oi.product_id, oi.quantity
        FROM orders o
        LEFT JOIN order_items oi ON oi.order_id = o.id
        WHERE o.user_id = :uid
    """, {"uid": user_id})
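The JOIN version returns flat rows, so callers that expect nested orders need a grouping step. Here is a runnable sketch using the standard-library `sqlite3` module. The two-table schema and sample rows in `make_demo_db` are assumptions made for the demo, and `get_orders_grouped` is a hypothetical helper name.

```python
import sqlite3


def get_orders_grouped(conn: sqlite3.Connection, user_id: str) -> list:
    """Fetch a user's orders and items in one query, then nest items per order."""
    rows = conn.execute(
        """
        SELECT o.id, o.total, oi.product_id, oi.quantity
        FROM orders o
        LEFT JOIN order_items oi ON oi.order_id = o.id
        WHERE o.user_id = :uid
        ORDER BY o.id
        """,
        {"uid": user_id},
    ).fetchall()

    orders = {}
    for order_id, total, product_id, quantity in rows:
        order = orders.setdefault(order_id, {"id": order_id, "total": total, "items": []})
        # LEFT JOIN yields NULL item columns for orders with no items.
        if product_id is not None:
            order["items"].append({"product_id": product_id, "quantity": quantity})
    return list(orders.values())


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with an assumed minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id TEXT, total REAL);
        CREATE TABLE order_items (order_id INTEGER, product_id TEXT, quantity INTEGER);
        INSERT INTO orders VALUES (1, 'u1', 30.0), (2, 'u1', 0.0), (3, 'u2', 5.0);
        INSERT INTO order_items VALUES (1, 'p1', 2), (1, 'p2', 1), (3, 'p9', 4);
        """
    )
    return conn
```

Calling `get_orders_grouped(make_demo_db(), "u1")` issues one query regardless of how many orders the user has, and an order with no items still appears with an empty `items` list.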
7. No Monitoring / No Observability
"Everything looks fine" — until users complain that the site is slow and you have no idea why.
Baseline monitoring: Request latency (p50, p95, p99), error rate, throughput, CPU/memory per service. Structured logging with correlation IDs. Distributed tracing for async flows.
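Structured logging with correlation IDs needs nothing beyond the standard library. The sketch below is one possible shape, assuming the ID arrives from the caller (typically in a request header) or is minted when absent. `JsonFormatter`, `handle_request`, and the `correlation_id` variable are hypothetical names.

```python
import contextvars
import json
import logging
import uuid
from typing import Optional

# Holds the correlation ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object including the correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def handle_request(logger: logging.Logger, incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's ID if one was passed along, otherwise mint a new one."""
    request_id = incoming_id or str(uuid.uuid4())
    token = correlation_id.set(request_id)
    try:
        logger.info("order created")
        return request_id
    finally:
        # Restore the previous value so IDs never leak between requests.
        correlation_id.reset(token)
```

Attaching `JsonFormatter()` to a handler makes every log line carry the same ID for the life of a request. Forwarding that ID on outgoing calls and queue messages is what lets you follow one request across services.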
---
Summary: Key Decisions for 2026
| Decision | Default Choice | Upgrade When |
|---|---|---|
| Architecture | Modular monolith | Team >15 or clear independent scale need |
| Database | PostgreSQL | Read replicas at 10k reads/s, sharding at 100k |
| Cache | Redis (cache-aside) | Write-behind for high-throughput writes |
| Queue | SQS (serverless) → RabbitMQ (control) → Kafka (streaming) | Scale-dependent |
| Async | Fire and forget for non-critical | Polling → Webhooks as needs grow |
| API Gateway | NGINX / Traefik | Envoy / Kong for advanced routing |
| Resilience | Circuit breaker + timeout | Bulkhead + rate limiting at scale |
The best system design is the one that solves today's problem without creating tomorrow's nightmare. Start simple, measure everything, extract with surgical precision, and never optimize for a scale you haven't reached.