Databases in Production

Scaling Databases

18 min Lesson 8 of 30

Scaling Databases

At some point every database hits a wall. Queries slow down, CPU pegs at 100%, disk I/O saturates, or the primary simply cannot absorb another thousand writes per second. The correct response depends entirely on which wall you hit. Throwing money at the wrong solution is one of the most common and expensive mistakes in production database operations — and scaling a database the wrong way can create years of technical debt that is far harder to unwind than the original performance problem.

This lesson walks through the three tiers of the scaling playbook in the order you should actually reach for them: vertical scaling first, read scaling second, sharding only when you genuinely have no other choice.

Tier 1 — Vertical Scaling: The Fastest Lever You Have

Vertical scaling means moving to a larger host: more CPU cores, more RAM, faster NVMe disks. It is the right first move because it requires zero application changes, zero data migration, and often resolves the problem immediately. Its limits are real but frequently reached later than engineers expect.

What vertical scaling actually buys you:

RAM: The single most impactful resource for most OLTP workloads. PostgreSQL's shared_buffers and MySQL's innodb_buffer_pool_size cache hot data pages. When your working set fits in RAM, disk reads drop to near zero. A database on a 128 GB host often performs 5–10× better than on a 32 GB host running the same queries.
CPU: Matters for complex analytical queries, connection handling, and sort/hash operations. Doubling CPU cores rarely doubles throughput for write-heavy OLTP — the bottleneck is usually I/O or lock contention, not compute.
Disk: Switching from spinning HDD to NVMe SSD is often a 10–50× improvement in random read/write latency. If your host is still on EBS gp2 (AWS), upgrading to gp3 or io2 with provisioned IOPS is a no-downtime change that frequently eliminates I/O wait entirely.

Check your buffer hit rate before anything else. If PostgreSQL shows a buffer hit rate below 95% or MySQL shows Innodb_buffer_pool_reads climbing faster than Innodb_buffer_pool_read_requests, you are disk-bound, and more RAM will fix it before any architectural change. Run this on PostgreSQL to see it instantly:

-- PostgreSQL: buffer cache hit rate per database
SELECT
  datname,
  blks_hit,
  blks_read,
  round(blks_hit::numeric / nullif(blks_hit + blks_read, 0) * 100, 2) AS hit_rate_pct
FROM pg_stat_database
WHERE datname NOT IN ('template0','template1')
ORDER BY hit_rate_pct ASC;

-- MySQL/MariaDB: InnoDB buffer pool hit rate
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- hit rate ≈ 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)
-- anything below 0.99 (99%) signals a buffer pool that is too small

The real ceiling of vertical scaling is not the instance size — it is the cost curve and single-node SPOF risk. A 192-core, 24 TB RAM machine exists in major clouds, but costs $40k+/month. More practically, a single node means a single failure domain. The combination of cost ceiling and HA requirements is what genuinely forces the move to the next tier.

Tier 2 — Read Scaling: Adding Read Replicas

If your workload is read-heavy (a ratio of 5:1 or higher reads to writes is typical for web applications), and your primary is CPU-bound despite adequate RAM and fast disks, read replicas are the correct next step. The primary handles all writes; replicas serve reads using streaming replication (PostgreSQL) or binary log replication (MySQL). A load balancer or smart connection string routes read traffic to the replica pool.

Production-grade patterns differ from the toy examples in tutorials:

Replica lag is a real hazard. Under heavy write load, replicas can fall behind the primary by seconds or more. Reading stale data after a write is a class of bug — called a read-your-own-writes inconsistency — that is extremely hard to debug in production. Route session-critical reads (profile settings, payment status) to the primary, and only route analytics, dashboards, and search to replicas.
Use separate connection pools for each endpoint. PgBouncer or ProxySQL should pool connections to the primary separately from connections to the replica pool. Never mix them in a single pool.
Monitor replica lag as a first-class SLO. Alert when lag exceeds your tolerance (typically 1–5 seconds for web apps, milliseconds for financial data). In PostgreSQL, this is pg_stat_replication; in MySQL it is the Seconds_Behind_Source field in SHOW REPLICA STATUS.
Read replicas do not reduce primary write load. If your bottleneck is write throughput, replicas solve nothing — you are at the sharding decision point.

Read replica routing: the proxy directs writes to the primary and distributes reads across replicas. Replication lag must be monitored as a first-class signal.

Tier 3 — Sharding: When You Actually Need It

Sharding — horizontally partitioning data across multiple independent database nodes, each owning a distinct subset of rows — is the nuclear option. It solves write saturation and storage limits that no single host can address. It also introduces a class of complexity that has ended engineering careers and caused production outages that took days to recover from. Before you shard, you need to be honest with yourself about whether you have exhausted every alternative.

Signs you have genuinely reached the sharding threshold:

Write throughput that exceeds the maximum of the largest available instance (typically >50k writes/sec for MySQL, >100k TPS for PostgreSQL on NVMe)
Dataset larger than the maximum storage of a single node (hundreds of TB)
Legal or compliance requirements to partition data by geography or tenant

Alternatives to check before sharding: table partitioning (built into PostgreSQL and MySQL, zero application changes, handles datasets up to tens of TB elegantly), CQRS with a separate read model (Elasticsearch, read-only Redshift), moving time-series data to a purpose-built store (TimescaleDB, InfluxDB), and archiving cold rows to object storage (AWS S3 + Athena).

Sharding Strategies

Range sharding assigns rows to shards by a contiguous range of the shard key (e.g., user IDs 1–10M on shard 1, 10M–20M on shard 2). Simple to reason about; terrible for hotspots — if your newest users are most active, all traffic hits the highest-range shard.

Hash sharding applies a hash function to the shard key and assigns the row to hash(key) % N shards. Even distribution; resharding when you add nodes requires moving data (consistent hashing reduces this but does not eliminate it).

Directory-based sharding maintains an explicit lookup table mapping each key to a shard. Maximum flexibility; the lookup table becomes a single point of failure and a write bottleneck if not carefully managed.

The cross-shard query problem is permanent. Once you shard, any query that spans multiple shards (JOINs, aggregations, transactions) requires scatter-gather logic at the application layer or a middleware layer (Vitess, Citus). There is no SQL engine that transparently handles this without tradeoffs. Operations that take one SQL statement on a single node may require dozens of round trips across shards. You cannot add sharding later and keep your query patterns unchanged — the application must be redesigned for it from day one.

Managed Sharding: What the Platform Does For You

At big-tech scale, sharding is usually handled by a middleware tier rather than in application code directly. The dominant options in 2025:

Vitess — MySQL sharding layer used by YouTube, Slack, PlanetScale. Handles resharding, cross-shard queries, and connection pooling transparently. Steep operational complexity.
Citus (PostgreSQL) — native distributed PostgreSQL extension. Distributes tables across worker nodes; transparent to most SQL. Strong fit for multi-tenant SaaS and analytics.
CockroachDB / YugabyteDB / Spanner — distributed SQL engines that shard internally. You write standard SQL; the engine handles sharding, replication, and failover. Trade: higher latency (distributed consensus) and higher cost than a single-node Postgres.

-- Citus: distribute an existing table by tenant_id (run on coordinator node)
-- Step 1: add Citus extension
CREATE EXTENSION citus;

-- Step 2: distribute the table — all queries with a tenant_id filter stay on one shard
SELECT create_distributed_table('orders', 'tenant_id');

-- Step 3: co-locate a related table so JOINs between them are local
SELECT create_distributed_table('order_items', 'tenant_id', colocate_with => 'orders');

-- Check shard placement
SELECT shardid, nodename, nodeport
FROM pg_dist_shard_placement
JOIN pg_dist_shard USING (shardid)
WHERE logicalrelid = 'orders'::regclass
LIMIT 10;

The Decision Framework: Which Tier, Right Now?

Engineers routinely jump to sharding when the correct answer was more RAM or a read replica. Use this checklist before escalating to the next tier:

Is the buffer hit rate below 95%? → Add RAM first.
Is the primary CPU-bound on reads, not writes? → Add read replicas.
Is the write throughput saturated? → Check table partitioning, CQRS, and archiving before sharding.
Have you profiled and eliminated slow queries? A missing index fixed in five minutes has resolved incidents that were being diagnosed as capacity problems.
Only after all of the above → Evaluate sharding with Vitess, Citus, or a distributed SQL engine.

Big-tech reality: Most companies that think they need sharding actually need better indexing, a larger instance, or read replicas. Sharding is the right answer at a scale most products will never reach. The companies that have sharded production databases at massive scale (Google, Meta, Stripe) have teams of dedicated infrastructure engineers to manage the ongoing operational cost. Size your ambition to your team.