Capacity Reviews & Forecasting Practice
Autoscaling handles the minute-to-minute elasticity of a live system, but it cannot tell you whether your infrastructure will survive next quarter's growth, a product launch that triples your user base, or an expansion into a new region. That responsibility belongs to the capacity review — a structured engineering process that connects business intent to infrastructure commitments. This lesson covers how senior engineers at top-tier companies run launch reviews, build growth models, and reason about multi-region capacity in a way that holds up under cross-functional scrutiny.
Launch Reviews: Gatekeeping Production Capacity
A launch review (sometimes called a production readiness review, or PRR) is a pre-launch checkpoint where the team owning a new feature or service demonstrates that it will not cause an outage when real traffic hits it. At companies like Google, Meta, and Amazon, completing a launch review is a hard prerequisite for a significant traffic ramp. The review is not bureaucracy — it surfaces capacity blind spots before they become incidents.
A well-structured launch review covers four areas:
- Traffic shape and peak estimates. What are the projected p50 and p99 request rates at launch? Is traffic bursty (a flash sale, a "top of hour" cron fan-out) or smooth? What sits between users and the origin: is there a CDN layer, a queue, or does load land directly on the origin?
- Resource sizing verification. Run load tests at 150% of peak forecast and confirm CPU and memory headroom on both the service and its dependencies (databases, caches, message brokers). Verify that the Horizontal Pod Autoscaler (HPA) will fire and new pods will land before latency breaches the SLO.
- Dependency capacity contracts. Every upstream and downstream service must confirm it has headroom to absorb the launch. A single downstream dependency with no headroom will cause failures that cascade back into the new service, however well sized that service is.
- Rollback and load-shed plan. Document the exact commands that revert the rollout, kill the feature flag, or activate load shedding if the launch goes sideways. This should be rehearsed, not written for the first time during an incident.
Load-test automation is the foundation. The k6 script below models a realistic launch ramp — not a flat wall of load, but a staged increase that mirrors how a phased rollout or a marketing campaign drives user acquisition:
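The staged profile such a script encodes can also be sketched in Python as a list of duration/target pairs, the same shape k6 accepts in its `stages` option. This is a minimal sketch under stated assumptions: the step fractions, stage lengths, and the `launch_ramp` helper are illustrative choices, not values from a real launch. The 1.5 overshoot mirrors the 150%-of-peak load-test guidance above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    duration_s: int      # how long the stage lasts, in seconds
    target_rps: float    # request rate to reach by the end of the stage


def launch_ramp(peak_forecast_rps: float,
                overshoot: float = 1.5,
                steps=(0.1, 0.25, 0.5, 1.0),
                step_s: int = 300,
                hold_s: int = 600) -> list[Stage]:
    """Build a staged ramp: climb through fractions of the forecast peak,
    push to the overshoot level, hold there, then ramp down to zero."""
    stages = [Stage(step_s, peak_forecast_rps * f) for f in steps]
    ceiling = peak_forecast_rps * overshoot
    stages.append(Stage(step_s, ceiling))   # ramp up to the test ceiling
    stages.append(Stage(hold_s, ceiling))   # hold at the ceiling
    stages.append(Stage(step_s, 0.0))       # ramp down
    return stages


if __name__ == "__main__":
    # Hypothetical forecast: 2,000 req/s at peak.
    for s in launch_ramp(peak_forecast_rps=2000):
        print(f"{s.duration_s:>4}s -> {s.target_rps:,.0f} req/s")
```

Generating the stages from the forecast, rather than hard-coding them, keeps the load test in step with the forecast whenever the peak estimate is revised.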
Integrate this script into your CI pipeline so every feature branch can prove it meets SLOs before the review meeting even happens. The launch review then becomes a presentation of evidence, not a discovery exercise.
Growth Modeling: Translating Business Plans into Resource Numbers
Growth modeling converts a product roadmap and a business forecast into a set of resource projections that engineering can act on. The output is not a single number — it is a range with confidence intervals, updated on a regular cadence (typically monthly).
The simplest effective model uses three inputs:
- Current baseline. Measured resource consumption per unit of business activity (requests per active user per day, database row writes per order, GB egress per video view). Extract this from your observability stack — Prometheus metrics correlated with business analytics events.
- Growth rate. User growth, transaction volume growth, or data volume growth, whichever is your dominant cost driver. Use the product team's committed forecast for planning, and a P90 upside scenario for headroom.
- Efficiency improvement. Every quarter, caching improvements, query optimizations, and protocol upgrades reduce the resource cost per unit. Model a conservative 10–15% per-year efficiency improvement so you are not over-provisioning against a cost-per-unit that will shrink.
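The three inputs above combine into a simple compounding projection. The sketch below assumes growth and efficiency gains both compound annually and independently, which is a simplification. The function names and example figures are hypothetical.

```python
def project_demand(baseline_units: float,
                   cost_per_unit: float,
                   annual_growth: float,
                   annual_efficiency_gain: float,
                   months: int) -> float:
    """Projected resource demand after `months`, compounding growth in
    business units and shrinkage in the resource cost of each unit."""
    years = months / 12
    units = baseline_units * (1 + annual_growth) ** years
    unit_cost = cost_per_unit * (1 - annual_efficiency_gain) ** years
    return units * unit_cost


def demand_range(baseline_units: float, cost_per_unit: float,
                 committed_growth: float, upside_growth: float,
                 efficiency_gain: float, months: int) -> tuple[float, float]:
    """Return (planning, headroom) demand: the committed forecast and the
    P90 upside scenario, both with the same efficiency assumption."""
    plan = project_demand(baseline_units, cost_per_unit,
                          committed_growth, efficiency_gain, months)
    upside = project_demand(baseline_units, cost_per_unit,
                            upside_growth, efficiency_gain, months)
    return plan, upside


if __name__ == "__main__":
    # Hypothetical inputs: 1M daily active users, 50 requests per user per day,
    # 40% committed growth, 80% upside growth, 10% yearly efficiency gain.
    plan, upside = demand_range(1_000_000, 50, 0.40, 0.80, 0.10, months=12)
    print(f"plan: {plan:,.0f} req/day, upside: {upside:,.0f} req/day")
```

Reporting the pair rather than a single figure matches the point that the model's output is a range: provision against the planning number and hold headroom up to the upside number.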
The following script pulls the last 90 days of Prometheus data and fits a linear trend to help anchor the model:
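A minimal sketch of that approach, using only the standard library. It calls Prometheus's `/api/v1/query_range` HTTP endpoint and fits an ordinary least-squares line. The base URL and the PromQL expression in the usage note are placeholders, and the sketch assumes the query returns a single series.

```python
import json
import time
import urllib.parse
import urllib.request


def fetch_daily_series(base_url: str, promql: str, days: int = 90):
    """Query Prometheus's /api/v1/query_range endpoint at a one-day step and
    return [(unix_ts, value), ...] for the first series in the result."""
    end = int(time.time())
    params = urllib.parse.urlencode({
        "query": promql,
        "start": end - days * 86400,
        "end": end,
        "step": "1d",
    })
    with urllib.request.urlopen(f"{base_url}/api/v1/query_range?{params}") as resp:
        body = json.load(resp)
    series = body["data"]["result"]
    if not series:
        return []
    return [(float(ts), float(v)) for ts, v in series[0]["values"]]


def fit_linear_trend(points):
    """Ordinary least squares on (x, y) pairs; returns (slope, intercept)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

A typical use is to convert timestamps to day offsets before fitting, so the slope reads as change per day. For example, `fetch_daily_series("http://prometheus.example:9090", "sum(rate(http_requests_total[1d]))")` uses a hypothetical host and metric. A linear fit is only an anchor: seasonality and launch-driven step changes need to be layered on top.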
Regional Capacity Planning
Expanding into a new region — or maintaining N+1 regional redundancy — requires a separate capacity exercise because regional traffic is never a simple fraction of global traffic. Regional capacity planning accounts for three factors that global models miss:
- Latency-sensitive affinity. Users do not distribute uniformly across regions. A new APAC region may capture 25% of global signups but generate 40% of API calls because the lower latency drives higher engagement. Measure existing latency buckets by geography to build region-specific request-rate multipliers.
- Data residency requirements. GDPR, data sovereignty laws, and enterprise customer contracts often mandate that specific data stay within a region. This forces local database primaries and local object storage, which have a higher fixed cost floor than a pure read-replica deployment.
- Regional failure isolation budget. If you are targeting N+1 redundancy, the surviving regions together must be sized to absorb 100% of the failed region's traffic during a failover; with only two regions, each must be able to carry the full global load alone. Many teams under-provision the standby region on the assumption that "we will scale it up if we need it", a plan that fails in practice when failover coincides with a traffic spike.
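The request-rate multiplier from the first factor is simple arithmetic: a region's share of API calls divided by its share of users. The sketch below reuses the 25% and 40% figures from the APAC example. The function names and the global peak value are hypothetical.

```python
def region_multiplier(signup_share: float, request_share: float) -> float:
    """Ratio of a region's share of requests to its share of signups.
    Values above 1.0 mean users there are more active than the global mean."""
    if not 0 < signup_share <= 1:
        raise ValueError("signup_share must be in (0, 1]")
    return request_share / signup_share


def regional_peak_rps(global_peak_rps: float,
                      signup_share: float,
                      multiplier: float) -> float:
    """Regional peak estimate: a naive per-user split of global traffic,
    scaled by the region's engagement multiplier."""
    return global_peak_rps * signup_share * multiplier


if __name__ == "__main__":
    m = region_multiplier(signup_share=0.25, request_share=0.40)
    # Hypothetical global peak of 10,000 req/s.
    print(f"multiplier: {m:.2f}, regional peak: {regional_peak_rps(10_000, 0.25, m):,.0f} req/s")
```

Sizing the region from signup share alone would under-provision it by the multiplier, which is why the multiplier should come from measured request rates by geography rather than from user counts.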
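The critical sizing rule for N+1 redundancy: keep each region's normal-operation utilization low enough that the survivors can absorb a full failover and still have headroom to cover autoscaling lag. With traffic split evenly across N regions, losing one multiplies each survivor's load by N/(N-1). A ceiling of 60–70% therefore suits deployments of three or four regions: at 60%, three regions land at 90% after a failover, and at 70%, four regions land at about 93%. A two-region pair has to stay below 50%. The worst case your capacity plan must survive is a regional failure that coincides with a traffic spike.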
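The utilization ceiling can be checked with a few lines of arithmetic. This is a minimal sketch, assuming traffic is split evenly across regions and a failed region's load spreads evenly over the survivors. Both are simplifications, and the function names are illustrative.

```python
def max_steady_utilization(regions: int,
                           post_failover_ceiling: float = 1.0) -> float:
    """Highest normal-operation utilization at which the surviving regions
    stay at or below `post_failover_ceiling` after losing one region."""
    if regions < 2:
        raise ValueError("failover needs at least two regions")
    return post_failover_ceiling * (regions - 1) / regions


def utilization_after_failover(regions: int, steady_utilization: float) -> float:
    """Utilization of each surviving region once one region's share of the
    load has been redistributed evenly."""
    if regions < 2:
        raise ValueError("failover needs at least two regions")
    return steady_utilization * regions / (regions - 1)


if __name__ == "__main__":
    for n in (2, 3, 4):
        # A 0.9 ceiling leaves 10% spare after failover for autoscaling lag.
        limit = max_steady_utilization(n, post_failover_ceiling=0.9)
        print(f"{n} regions: run at or below {limit:.0%}")
```

Setting `post_failover_ceiling` below 1.0 is how the extra headroom for autoscaling lag enters the calculation. With uneven traffic splits, the same check has to be run per region against the largest region's load.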
Running the Quarterly Capacity Review Meeting
A capacity review meeting is most effective when it follows a consistent agenda that keeps it from drifting into free-form discussion. A proven 40-minute structure:
- Current state (10 min). Show a 90-day utilization trend for each tier: CPU, memory, disk I/O, network egress, database connections. Call out any metric that crossed 70% of capacity in the last quarter.
- Forecast vs. actuals (10 min). Compare the projections from the previous quarterly review against reality. A model that consistently over-predicts wastes money; one that under-predicts causes incidents. Tune the model's growth multipliers based on variance.
- Next-quarter projections (15 min). Walk through the growth model for the next 90 days, including upcoming launches, marketing campaigns, and seasonality. Identify the resource that will hit 80% utilization first — this is the critical path for the quarter.
- Action items (5 min). Every at-risk resource needs an owner and a target resolution date: vertical scaling, horizontal scaling approval, code optimization, or a quota increase request with the cloud provider.
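Finding the resource that reaches 80% first can be automated once each resource has a current utilization and a trend. The sketch below assumes a linear trend expressed as utilization gained per day. The resource names and figures are hypothetical.

```python
import math


def days_until_threshold(current_util: float,
                         daily_growth: float,
                         threshold: float = 0.80) -> float:
    """Days until utilization reaches `threshold` under a linear trend.
    Returns 0 if already there and math.inf if the trend is flat or falling."""
    if current_util >= threshold:
        return 0.0
    if daily_growth <= 0:
        return math.inf
    return (threshold - current_util) / daily_growth


def critical_path(resources: dict, threshold: float = 0.80):
    """`resources` maps name -> (current_util, daily_growth).
    Returns (name, days) for the resource that hits the threshold first."""
    name = min(resources,
               key=lambda r: days_until_threshold(*resources[r], threshold))
    return name, days_until_threshold(*resources[name], threshold)


if __name__ == "__main__":
    # Hypothetical snapshot: utilization as a fraction, growth per day.
    snapshot = {
        "cpu": (0.55, 0.002),
        "db_connections": (0.68, 0.003),
        "disk_io": (0.40, 0.001),
    }
    name, days = critical_path(snapshot)
    print(f"critical path: {name}, about {days:.0f} days to 80%")
```

A resource whose days-to-threshold falls inside the 90-day window is a natural candidate for an action item with an owner and a date.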
A runbook that captures the quarterly review cadence as code makes the process reproducible. The following shell snippet exports the key Prometheus metrics into a CSV that serves as the starting point for the review deck:
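The same export step can be sketched in Python rather than shell. This is a minimal sketch under stated assumptions: it takes utilization samples that have already been pulled from Prometheus (fetching is left out), and the column names, tier and metric labels, and the `summarize` helper are illustrative. The 70% flag matches the "current state" agenda item.

```python
import csv
import io

FIELDS = ["tier", "metric", "avg_utilization", "peak_utilization", "crossed_70pct"]


def summarize(tier: str, metric: str, samples, flag_at: float = 0.70) -> dict:
    """Collapse a utilization series (fractions in 0..1) into one CSV row."""
    if not samples:
        raise ValueError("no samples to summarize")
    peak = max(samples)
    return {
        "tier": tier,
        "metric": metric,
        "avg_utilization": round(sum(samples) / len(samples), 3),
        "peak_utilization": round(peak, 3),
        "crossed_70pct": peak >= flag_at,
    }


def write_review_csv(rows, out) -> None:
    """Write summary rows, with a header, to any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical samples standing in for 90 days of Prometheus data.
    rows = [
        summarize("api", "cpu", [0.50, 0.60, 0.75]),
        summarize("db", "connections", [0.40, 0.45, 0.50]),
    ]
    buf = io.StringIO()
    write_review_csv(rows, buf)
    print(buf.getvalue())
```

Writing to a file-like object keeps the function easy to test and lets the runbook direct output to a file, standard output, or an upload step.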
Capacity planning closes the loop between the reactive elasticity of autoscaling and the proactive resource governance that keeps platforms stable as businesses grow. The engineers who master it are the ones who prevent the midnight incidents — not the ones who respond to them.