Load Testing Concepts
Before you write a single k6 script or spin up a JMeter cluster, you need a shared vocabulary. Teams that conflate load tests with stress tests routinely draw the wrong conclusions from their data: they declare a service healthy when it is one traffic spike away from paging the on-call engineer. This lesson establishes precise definitions and the mental model you will carry into every test design decision.
The Four Test Profiles
Each profile answers a different engineering question. Using the wrong one is as costly as skipping testing entirely, because you get confidence in the wrong property.
Load Test
A load test validates that the system meets its SLO targets at the expected production traffic level — typically defined as peak or P95 observed throughput. The question is: does the system perform correctly under normal operating load? Duration is usually 10–30 minutes; long enough for JVM warm-up, connection-pool saturation, and GC cycles to stabilise, but not so long that you are measuring drift.
At Google and Meta scale, "expected load" is derived from capacity plans and traffic forecasts, not guesses. You pull the P95 QPS from your Prometheus dashboards (you already know how) and replicate that rate with realistic request distributions.
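As a rough illustration of the load test described above, the following sketch builds a k6 options object that holds a fixed request rate and attaches SLO-style thresholds. The helper name `loadTestOptions`, the 1200 RPS rate, the 300 ms and 1% limits, and the VU counts are all assumptions for illustration, not values from a real capacity plan.

```javascript
// Sketch: k6 options for a load test at the expected production rate.
// Every number here (rate, SLO limits, VU counts) is an assumed placeholder.
function loadTestOptions(expectedRps, durationMinutes) {
  return {
    scenarios: {
      load: {
        executor: "constant-arrival-rate", // starts iterations at a fixed rate
        rate: expectedRps,                 // iterations started per timeUnit
        timeUnit: "1s",
        duration: `${durationMinutes}m`,
        preAllocatedVUs: 200,              // assumed; size to rate x latency
        maxVUs: 1000,                      // assumed upper bound
      },
    },
    thresholds: {
      http_req_duration: ["p(95)<300"], // assumed SLO: p95 under 300 ms
      http_req_failed: ["rate<0.01"],   // assumed SLO: under 1% failed requests
    },
  };
}

// In a k6 script this would be: export const options = loadTestOptions(1200, 20);
const options = loadTestOptions(1200, 20);
console.log(JSON.stringify(options.scenarios.load));
```

With thresholds in place, k6 exits with a non-zero status when an SLO is missed, which is what lets a pipeline treat the run as pass or fail.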
Stress Test
A stress test pushes traffic beyond the expected peak to find the breaking point — the load level at which latency degrades past SLO, error rates rise, or the process crashes. The question is: what is the system's capacity ceiling, and how does it fail?
Equally important is what happens after the breaking point: does the system self-recover when load drops, or does it need a manual restart? A good stress test ramps traffic up in steps (e.g. 100% → 150% → 200% of expected load), pauses at each step for a steady-state window, and then ramps back down.
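The stepped ramp described above could be expressed as stages for k6's `ramping-arrival-rate` executor. This is a minimal sketch: the helper `stressStages`, the 1000 RPS base rate, the step multipliers, the durations, and the VU counts are assumed values chosen only to show the shape.

```javascript
// Sketch: stepped stress profile for k6's ramping-arrival-rate executor.
// baseRps, the multipliers, and all durations are assumed values.
function stressStages(baseRps, multipliers, rampTime, holdTime) {
  const stages = [];
  for (const m of multipliers) {
    const target = Math.round(baseRps * m);
    stages.push({ target, duration: rampTime }); // ramp to this step
    stages.push({ target, duration: holdTime }); // steady-state window
  }
  // Ramp back down to the base rate and hold, to observe whether the
  // system recovers on its own once the overload is removed.
  stages.push({ target: baseRps, duration: rampTime });
  stages.push({ target: baseRps, duration: holdTime });
  return stages;
}

const stressScenario = {
  executor: "ramping-arrival-rate",
  startRate: 0,
  timeUnit: "1s",
  preAllocatedVUs: 500, // assumed
  maxVUs: 5000,         // assumed
  stages: stressStages(1000, [1.0, 1.5, 2.0], "2m", "5m"),
};

// In a k6 script: export const options = { scenarios: { stress: stressScenario } };
console.log(stressScenario.stages.map((s) => `${s.target}@${s.duration}`).join(" "));
```

Holding at each step matters because latency and error rates measured during the ramp itself mix two load levels and are hard to attribute.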
Soak Test
A soak test (also called an endurance test) runs at normal or slightly elevated load for an extended period, from hours to days. The question is: does performance degrade over time? This is the test that catches memory leaks, connection-pool exhaustion, log-file disk fill, cache fragmentation, and database bloat that only manifest after hours of sustained traffic.
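One simple way to answer the "does performance degrade over time" question is to compare latency at the start and end of the run. The sketch below assumes you have already exported one p95 sample per interval; the helper `detectDrift`, the sample values, the window size, and the 20% tolerance are hypothetical.

```javascript
// Sketch: flag latency drift in a soak run by comparing the mean of the
// first and last windows of per-interval p95 samples.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function detectDrift(p95Samples, windowSize, tolerance) {
  const first = mean(p95Samples.slice(0, windowSize));
  const last = mean(p95Samples.slice(-windowSize));
  const ratio = last / first;
  // "drifting" means the late window is worse than the early one
  // by more than the allowed tolerance (0.2 = 20%).
  return { first, last, ratio, drifting: ratio > 1 + tolerance };
}

// Hypothetical hourly p95 latencies (ms) from an 8-hour soak.
const hourlyP95 = [120, 118, 122, 125, 131, 140, 152, 166];
const drift = detectDrift(hourlyP95, 2, 0.2);
console.log(drift);
```

The same comparison can be applied to resident memory, open connections, or disk usage, since the signature of a leak is the same: the late window is consistently worse than the early one at constant load.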
Spike Test
A spike test subjects the system to a sudden, sharp increase in load — from idle or baseline to multiples of peak — and then drops it just as suddenly. The question is: how does the system respond to abrupt traffic bursts? This profile maps directly to real-world scenarios: a product going viral, a flash sale, a celebrity tweet, a cron job launching thousands of parallel workers at midnight.
A well-designed spike test exposes whether your autoscaling reacts fast enough to keep latency within SLO, whether your connection pools queue or reject under the burst, and whether your circuit breakers fire before downstream services cascade.
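A spike profile differs from a stress profile mainly in how short the ramps are. As a sketch, again using k6's `ramping-arrival-rate` executor, the stages below jump from a baseline to a burst within seconds and then watch the recovery. The helper `spikeStages`, the 200 RPS baseline, the 10x burst factor, and the durations are assumptions.

```javascript
// Sketch: spike profile for k6's ramping-arrival-rate executor.
// Baseline, burst factor, and durations are assumed values.
function spikeStages(baselineRps, burstFactor) {
  const peak = baselineRps * burstFactor;
  return [
    { target: baselineRps, duration: "2m" },  // establish a quiet baseline
    { target: peak, duration: "10s" },        // sudden burst
    { target: peak, duration: "1m" },         // hold the burst briefly
    { target: baselineRps, duration: "10s" }, // drop just as suddenly
    { target: baselineRps, duration: "3m" },  // observe recovery
  ];
}

const spikeScenario = {
  executor: "ramping-arrival-rate",
  startRate: 200,        // start at the baseline rate
  timeUnit: "1s",
  preAllocatedVUs: 1000, // assumed
  maxVUs: 10000,         // assumed
  stages: spikeStages(200, 10),
};

// In a k6 script: export const options = { scenarios: { spike: spikeScenario } };
console.log(spikeScenario.stages.map((s) => `${s.target}@${s.duration}`).join(" "));
```

The recovery stage at the end is where you check whether queues drain and latency returns to its baseline without manual intervention.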
Open vs Closed Workload Models
This is one of the most misunderstood concepts in load testing, and getting it wrong produces results that look good in a report but fail in production.
Closed Workload Model
In a closed model, the number of concurrent virtual users (VUs) is fixed. Each VU finishes a request, optionally waits (think-time), then immediately starts the next. Throughput is therefore bounded by the number of VUs divided by the mean time per iteration, that is, mean response time plus think-time. As the system slows down, throughput drops automatically because the VUs are blocked waiting for slow responses.
This matches a connection-pool model or a thread-per-request server: the pool has N threads; once all N are occupied, new requests queue. Many load testing tools default to the closed model (JMeter thread groups, k6 VUs in their default loop); Gatling offers closed injection profiles alongside its open ones.
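The throughput bound of a closed model is easy to check with arithmetic. The sketch below uses a hypothetical helper, `closedModelThroughput`, and made-up numbers (100 VUs, 0.25 s versus 2 s response time) to show how offered load falls as the system slows down.

```javascript
// Sketch: upper bound on throughput in a closed workload model.
// X <= N / (R + Z), where N = VUs, R = mean response time, Z = think-time.
function closedModelThroughput(vus, responseTimeSec, thinkTimeSec = 0) {
  return vus / (responseTimeSec + thinkTimeSec);
}

// Hypothetical numbers: 100 VUs, no think-time.
const healthyRps = closedModelThroughput(100, 0.25); // fast responses
const degradedRps = closedModelThroughput(100, 2);   // slow responses

console.log(healthyRps, degradedRps);
```

With these inputs the same 100 VUs offer 400 requests per second when the system is healthy and only 50 when it is slow, so the load generator backs off at exactly the moment you want to see how the system copes.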
Open Workload Model
In an open model, requests arrive at a fixed rate (e.g. 1000 RPS), independent of how long the system takes to serve them. If latency increases, requests queue up. The queue grows until it overflows (client timeout or memory exhaustion) — which is exactly what happens with real HTTP traffic, event-stream consumers, or message-queue producers.
Open models are harder to implement because the load generator must sustain a target arrival rate even when responses are slow, which demands much more load-generator capacity. k6's constant-arrival-rate executor implements an open model. JMeter does it with the Throughput Shaping Timer plugin.
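Two back-of-the-envelope calculations make the open model's cost concrete. This sketch uses Little's law to estimate how many VUs a load generator needs to sustain a given arrival rate, and a simple difference to estimate how fast a backlog builds once arrivals exceed capacity. The helper names and all numbers are assumptions.

```javascript
// Sketch: sizing an open-model test.

// Little's law: requests in flight = arrival rate x mean response time.
// In k6's arrival-rate executors each in-flight iteration occupies one VU.
function requiredVUs(rateRps, responseTimeSec) {
  return Math.ceil(rateRps * responseTimeSec);
}

// When arrivals exceed what the system can serve, the backlog grows
// linearly with the difference between the two rates.
function backlogAfter(arrivalRps, capacityRps, seconds) {
  return Math.max(0, (arrivalRps - capacityRps) * seconds);
}

// Hypothetical numbers: 1000 RPS target.
const vusWhenFast = requiredVUs(1000, 0.25); // healthy system
const vusWhenSlow = requiredVUs(1000, 2);    // degraded system
const queued = backlogAfter(1000, 800, 60);  // capacity 800 RPS, one minute

console.log(vusWhenFast, vusWhenSlow, queued);
```

The jump from 250 to 2000 VUs is why arrival-rate scenarios need generous `preAllocatedVUs` and `maxVUs` values: if the generator runs out of VUs, it silently stops offering the target rate and the test degrades into a closed model.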
Configuring k6 for Each Profile
Understanding the theory is one thing; knowing exactly which k6 executor to reach for is what you need on the job. Below are production-ready starters for each profile. The scenarios API (introduced in k6 v0.27) gives you fine-grained control over executor type, ramp shape, and per-scenario options.
Use one options export per file, or a --config flag to select the right block. In CI, run the load test on every PR; gate stress and soak tests to nightly or pre-release pipelines.
Choosing the Right Profile for the Right Question
Experienced engineers select the profile from the engineering question, not from habit:
- SLO validation before a deploy: load test at expected P95 QPS.
- Capacity planning for 3x growth: stress test from 1x to 4x.
- Memory leak investigation after a week-two incident: soak test for 6–8 hours.
- Autoscaler and circuit-breaker validation: spike test with 10–20x burst.
- Chaos experiment under real traffic: run a soak test as the backdrop while injecting faults.
In the next lesson you will put this into practice with full k6 scripts, thresholds tied to real SLOs, and integration into a CI pipeline that gates deployments on performance regressions.