Performance & Load Testing

Load Testing Concepts

Before you write a single k6 script or spin up a JMeter cluster, you need a shared vocabulary. Teams that conflate load tests with stress tests routinely draw the wrong conclusions from their data: they declare a service healthy when it is one traffic spike away from paging the on-call engineer. This lesson establishes precise definitions and the mental model you carry into every test-design decision.

The Four Test Profiles

Each profile answers a different engineering question. Using the wrong one is as costly as skipping testing entirely, because you get confidence in the wrong property.

Load Test

A load test validates that the system meets its SLO targets at the expected production traffic level, typically defined as the peak or P95 observed throughput. The question is: does the system perform correctly under normal operating load? Duration is usually 10–30 minutes. That is long enough for the JVM to warm up, connection pools to fill, and GC behaviour to stabilise, but not so long that you are measuring drift.

At Google and Meta scale, "expected load" is derived from capacity plans and traffic forecasts, not guesses. You pull the P95 QPS from your Prometheus dashboards (you already know how) and replicate that rate with realistic request distributions.
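The derivation of expected load can be sketched in a few lines of JavaScript. This minimal version takes a list of per-minute QPS samples and computes their P95 with the nearest-rank method. The samples are hypothetical numbers standing in for a dashboard export. Your monitoring stack may interpolate percentiles differently, so treat this as an illustration of the idea, not a match for any particular query.

```javascript
// Hypothetical per-minute QPS samples (time-ordered, as exported from a dashboard).
// A real derivation would use weeks of data, not twenty points.
const qpsSamples = [
  402, 310, 448, 531, 390, 463, 355, 505, 423, 368,
  560, 471, 342, 484, 437, 518, 381, 456, 415, 492,
];

// Nearest-rank percentile: sort ascending, then take the value at
// 1-based rank ceil(p * n / 100). Integer math first avoids float noise.
function percentileNearestRank(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// The P95 becomes the arrival rate the load test should replicate.
const targetRps = percentileNearestRank(qpsSamples, 95);
console.log(`Load-test target: ${targetRps} RPS`);
```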

Stress Test

A stress test pushes traffic beyond the expected peak to find the breaking point — the load level at which latency degrades past SLO, error rates rise, or the process crashes. The question is: what is the system's capacity ceiling, and how does it fail?

Equally important is what happens after the breaking point: does the system self-recover when load drops, or does it need a manual restart? A good stress test ramps traffic up in steps (e.g. 100% → 150% → 200% of expected load), pauses at each step for a steady-state window, and then ramps back down.
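k6's ramping-arrival-rate executor moves linearly from one stage target to the next. A flat steady-state window therefore needs its own stage that repeats the previous target. The helper below, steppedStages, is hypothetical and not part of k6. It expands a list of load multipliers into ramp-and-hold stage pairs. The 500 RPS peak, 30-second ramps, 5-minute holds, and 100 RPS recovery rate are assumptions chosen for illustration.

```javascript
// Hypothetical helper (not a k6 API): build `stages` for a stepped stress test.
// Each step is a short ramp to a multiple of the expected peak, followed by a
// hold at the same target so the system can reach a steady state.
function steppedStages(expectedPeakRps, multipliers, rampDur, holdDur, recoveryRps) {
  const stages = [];
  for (const m of multipliers) {
    const target = Math.round(expectedPeakRps * m);
    stages.push({ duration: rampDur, target }); // short linear ramp to this step
    stages.push({ duration: holdDur, target }); // steady-state window at this step
  }
  stages.push({ duration: rampDur, target: recoveryRps }); // ramp back down
  stages.push({ duration: holdDur, target: recoveryRps }); // observe recovery
  return stages;
}

// Assumed values: 500 RPS expected peak, steps at 100%, 150%, 200%.
const stages = steppedStages(500, [1.0, 1.5, 2.0], '30s', '5m', 100);
console.log(JSON.stringify(stages, null, 2));
```

Because k6 scripts are JavaScript, a helper like this can live in a script's init code and feed a scenario's stages array directly.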

Soak Test

A soak test (also called an endurance test) runs at normal or slightly elevated load for an extended period, from hours to days. The question is: does performance degrade over time? This test catches memory leaks, connection-pool exhaustion, log-file disk fill, cache fragmentation, and database bloat. These problems only manifest after hours of sustained traffic.

Soak tests are the most commonly skipped profile in CI pipelines, yet they catch the class of bugs most likely to cause 3 AM incidents in week two of a launch. If you can only run one long test before a major release, run a soak test at 80% of expected peak for 4–8 hours.
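One common way to turn "does performance degrade over time?" into a yes/no answer is to fit a straight line to a resource metric sampled during the soak and check its slope. The sketch below does an ordinary least-squares fit. The hourly memory samples and the 10 MiB/hour threshold are invented for illustration. A real analysis should discard the warm-up period and look at more than one metric (heap, open connections, disk usage, p99 latency).

```javascript
// Least-squares slope of a metric over time, in metric units per hour.
// samples: array of { tHours, value }.
function slopePerHour(samples) {
  const n = samples.length;
  if (n < 2) throw new Error('need at least two samples');
  const meanT = samples.reduce((s, p) => s + p.tHours, 0) / n;
  const meanV = samples.reduce((s, p) => s + p.value, 0) / n;
  let num = 0;
  let den = 0;
  for (const { tHours, value } of samples) {
    num += (tHours - meanT) * (value - meanV);
    den += (tHours - meanT) ** 2;
  }
  if (den === 0) throw new Error('all samples share one timestamp');
  return num / den;
}

// Hypothetical resident memory (MiB) sampled hourly during a 6-hour soak.
const rssMiB = [
  { tHours: 1, value: 812 },
  { tHours: 2, value: 838 },
  { tHours: 3, value: 861 },
  { tHours: 4, value: 889 },
  { tHours: 5, value: 912 },
  { tHours: 6, value: 940 },
];

const LEAK_THRESHOLD_MIB_PER_HOUR = 10; // assumption: tune per service
const slope = slopePerHour(rssMiB);
const verdict = slope > LEAK_THRESHOLD_MIB_PER_HOUR ? 'possible leak' : 'stable';
console.log(`RSS slope: ${slope.toFixed(1)} MiB/h (${verdict})`);
```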

Spike Test

A spike test subjects the system to a sudden, sharp increase in load — from idle or baseline to multiples of peak — and then drops it just as suddenly. The question is: how does the system respond to abrupt traffic bursts? This profile maps directly to real-world scenarios: a product going viral, a flash sale, a celebrity tweet, a cron job launching thousands of parallel workers at midnight.

A well-designed spike test exposes whether your autoscaling reacts fast enough to keep latency within SLO, whether your connection pools queue or reject under the burst, and whether your circuit breakers fire before downstream services cascade.

[Figure: The four canonical load-test profiles and their characteristic traffic shapes.]
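To see the spike shape concretely, the sketch below evaluates the target arrival rate at a given second. It assumes each stage ramps linearly from the previous target, which is how k6's ramping-arrival-rate executor treats stages. The stages copy the spike_test scenario from the k6 configuration in this lesson, with durations converted to plain seconds. rateAt is a hypothetical helper, not a k6 API.

```javascript
// Hypothetical helper (not a k6 API): target arrival rate at time tSec,
// assuming each stage interpolates linearly from the previous target.
function rateAt(tSec, startRate, stages) {
  let elapsed = 0;
  let from = startRate;
  for (const { durationSec, target } of stages) {
    if (durationSec > 0 && tSec <= elapsed + durationSec) {
      const frac = (tSec - elapsed) / durationSec;
      return from + (target - from) * frac; // linear interpolation within the stage
    }
    elapsed += durationSec;
    from = target;
  }
  return from; // past the final stage: report the last target
}

// The spike profile from the k6 config, durations in seconds.
const spikeStages = [
  { durationSec: 60, target: 50 },    // idle baseline
  { durationSec: 30, target: 2000 },  // spike
  { durationSec: 120, target: 2000 }, // hold spike
  { durationSec: 30, target: 50 },    // drop back
  { durationSec: 300, target: 50 },   // observe recovery
];

for (const t of [30, 60, 75, 90, 150, 225, 240, 400]) {
  console.log(`t=${t}s -> ${rateAt(t, 50, spikeStages).toFixed(0)} RPS`);
}
```

The burst stage adds 1950 RPS over 30 seconds, about 65 RPS every second. If an autoscaler needs minutes to add capacity, most of the spike is served by whatever is already running.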

Open vs Closed Workload Models

This is one of the most misunderstood concepts in load testing, and getting it wrong produces results that look good in a report but fail in production.

Closed Workload Model

In a closed model, the number of concurrent virtual users (VUs) is fixed. Each VU sends a request, waits for the response, optionally pauses (think time), then immediately starts the next. Throughput is therefore bounded by the number of VUs divided by the mean cycle time (response time plus think time). As the system slows down, throughput automatically drops, because the VUs are blocked waiting for slow responses.

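This matches a connection-pool model or a thread-per-request server: the pool has N threads, and once all N are occupied, new requests queue. Many load testing tools default to the closed model, including JMeter thread groups and k6 VUs in their default loop. Gatling is an exception. It has separate open and closed injection profiles, and its most common ones, such as constantUsersPerSec, are open.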

The critical property of a closed model: when your system slows down, arrival rate slows down too. This means a closed model can mask overload. The system appears to handle 500 VUs fine — but only because it is processing 50 RPS when it should be doing 500 RPS. In production, real clients keep arriving at their own pace regardless of how slow your service is.
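The arithmetic behind that masking effect is the interactive response-time law, a form of Little's Law: X = N / (R + Z). Here N is the VU count, R the mean response time, and Z the mean think time. The sketch below assumes a healthy response time of 1 second, a figure the text does not state. It shows offered load falling from 500 RPS to 50 RPS as responses slow to 10 seconds, while the VU count never changes.

```javascript
// Closed-model throughput: X = N / (R + Z).
// vus = N (concurrent virtual users), responseSec = R, thinkSec = Z.
function closedThroughput(vus, responseSec, thinkSec = 0) {
  return vus / (responseSec + thinkSec);
}

// Rearranged to size a closed test for a target rate: N = X * (R + Z).
function vusForThroughput(targetRps, responseSec, thinkSec = 0) {
  return Math.ceil(targetRps * (responseSec + thinkSec));
}

// 500 VUs, no think time; 1 s is an assumed healthy response time.
for (const r of [1, 2, 5, 10]) {
  console.log(`R = ${r}s -> ${closedThroughput(500, r)} RPS offered`);
}
// The VU count never changes, yet offered load falls tenfold as latency grows.
```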

Open Workload Model

In an open model, requests arrive at a fixed rate (e.g. 1000 RPS), independent of how long the system takes to serve them. If latency increases, requests queue up. The queue grows until it overflows (client timeout or memory exhaustion) — which is exactly what happens with real HTTP traffic, event-stream consumers, or message-queue producers.
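A fluid approximation makes that queue growth easy to quantify. When arrivals outpace capacity, the backlog grows by the difference every second. The rates below (1000 RPS offered against an assumed 800 RPS of capacity) are illustrative. The model ignores timeouts and retries.

```javascript
// Fluid approximation of an open workload: arrivals ignore server speed,
// so any excess of arrival rate over capacity accumulates as backlog.
function backlogOverTime(arrivalRps, capacityRps, seconds) {
  const series = [];
  let queued = 0;
  for (let t = 1; t <= seconds; t++) {
    queued = Math.max(0, queued + arrivalRps - capacityRps); // backlog cannot go negative
    series.push(queued);
  }
  return series;
}

// Assumed rates: 1000 RPS offered, 800 RPS of service capacity, 60 s.
const backlog = backlogOverTime(1000, 800, 60);
const finalBacklog = backlog[backlog.length - 1];
console.log(`backlog after 60 s: ${finalBacklog} requests`);
// A request arriving now waits roughly backlog / capacity seconds to start.
console.log(`approx. queueing delay: ${finalBacklog / 800} s`);
```

After one minute the backlog is 12,000 requests. A newly arriving request therefore waits roughly 12,000 / 800 = 15 seconds before service starts, far beyond any typical latency SLO.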

Open models are harder to implement because the load generator must sustain the target arrival rate even when responses are slow, which demands much more load-generator capacity. In k6, the constant-arrival-rate and ramping-arrival-rate executors implement an open model. JMeter needs a plugin such as the Throughput Shaping Timer.
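Little's Law also gives a rough way to size that load-generator capacity. The number of in-flight iterations equals the arrival rate times the iteration duration, and each in-flight iteration holds one VU. The sketch assumes one request per iteration, so iteration time is roughly response time. The 0.5 s and 1.5 s values are assumptions that happen to match the p95 and p99 thresholds in this lesson's k6 configuration.

```javascript
// Rough VU sizing for an arrival-rate (open-model) executor via Little's Law:
// in-flight iterations = arrival rate x iteration duration, one VU each.
function vusNeeded(ratePerSec, iterationSec) {
  return Math.ceil(ratePerSec * iterationSec);
}

const spikeRps = 2000; // spike target from the k6 config in this lesson
// Assumed iteration durations (one request per iteration).
console.log('VUs at 0.5 s/iteration:', vusNeeded(spikeRps, 0.5));
console.log('VUs at 1.5 s/iteration:', vusNeeded(spikeRps, 1.5));
```

At the 2000 RPS spike target this gives 1000 and 3000 VUs, the same figures as the spike scenario's preAllocatedVUs and maxVUs. When latency pushes the requirement past maxVUs, k6 cannot start iterations on schedule and counts them in its dropped_iterations metric. At that point the test is no longer delivering the intended open-model load.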

Production rule of thumb: model external (internet-facing) traffic with an open workload, and model internal service-to-service calls that are gated by a thread pool with a closed workload. A checkout service handling browser clients is open; a worker pool consuming from Kafka is closed.

[Figure: Closed model, where throughput drops with latency, versus open model, where the arrival rate is fixed and the queue grows independently.]

Configuring k6 for Each Profile

Understanding the theory is one thing; knowing exactly which k6 executor to reach for is what you need on the job. Below is a starter configuration covering each profile. The scenarios API (introduced in k6 v0.27) gives you fine-grained control over executor type, ramp shape, and per-scenario options.

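```javascript
// k6 scenario config: four profiles, exactly one selected per run
// File: k6-profiles.js
// Run with: k6 run -e PROFILE=stress_test k6-profiles.js
import http from 'k6/http';

// TARGET_URL is a placeholder; point it at the service under test.
const TARGET_URL = __ENV.TARGET_URL || 'http://localhost:8080/';

const profiles = {
  // --- 1. LOAD TEST (closed model: ramp to a constant 200 VUs) ---
  load_test: {
    executor: 'ramping-vus',
    startVUs: 0,
    stages: [
      { duration: '2m', target: 200 },   // ramp-up
      { duration: '15m', target: 200 },  // steady state at expected peak
      { duration: '2m', target: 0 },     // ramp-down
    ],
    gracefulRampDown: '30s',
  },

  // --- 2. STRESS TEST (open model, stepped arrival rate) ---
  // Stages ramp linearly, so each step is a short ramp plus a hold.
  stress_test: {
    executor: 'ramping-arrival-rate',
    startRate: 100,
    timeUnit: '1s',
    preAllocatedVUs: 500,
    maxVUs: 2000,
    stages: [
      { duration: '5m', target: 100 },   // 100 RPS: baseline
      { duration: '30s', target: 250 },
      { duration: '5m', target: 250 },   // 250 RPS
      { duration: '30s', target: 500 },
      { duration: '5m', target: 500 },   // 500 RPS: expected peak
      { duration: '30s', target: 750 },
      { duration: '5m', target: 750 },   // 150%: stress
      { duration: '30s', target: 1000 },
      { duration: '5m', target: 1000 },  // 200%: find the break
      { duration: '30s', target: 100 },
      { duration: '5m', target: 100 },   // recovery
    ],
  },

  // --- 3. SOAK TEST (closed model, sustained load) ---
  soak_test: {
    executor: 'constant-vus',
    vus: 160,           // 80% of the load test's 200-VU expected peak
    duration: '4h',
  },

  // --- 4. SPIKE TEST (open model, burst arrival) ---
  spike_test: {
    executor: 'ramping-arrival-rate',
    startRate: 50,
    timeUnit: '1s',
    preAllocatedVUs: 1000,
    maxVUs: 3000,
    stages: [
      { duration: '1m', target: 50 },    // idle baseline
      { duration: '30s', target: 2000 }, // spike: 40x burst in 30s
      { duration: '2m', target: 2000 },  // hold spike
      { duration: '30s', target: 50 },   // drop back
      { duration: '5m', target: 50 },    // observe recovery
    ],
  },
};

// Select one profile at run time; default to the load test.
const PROFILE = __ENV.PROFILE || 'load_test';
if (!profiles[PROFILE]) {
  throw new Error(`Unknown PROFILE "${PROFILE}"; use one of: ${Object.keys(profiles).join(', ')}`);
}

export const options = {
  // Only the selected scenario is registered, so a single profile runs.
  scenarios: { [PROFILE]: profiles[PROFILE] },
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1500'],
    http_req_failed:   ['rate<0.01'],
  },
};

// One request per iteration; replace with a realistic request mix.
export default function () {
  http.get(TARGET_URL);
}
```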

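Never run all four scenarios simultaneously. k6 starts every entry in options.scenarios in parallel unless you give it a startTime, and you will saturate the load generator before you saturate the target. Keep one scenario active per run. You can give each profile its own file and options export, or select a single scenario at run time from an environment variable (for example, k6 run -e PROFILE=stress_test k6-profiles.js, with options.scenarios built from __ENV.PROFILE). In CI, run the load test on every PR; gate stress and soak tests to nightly or pre-release pipelines.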

Choosing the Right Profile for the Right Question

Experienced engineers select the profile from the engineering question, not from habit:

SLO validation before a deploy: load test at expected P95 QPS.
Capacity planning for 3x growth: stress test from 1x to 4x.
Memory leak investigation after week-two incident: soak test for 6–8 hours.
Autoscaler and circuit-breaker validation: spike test with 10–20x burst.
Chaos experiment under real traffic: run a soak test as the backdrop while injecting faults.

In the next lesson you will put this into practice with full k6 scripts, thresholds tied to real SLOs, and integration into a CI pipeline that gates deployments on performance regressions.