Lesson 2 established load testing theory: virtual users, ramp shapes, percentile math. This lesson moves to the tool you will spend the most time in: k6. Originally built by Load Impact and now a Grafana Labs project, k6 is one of the most widely used tools for developer-owned load testing. It is written in Go (so a single process can drive tens of thousands of VUs on well-provisioned hardware, depending on script complexity), scripted in JavaScript (ES2015+), and designed from the ground up to live inside a CI pipeline. At Grafana, Shopify, and many Tier-1 SRE teams, k6 scripts are versioned alongside service code; a common pattern is a smoke test on every PR gate and a full soak on every release candidate.
k6 is not a browser automation tool. Its default engine generates HTTP/WebSocket/gRPC traffic at the protocol level and does not execute page JavaScript in a browser. If you need real-browser load testing (for SPAs that do heavy client-side rendering), use the k6 browser module (formerly the xk6-browser extension). For pure API and backend load testing, the default engine is what you want.
Script Structure: the Anatomy of a k6 Test
Every k6 script exports a default function that is the VU body — the code each virtual user executes in a loop. The script also has an init context (module-level code) that runs once per VU before the test starts, and optional lifecycle hooks: setup() (runs once before all VUs start) and teardown(data) (runs once after all VUs finish).
// checkout-flow.js: production-grade k6 script skeleton
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend, Rate, Counter } from 'k6/metrics';

// --- Custom metrics (declared in the init context under a shared name) ---
const checkoutLatency = new Trend('checkout_latency_ms', true); // true = isTime: values are durations in ms
const checkoutErrors = new Rate('checkout_error_rate');
const checkoutCount = new Counter('checkouts_attempted');

// --- Thresholds and stages (the test contract) ---
export const options = {
  stages: [
    { duration: '2m', target: 50 },  // ramp up to 50 VUs
    { duration: '5m', target: 50 },  // hold steady load
    { duration: '2m', target: 200 }, // spike to 200 VUs
    { duration: '5m', target: 200 }, // hold spike
    { duration: '2m', target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1500'], // SLO gate
    checkout_error_rate: ['rate<0.01'],             // <1% errors
    http_req_failed: ['rate<0.005'],                // k6 built-in failure rate
  },
};

// setup() runs ONCE before VUs start; its return value is passed to default() and teardown()
export function setup() {
  const res = http.post('https://api.example.com/auth/token', JSON.stringify({
    client_id: 'load-test-bot',
    client_secret: __ENV.API_SECRET, // inject secrets via env, never hardcode
  }), { headers: { 'Content-Type': 'application/json' } });
  if (res.status !== 200) {
    // an exception in setup() aborts the run before any VU starts
    throw new Error(`auth failed with status ${res.status}`);
  }
  return { token: res.json('access_token') };
}

// default() is the VU loop, called repeatedly for each VU
export default function (data) {
  const headers = {
    Authorization: `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };
  const res = http.post('https://api.example.com/checkout', JSON.stringify({
    cart_id: `cart-${__VU}-${__ITER}`, // __VU = VU number, __ITER = iteration count
    promo_code: 'LOAD_TEST',
  }), { headers });
  // res.timings.duration is k6's own measurement (send + wait + receive),
  // so it excludes connection setup and script overhead
  checkoutLatency.add(res.timings.duration);
  checkoutCount.add(1);
  const ok = check(res, {
    'status 200': (r) => r.status === 200,
    'order_id present': (r) => r.json('order_id') !== undefined,
  });
  checkoutErrors.add(!ok);
  sleep(1); // think time between iterations (model real user pace)
}

export function teardown(data) {
  // revoke the test token to leave the auth system clean
  http.del('https://api.example.com/auth/token', null, {
    headers: { Authorization: `Bearer ${data.token}` },
  });
}
Stages vs. Scenarios: Choosing the Right Shape
The stages array is the quick way to define a single VU ramp profile. But real production traffic is not a single pool of identical users. The scenarios API gives you independent executor pools, each with its own VU count, ramp shape, arrival rate, and script function — composable into a realistic load model.
The key executors and when to use them:
ramping-vus: the classic ramp. You control VU count over time. Good for soak tests and spike drills. The default when you write stages.
constant-arrival-rate: you specify how many iterations start per time unit, not a VU count. k6 draws VUs from a pre-allocated pool (preAllocatedVUs, growing up to maxVUs) to sustain that rate; if the pool runs out, iterations are dropped and counted in the dropped_iterations metric. Use this to model a fixed inbound request rate (e.g., 500 RPS from a load balancer, which is 500 iterations per second when each iteration makes one request) independently of how fast or slow your service responds. This is the correct executor for SLO gate tests: you want to assert behavior at a known RPS, not at an arbitrary VU count.
ramping-arrival-rate: like ramping-vus, but the ramp is expressed in iterations per time unit rather than VUs. Good for finding the throughput cliff.
per-vu-iterations: each VU runs exactly N iterations. Useful for data-driven tests where each VU needs a unique dataset row.
Figure: three independent k6 scenario executors composing a realistic production traffic mix: high-volume reads at constant RPS, ramping writes, and low-rate admin polling with a delayed start.
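The traffic mix described above can be sketched as a k6 scenarios block. This is a minimal sketch rather than a reference configuration: the endpoint paths, rates, VU pool sizes, and the function names browseCatalog, checkout, and pollAdmin are all illustrative assumptions, and authentication is omitted for brevity.

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

// Hypothetical base URL and endpoints; substitute your own traffic model.
const BASE_URL = 'https://api.example.com';

export const options = {
  scenarios: {
    // High-volume reads at a fixed arrival rate (iterations per timeUnit).
    catalog_reads: {
      executor: 'constant-arrival-rate',
      rate: 300,
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 100, // VUs initialized before the scenario starts
      maxVUs: 400,          // hard cap; iterations are dropped beyond this
      exec: 'browseCatalog',
    },
    // Writes ramp from 10 to 100 iterations per second, then hold.
    checkout_writes: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      preAllocatedVUs: 50,
      maxVUs: 300,
      stages: [
        { duration: '5m', target: 100 },
        { duration: '5m', target: 100 },
      ],
      exec: 'checkout',
    },
    // A handful of admin users polling, starting two minutes in.
    admin_polling: {
      executor: 'constant-vus',
      vus: 5,
      duration: '8m',
      startTime: '2m',
      exec: 'pollAdmin',
    },
  },
};

export function browseCatalog() {
  const res = http.get(`${BASE_URL}/catalog`, { tags: { name: 'GET /catalog' } });
  check(res, { 'catalog 200': (r) => r.status === 200 });
}

export function checkout() {
  const body = JSON.stringify({ cart_id: `cart-${__VU}-${__ITER}` });
  const res = http.post(`${BASE_URL}/checkout`, body, {
    headers: { 'Content-Type': 'application/json' },
    tags: { name: 'POST /checkout' },
  });
  check(res, { 'checkout 200': (r) => r.status === 200 });
}

export function pollAdmin() {
  // Hypothetical admin endpoint, polled roughly every 10 seconds per VU.
  http.get(`${BASE_URL}/admin/health`, { tags: { name: 'GET /admin/health' } });
  sleep(10);
}
```

Each scenario names its own exec function, so the script needs no default export. Watch the dropped_iterations metric during the run: a non-zero value suggests the VU pools above are too small for the requested rates.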
Thresholds: Making Tests Self-Enforcing
A load test without thresholds is just data collection. Thresholds are the executable SLO: k6 exits with a non-zero status code if any threshold is breached, which means your CI pipeline fails and the release is blocked. This is the most important feature k6 offers — it turns a performance test into a correctness gate.
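Threshold expressions work on any built-in or custom metric. The aggregation you can use depends on the metric type: avg, min, max, med, and p(N) for Trend metrics; rate for Rate metrics; count and rate for Counter metrics; value for Gauge metrics. Each expression compares one aggregation against a limit:
'p(95)<500': 95th-percentile response time under 500ms
'p(99)<2000': 99th-percentile under 2s (the long-tail SLO)
'rate<0.01': less than 1% error rate on a Rate metric
'count>1000': more than 1,000 events recorded on a Counter metric (useful for data-coverage assertions, provided the counter is only incremented on success)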
You can attach an abortOnFail: true flag and a delayAbortEval duration to a threshold so k6 kills the test early once you know it is already failing — avoiding burning load on a system that is already down.
export const options = {
  thresholds: {
    // Standard SLO gates (fail CI if breached)
    http_req_duration: [
      { threshold: 'p(95)<500', abortOnFail: true, delayAbortEval: '1m' },
      { threshold: 'p(99)<2000' },
    ],
    http_req_failed: [
      { threshold: 'rate<0.005', abortOnFail: true, delayAbortEval: '30s' },
    ],
    // Sub-metric thresholds filtered by tag: per-endpoint breakdown
    'http_req_duration{url:https://api.example.com/checkout}': ['p(95)<600'],
    'http_req_duration{url:https://api.example.com/catalog}': ['p(95)<150'],
    // Custom business metric (the Rate declared in checkout-flow.js)
    checkout_error_rate: ['rate<0.01'],
  },
  // noVUConnectionReuse: false   (default; keep it false for realistic keep-alive behavior)
  // insecureSkipTLSVerify: false (default; leave it off in real tests so certificate problems surface)
};
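To make the semantics of a percentile threshold concrete, here is a small plain-JavaScript sketch that evaluates an expression such as 'p(95)<500' against a list of durations. It is an illustration only, not k6's implementation: it assumes a nearest-rank percentile (k6's internal calculation may interpolate differently), and the helper names percentile and passesPercentileThreshold are made up for this example.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Evaluate an expression of the form 'p(N)<limit' against durations in ms.
function passesPercentileThreshold(samples, expression) {
  const match = /^p\((\d+(?:\.\d+)?)\)\s*<\s*(\d+(?:\.\d+)?)$/.exec(expression.trim());
  if (!match) throw new Error(`unsupported expression: ${expression}`);
  const [, p, limit] = match;
  return percentile(samples, Number(p)) < Number(limit);
}

// 100 requests: 96 fast (120 ms) and 4 slow (900 ms).
const durations = [...Array(96).fill(120), ...Array(4).fill(900)];
console.log(passesPercentileThreshold(durations, 'p(95)<500')); // true: p95 is 120 ms
console.log(passesPercentileThreshold(durations, 'p(99)<500')); // false: p99 is 900 ms
```

The same run passes a p(95) gate and fails a p(99) gate, which is why the long-tail threshold is worth stating separately.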
Tag your HTTP requests by name, not URL. When a URL contains dynamic IDs like /orders/12345, every unique URL becomes its own url and name tag value, so k6 emits a separate time series for each one. Your dashboard becomes noise. Use the { tags: { name: 'GET /orders/:id' } } option on each request, or set a URL grouping pattern with http.url`https://api.example.com/orders/${orderId}`. Without this grouping, per-endpoint thresholds and Grafana dashboards cannot be meaningful.
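Both grouping options can be sketched in a few lines. The order ID scheme, the 300 ms limit, and the tag value GetOrder are assumptions chosen for illustration; a tag value without spaces or colons is used here only to keep the threshold selector easy to read.

```javascript
import http from 'k6/http';

export const options = {
  thresholds: {
    // Sub-metric threshold filtered by the name tag set in option 1 below.
    'http_req_duration{name:GetOrder}': ['p(95)<300'],
  },
};

export default function () {
  // Hypothetical order ID derived from the VU and iteration counters.
  const orderId = 10000 + __VU * 1000 + __ITER;

  // Option 1: explicit name tag; every order URL reports under "GetOrder".
  http.get(`https://api.example.com/orders/${orderId}`, {
    tags: { name: 'GetOrder' },
  });

  // Option 2: the http.url template tag; the name tag becomes the template
  // itself with the placeholder left unexpanded, so all IDs share one series.
  http.get(http.url`https://api.example.com/orders/${orderId}`);
}
```

Option 1 gives you full control over the label shown in dashboards; option 2 is less typing when the URL template is already a good label.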
Running k6: Local, Distributed, and in CI
For local development and debugging, a single-machine run is all you need. Once a test outgrows one load generator (roughly 2,000–5,000 VUs is a reasonable planning figure for heavy scripts on modest hardware; lean scripts on tuned machines go considerably higher), you distribute it across multiple nodes with k6 run --execution-segment or the Kubernetes operator (k6-operator).
# --- Local run ---
# CLI flags take precedence over the script's options, so this replaces the
# scripted stages with a flat 50-VU, 5-minute profile
k6 run --vus 50 --duration 5m checkout-flow.js

# Pass secrets via environment (never bake into script)
k6 run -e API_SECRET="$API_SECRET" checkout-flow.js

# --- Output to InfluxDB (v1) + Grafana dashboard (standard SRE setup) ---
k6 run --out influxdb=http://influxdb:8086/k6 checkout-flow.js

# --- Output to Prometheus remote-write (modern stack) ---
# Prometheus must have its remote-write receiver enabled (--web.enable-remote-write-receiver)
K6_PROMETHEUS_RW_SERVER_URL=http://prometheus:9090/api/v1/write \
  k6 run --out experimental-prometheus-rw checkout-flow.js

# --- Distributed run across 3 nodes (each handles 1/3 of VUs) ---
# Node 1:
k6 run --execution-segment "0:1/3" --execution-segment-sequence "0,1/3,2/3,1" checkout-flow.js
# Node 2:
k6 run --execution-segment "1/3:2/3" --execution-segment-sequence "0,1/3,2/3,1" checkout-flow.js
# Node 3:
k6 run --execution-segment "2/3:1" --execution-segment-sequence "0,1/3,2/3,1" checkout-flow.js

# --- GitHub Actions CI gate ---
# .github/workflows/perf.yml (fragment)
#   - name: Run k6 load test
#     uses: grafana/k6-action@v0.3.1
#     with:
#       filename: tests/load/checkout-flow.js
#       flags: --out influxdb=http://influxdb:8086/k6
#     env:
#       API_SECRET: ${{ secrets.API_SECRET }}
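The segment flags in the distributed run follow a regular pattern, so they can be generated for any number of equally sized load generators. The helper below is a hypothetical convenience function, not part of k6; it only builds the strings. It assumes equal shares per node and does not reduce fractions (it would emit 2/4 rather than 1/2 for four nodes).

```javascript
// Build the --execution-segment values for `nodes` equally sized generators.
function executionSegments(nodes) {
  if (!Number.isInteger(nodes) || nodes < 1) {
    throw new Error('nodes must be a positive integer');
  }
  // Boundaries 0, 1/n, 2/n, ..., 1 written as fractions.
  const boundary = (i) => (i === 0 ? '0' : i === nodes ? '1' : `${i}/${nodes}`);
  const sequence = Array.from({ length: nodes + 1 }, (_, i) => boundary(i)).join(',');
  return Array.from({ length: nodes }, (_, i) => ({
    segment: `${boundary(i)}:${boundary(i + 1)}`,
    sequence, // identical on every node
  }));
}

// Print one command line per node for a three-node run.
for (const { segment, sequence } of executionSegments(3)) {
  console.log(
    `k6 run --execution-segment "${segment}" ` +
      `--execution-segment-sequence "${sequence}" checkout-flow.js`
  );
}
```

For three nodes this prints the same three commands shown above. Every node must receive the same sequence string so that the segments line up without gaps or overlap.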
Do not load-test production from a single laptop. Responses arrive on your download link: if your connection offers 50 Mbps down and your API responses are 10 KB each, the link saturates at roughly 600 RPS (50 Mbit/s divided by 80 kbit per response is 625, and protocol overhead lowers that further) before the server is stressed at all. Run load generators from inside the same VPC as the target, on machines with sufficient network bandwidth. The results from a network-bottlenecked test are meaningless: they measure your connection, not your service.
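The network ceiling in the paragraph above is simple arithmetic, sketched below. The function name bandwidthCeilingRps is made up for this example, and the calculation assumes decimal units (1 KB = 1,000 bytes) and ignores TCP, TLS, and HTTP header overhead, so the practical ceiling is lower than the number it returns.

```javascript
// Upper bound on responses per second that a link can carry.
function bandwidthCeilingRps(linkMbps, responseKB) {
  const bitsPerSecond = linkMbps * 1e6;
  const bitsPerResponse = responseKB * 1000 * 8;
  return bitsPerSecond / bitsPerResponse;
}

console.log(bandwidthCeilingRps(50, 10));   // 625: the 50 Mbps / 10 KB example
console.log(bandwidthCeilingRps(1000, 10)); // 12500: the same payload on a 1 Gbps link
```

Run this check against your generator's link before trusting any throughput figure: if the target RPS is anywhere near the ceiling, the test is measuring the network.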
Realistic Data: Avoiding the Cache-Warming Trap
A load test that hammers a single product ID will warm your Redis cache on the first request and measure cache-hit latency for the remaining 99.9% of iterations. That tells you nothing about uncached-path performance. Production traffic hits thousands of distinct IDs. Use SharedArray to load a realistic dataset once (not per-VU) and spread the load across all IDs.
import http from 'k6/http';
import { check, sleep } from 'k6';
import { SharedArray } from 'k6/data';
import { randomItem } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';

// SharedArray data is parsed ONCE and shared read-only across all VUs,
// instead of every VU holding its own copy (critical when running 10k+ VUs)
const products = new SharedArray('products', function () {
  return JSON.parse(open('./data/products.json')); // 10,000 product IDs
});
const users = new SharedArray('users', function () {
  return JSON.parse(open('./data/users.json')); // 5,000 test user accounts
});

export default function () {
  const product = randomItem(products);
  const user = randomItem(users); // not used by this request; pick one per iteration for authenticated flows
  const res = http.get(`https://api.example.com/products/${product.id}`, {
    tags: { name: 'GET /products/:id' }, // group metric regardless of ID
  });
  check(res, { 'status 200': (r) => r.status === 200 });
  sleep(Math.random() * 2 + 0.5); // random think time 0.5–2.5s, not a fixed 1s
}
Common Production Failure Modes
Knowing the failure patterns will save you from spending hours on invalid test results:
Coordinated omission: With VU-based executors, a VU that is blocked on a slow response cannot start its next iteration, so the test sends fewer requests exactly when the service is slowest, and slow responses are under-represented in your percentiles. Use the constant-arrival-rate executor to decouple arrival rate from service latency and measure the true queuing behavior.
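TLS handshake overhead dominating: Short-duration tests (under 2 minutes) with high VU counts can show artificially high latency because every VU is opening fresh connections and TLS handshakes dominate. k6 reports connection setup separately (http_req_connecting and http_req_tls_handshaking), and http_req_duration covers only sending, waiting, and receiving, so check those metrics first. Wall-clock timers wrapped around a request do include handshake time, so keep them out of business-logic metrics, and run tests long enough for connection pools to stabilize.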
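DNS resolution bottleneck: A thousand VUs resolving hostnames against an internal DNS server can overwhelm it. k6 caches DNS lookups (the default TTL is 5 minutes); use the --dns option, for example --dns ttl=60s, to choose a TTL that matches how your production clients behave, and avoid disabling the cache with ttl=0 unless DNS itself is what you are testing.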
Memory leak revealed by long soak: A 5-minute stress test passes; a 2-hour soak exposes a slow memory leak in your service's connection pool. Always run a soak test before any major release.
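The coordinated-omission item above can be made concrete with two back-of-the-envelope formulas. This is a simplified steady-state model, not a simulation of k6: it assumes every iteration makes one request, a fixed response time, and a fixed think time. The function names are made up for this example.

```javascript
// Closed model (VU-based executors): each VU waits for its response, then
// sleeps, so throughput = VUs / (response time + think time).
function closedModelRate(vus, responseSec, thinkSec) {
  return vus / (responseSec + thinkSec);
}

// Open model (arrival-rate executors): the rate is fixed, so the number of
// busy VUs grows with iteration time (Little's law: L = lambda * W).
function vusNeededForRate(ratePerSec, responseSec, thinkSec = 0) {
  return Math.ceil(ratePerSec * (responseSec + thinkSec));
}

console.log(closedModelRate(100, 0.25, 1)); // 80 iterations/s while the service is healthy
console.log(closedModelRate(100, 3, 1));    // 25 iterations/s once responses take 3 s
console.log(vusNeededForRate(80, 0.25));    // 20 VUs busy at 80 iterations/s
console.log(vusNeededForRate(80, 3));       // 240 VUs needed to hold the same rate
```

With 100 VUs and a 1 s sleep, the offered load falls from 80 to 25 iterations per second as the service degrades, which is the under-sampling the item describes. An arrival-rate executor holds 80 per second instead, but only if preAllocatedVUs and maxVUs leave room for the 240 concurrent VUs the slow case requires.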