Performance & Load Testing

k6 in Practice

Lesson 2 established load testing theory: virtual users, ramp shapes, percentile math. This lesson moves to the tool you will spend the most time in: k6. Originally built by Load Impact and now a Grafana Labs project, k6 is one of the most widely used tools for developer-owned load testing. It is written in Go (so a single well-provisioned machine can typically drive tens of thousands of VUs), scripted in JavaScript (ES2015+), and designed from the ground up to live inside a CI pipeline. At Grafana, Shopify, and many Tier-1 SRE teams, k6 scripts are versioned alongside service code: every PR gate includes a smoke test and every release candidate runs a full soak.

k6 is not a browser automation tool. Its default engine generates HTTP/WebSocket/gRPC traffic at the protocol level and does not execute your application's JavaScript in a browser. If you need real-browser load testing (for SPAs that do heavy client-side rendering), use the k6 browser module (k6/browser, which began life as the xk6-browser extension). For pure API and backend load testing, the default engine is what you want.

Script Structure: the Anatomy of a k6 Test

Every k6 script exports a default function that is the VU body — the code each virtual user executes in a loop. The script also has an init context (module-level code) that runs once per VU before the test starts, and optional lifecycle hooks: setup() (runs once before all VUs start) and teardown(data) (runs once after all VUs finish).

// checkout-flow.js — production-grade k6 script skeleton
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend, Rate, Counter } from 'k6/metrics';

// --- Custom metrics (defined at init time, shared across VUs) ---
const checkoutLatency = new Trend('checkout_latency_ms', true); // true = isTime: values are durations in ms
const checkoutErrors  = new Rate('checkout_error_rate');
const checkoutCount   = new Counter('checkouts_attempted');

// --- Thresholds and stages (the test contract) ---
export const options = {
  stages: [
    { duration: '2m', target: 50  },   // ramp up to 50 VUs
    { duration: '5m', target: 50  },   // hold steady load
    { duration: '2m', target: 200 },   // spike to 200 VUs
    { duration: '5m', target: 200 },   // hold spike
    { duration: '2m', target: 0   },   // ramp down
  ],
  thresholds: {
    http_req_duration:    ['p(95)<500', 'p(99)<1500'],  // SLO gate
    checkout_error_rate:  ['rate<0.01'],                   // <1% errors
    http_req_failed:      ['rate<0.005'],                  // k6 built-in failure rate
  },
};

// setup() runs ONCE before VUs start; its return value is passed to default() and teardown()
export function setup() {
  const res = http.post('https://api.example.com/auth/token', JSON.stringify({
    client_id: 'load-test-bot',
    client_secret: __ENV.API_SECRET,   // inject secrets via env, never hardcode
  }), { headers: { 'Content-Type': 'application/json' } });

  check(res, { 'auth OK': (r) => r.status === 200 });
  return { token: res.json('access_token') };
}

// default() is the VU loop — called repeatedly for each VU
export default function (data) {
  const headers = {
    Authorization: `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };

  const res = http.post('https://api.example.com/checkout', JSON.stringify({
    cart_id: `cart-${__VU}-${__ITER}`,  // __VU = VU number, __ITER = iteration count
    promo_code: 'LOAD_TEST',
  }), { headers });

  // res.timings.duration = sending + waiting + receiving; unlike a Date.now()
  // delta it excludes connection and TLS setup time
  checkoutLatency.add(res.timings.duration);
  checkoutCount.add(1);

  const ok = check(res, {
    'status 200':      (r) => r.status === 200,
    'order_id present': (r) => r.json('order_id') !== undefined,
  });
  checkoutErrors.add(!ok);

  sleep(1);   // think time between iterations (model real user pace)
}

export function teardown(data) {
  // revoke the test token to leave the auth system clean
  http.del('https://api.example.com/auth/token', null, {
    headers: { Authorization: `Bearer ${data.token}` },
  });
}

Stages vs. Scenarios: Choosing the Right Shape

The stages array is the quick way to define a single VU ramp profile. But real production traffic is not a single pool of identical users. The scenarios API gives you independent executor pools, each with its own VU count, ramp shape, arrival rate, and script function — composable into a realistic load model.

The key executors and when to use them:

ramping-vus: the classic ramp. You control VU count over time. Good for soak tests and spike drills. The default when you write stages.
constant-arrival-rate: you specify iterations started per time unit, not VU count. This equals requests per second only when each iteration sends exactly one request. k6 draws VUs from a pool sized by preAllocatedVUs and maxVUs; if the pool runs dry, iterations are dropped and counted in the dropped_iterations metric. Use this to model a fixed inbound request rate (e.g., 500 RPS from a load balancer) independently of how fast or slow your service responds. This is the correct executor for SLO gate tests: you want to assert behavior at a known rate, not at an arbitrary VU count.
ramping-arrival-rate: like ramping-vus, but the ramp is expressed in iterations per time unit instead of VUs. Good for finding the throughput cliff.
per-vu-iterations: each VU runs exactly N iterations. Useful for data-driven tests where each VU needs a unique dataset row.
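
Arrival-rate executors still need enough VUs to carry the requested rate. A common sizing rule is Little's law: concurrent VUs ≈ iteration rate × time each iteration occupies a VU. The sketch below is plain JavaScript with no k6 imports; the helper name estimateVUs and the 1.5× headroom default are illustrative assumptions, and the result is only a starting value for preAllocatedVUs.

```javascript
// Rough VU sizing for arrival-rate executors, based on Little's law:
//   concurrency = arrival rate x time each iteration occupies a VU.
// estimateVUs and the default headroom factor are illustrative assumptions.
function estimateVUs(ratePerSec, iterationSeconds, headroom = 1.5) {
  if (ratePerSec <= 0 || iterationSeconds <= 0) {
    throw new RangeError('rate and iteration duration must be positive');
  }
  const busyVUs = ratePerSec * iterationSeconds; // VUs busy on average
  return Math.ceil(busyVUs * headroom);          // pad for latency variance
}

// 300 iterations/s, each holding a VU for about 0.5 s (request + think time)
console.log(estimateVUs(300, 0.5));      // padded estimate
console.log(estimateVUs(300, 0.5, 1.0)); // average concurrency, no padding
```

If the estimate comes out above the scenario's maxVUs, the executor cannot hold the target rate and will start dropping iterations, so measure a realistic iteration duration before fixing the pool size.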

// multi-scenario.js — composing read traffic + write traffic + admin traffic
export const options = {
  scenarios: {
    // Scenario 1: high-volume read traffic at constant arrival rate
    browse_products: {
      executor: 'constant-arrival-rate',
      rate: 300,            // 300 iterations per timeUnit (300 RPS if one request per iteration)
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 50,  // pre-allocate to avoid cold-start latency
      maxVUs: 200,          // allow k6 to auto-scale if 300 RPS needs more VUs
      exec: 'browseFlow',   // points to an exported function in this file
    },

    // Scenario 2: lower-volume write traffic
    create_orders: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      stages: [
        { target: 10,  duration: '2m' },
        { target: 50,  duration: '5m' },
        { target: 10,  duration: '2m' },
      ],
      preAllocatedVUs: 20,
      maxVUs: 100,
      exec: 'checkoutFlow',
    },

    // Scenario 3: admin polling from a small, fixed pool of VUs
    admin_reports: {
      executor: 'constant-vus',
      vus: 5,
      duration: '10m',
      exec: 'adminFlow',
      startTime: '30s',     // start 30s after other scenarios to let warm-up finish
    },
  },

  thresholds: {
    'http_req_duration{scenario:browse_products}': ['p(95)<200'],
    'http_req_duration{scenario:create_orders}':   ['p(95)<500'],
    'http_req_failed':                             ['rate<0.005'],
  },
};

export function browseFlow()   { /* ... */ }
export function checkoutFlow() { /* ... */ }
export function adminFlow()    { /* ... */ }

Three independent k6 scenario executors composing a realistic production traffic mix: high-volume reads at constant RPS, ramping writes, and low-rate admin polling with a delayed start.

Thresholds: Making Tests Self-Enforcing

A load test without thresholds is just data collection. Thresholds are the executable SLO: k6 exits with a non-zero status code if any threshold is breached, which means your CI pipeline fails and the release is blocked. This is the most important feature k6 offers — it turns a performance test into a correctness gate.

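Threshold expressions apply an aggregation method to any built-in or custom metric. Which methods are available depends on the metric type: Trend metrics support avg, min, max, med, and p(N); Rate metrics support rate; Counter metrics support count and rate; Gauge metrics support value. Typical expressions: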

'p(95)<500': 95th-percentile response time under 500 ms
'p(99)<2000': 99th-percentile under 2 s (the long-tail SLO)
'rate<0.01': less than 1% error rate on a Rate metric
'count>1000': more than 1,000 recorded events on a Counter metric (useful for data-coverage assertions; it only counts successes if the script increments the counter only on success)
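
To make the percentile expressions concrete, here is a small plain-JavaScript illustration of what 'p(95)<500' asserts over a set of response times. It uses a nearest-rank percentile for simplicity, which is an assumption and not k6's own calculation (k6 may interpolate between samples); the function names are hypothetical.

```javascript
// Illustration of what a threshold such as 'p(95)<500' asserts.
// Nearest-rank percentile is a simplifying assumption, not k6's algorithm.
function percentile(samples, p) {
  if (samples.length === 0) throw new RangeError('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Returns true when the threshold holds (the test would pass).
function holds(samples, p, limitMs) {
  return percentile(samples, p) < limitMs;
}

// 19 fast responses and 1 very slow one (values in ms).
const durations = [...Array(19).fill(120), 2400];
console.log('p(95)<500 :', holds(durations, 95, 500));
console.log('p(99)<2000:', holds(durations, 99, 2000));
```

With this sample the p(95) gate holds while the p(99) gate does not, which is why a long-tail threshold is worth declaring alongside the headline one.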

You can attach an abortOnFail: true flag and a delayAbortEval duration to a threshold so k6 kills the test early once you know it is already failing — avoiding burning load on a system that is already down.

export const options = {
  thresholds: {
    // Standard SLO gates (fail CI if breached)
    http_req_duration: [
      { threshold: 'p(95)<500', abortOnFail: true, delayAbortEval: '1m' },
      { threshold: 'p(99)<2000' },
    ],
    http_req_failed: [
      { threshold: 'rate<0.005', abortOnFail: true, delayAbortEval: '30s' },
    ],

    // Tag-filtered thresholds on a built-in metric: per-endpoint breakdown
    'http_req_duration{url:https://api.example.com/checkout}': ['p(95)<600'],
    'http_req_duration{url:https://api.example.com/catalog}':  ['p(95)<150'],

    // Custom business metric
    checkout_error_rate: ['rate<0.01'],
  },

  // noVUConnectionReuse: false — default; keep this false for realistic keep-alive behavior
  // insecureSkipTLSVerify: false — never set true in real tests; you want to catch cert issues
};

Tag your HTTP requests by name, not URL. When a URL contains dynamic IDs like /orders/12345, every unique URL becomes its own value of the url and name tags, so k6 emits a separate time series for each one and your dashboard becomes noise. Use the { tags: { name: 'GET /orders/:id' } } option on each request, or wrap the URL in the http.url template tag (http.url`https://api.example.com/orders/${orderId}`), which sets the name tag to the template with the interpolated values left out. This is critical for threshold targeting and for Grafana dashboards to be meaningful.
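
When URLs are built in many places, one way to keep name tags consistent is to derive them with a small function. The following is a minimal sketch in plain JavaScript; routeName is a hypothetical helper, and the patterns it treats as dynamic (all-digit segments and UUIDs) are assumptions to adapt to your own URL scheme.

```javascript
// Hypothetical helper: collapse dynamic path segments into a stable label
// suitable for the `name` tag. Treating all-digit segments and UUIDs as IDs
// is an assumption; adjust the patterns to your own routes.
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function routeName(method, url) {
  // Drop scheme + host and the query string, keep only the path.
  const path = url.replace(/^https?:\/\/[^/]+/, '').split('?')[0] || '/';
  const segments = path
    .split('/')
    .map((seg) => (/^\d+$/.test(seg) || UUID.test(seg) ? ':id' : seg));
  return `${method} ${segments.join('/')}`;
}

console.log(routeName('GET', 'https://api.example.com/orders/12345'));
console.log(routeName('GET', 'https://api.example.com/orders/12345/items?page=2'));

// In a k6 script the label would be passed per request, for example:
//   http.get(url, { tags: { name: routeName('GET', url) } });
```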

Running k6: Local, Distributed, and in CI

For local development and debugging, a single-machine run is all you need. k6's documentation cites 30,000 to 40,000 VUs as achievable on one tuned, well-provisioned host, but complex scripts on ordinary hardware often top out far lower (roughly 2,000–5,000 VUs, depending on test complexity and network stack). Beyond what one machine can sustain, you distribute across multiple nodes with k6 run --execution-segment or the Kubernetes operator.

# --- Local run ---
k6 run --vus 50 --duration 5m checkout-flow.js

# Pass secrets via environment (never bake into script)
k6 run -e API_SECRET=$API_SECRET checkout-flow.js

# --- Output to InfluxDB + Grafana dashboard (standard SRE setup) ---
k6 run --out influxdb=http://influxdb:8086/k6 checkout-flow.js

# --- Output to Prometheus remote-write (modern stack) ---
K6_PROMETHEUS_RW_SERVER_URL=http://prometheus:9090/api/v1/write \
  k6 run --out experimental-prometheus-rw checkout-flow.js

# --- Distributed run across 3 nodes (each handles 1/3 of VUs) ---
# Node 1:
k6 run --execution-segment "0:1/3"   --execution-segment-sequence "0,1/3,2/3,1" checkout-flow.js
# Node 2:
k6 run --execution-segment "1/3:2/3" --execution-segment-sequence "0,1/3,2/3,1" checkout-flow.js
# Node 3:
k6 run --execution-segment "2/3:1"   --execution-segment-sequence "0,1/3,2/3,1" checkout-flow.js

# --- GitHub Actions CI gate (full pipeline) ---
# .github/workflows/perf.yml (fragment)
# - name: Run k6 load test
#   uses: grafana/k6-action@v0.3.1
#   with:
#     filename: tests/load/checkout-flow.js
#     flags: --out influxdb=http://influxdb:8086/k6
#   env:
#     API_SECRET: ${{ secrets.API_SECRET }}
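
The three segment commands above follow a regular pattern, so for other node counts the flags can be generated instead of typed by hand. This is a sketch in plain JavaScript; segmentFlags is a hypothetical helper that assumes equally sized segments and reduces fractions (2/4 becomes 1/2).

```javascript
// Builds the --execution-segment flags for n equally sized load generators.
// segmentFlags is a hypothetical helper, not part of k6.
function gcd(a, b) {
  return b === 0 ? a : gcd(b, a % b);
}

function fraction(numerator, denominator) {
  if (numerator === 0) return '0';
  if (numerator === denominator) return '1';
  const d = gcd(numerator, denominator);
  return `${numerator / d}/${denominator / d}`;
}

function segmentFlags(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  // Boundaries 0, 1/n, 2/n, ..., 1 shared by every node.
  const points = Array.from({ length: n + 1 }, (_, i) => fraction(i, n));
  const sequence = points.join(',');
  return points.slice(0, -1).map(
    (start, i) =>
      `--execution-segment "${start}:${points[i + 1]}" ` +
      `--execution-segment-sequence "${sequence}"`
  );
}

segmentFlags(3).forEach((flags, i) =>
  console.log(`# Node ${i + 1}:\nk6 run ${flags} checkout-flow.js`)
);
```

Every node must receive the same sequence string and the same script; only the segment differs.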

Do not load-test production from a single laptop. Responses flow back to the load generator, so its download bandwidth caps the achievable rate: on a 50 Mbps link with 10 KB responses, the theoretical ceiling is about 625 RPS (50,000,000 bits/s ÷ 80,000 bits per response), and protocol overhead pushes the practical limit closer to 500 RPS, well before the server is stressed at all. Run load generators from inside the same VPC as the target, on machines with sufficient network bandwidth. The results from a network-bottlenecked test are meaningless: they measure your connection, not your service.
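
The bandwidth ceiling is simple arithmetic, sketched here in plain JavaScript. The function name is hypothetical, and the calculation deliberately ignores TCP, TLS, and HTTP header overhead, so real ceilings sit below the number it returns.

```javascript
// Upper bound on response rate imposed by the load generator's link.
// Ignores TCP/TLS/HTTP header overhead, so practical ceilings are lower.
function bandwidthCeilingRps(linkMbps, responseBytes) {
  const bitsPerSecond = linkMbps * 1_000_000;
  const bitsPerResponse = responseBytes * 8;
  return bitsPerSecond / bitsPerResponse;
}

console.log(bandwidthCeilingRps(50, 10_000));     // 50 Mbps link, 10 KB responses
console.log(bandwidthCeilingRps(10_000, 10_000)); // 10 Gbps link, same responses
```

Comparing the ceiling with the target rate of the test is a quick sanity check before trusting any latency numbers.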

Realistic Data: Avoiding the Cache-Warming Trap

A load test that hammers a single product ID will warm your Redis cache on the first request and measure cache-hit latency for the remaining 99.9% of iterations. That tells you nothing about uncached-path performance. Production traffic hits thousands of distinct IDs. Use SharedArray to load a realistic dataset once (not per-VU) and spread the load across all IDs.

import http from 'k6/http';
import { check, sleep } from 'k6';
import { SharedArray } from 'k6/data';
import { randomItem } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';

// SharedArray is loaded ONCE at init time and shared read-only across all VUs
// No per-VU memory overhead — critical when running 10k+ VUs
const products = new SharedArray('products', function () {
  return JSON.parse(open('./data/products.json'));  // 10,000 product IDs
});

const users = new SharedArray('users', function () {
  return JSON.parse(open('./data/users.json'));     // 5,000 test user accounts
});

export default function () {
  const product = randomItem(products);
  const user    = randomItem(users);

  const res = http.get(`https://api.example.com/products/${product.id}`, {
    tags: { name: 'GET /products/:id' },  // group metric regardless of ID
  });

  check(res, { 'status 200': (r) => r.status === 200 });
  sleep(Math.random() * 2 + 0.5);  // random think time 0.5–2.5s — not a fixed 1s
}

Common Production Failure Modes

Knowing the failure patterns will save you from spending hours on invalid test results:

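Coordinated omission: With VU-based (closed-model) executors, a VU blocks while it waits for a slow response, so its next iteration starts later. Fewer requests are sent exactly when the service is slow, and slow responses are under-represented in your percentiles. Use the constant-arrival-rate executor to decouple arrival rate from service latency and measure the true queuing behavior.
TLS handshake overhead dominating: Short-duration tests (under 2 minutes) with high VU counts can show artificially high latency because TLS handshakes dominate. k6's http_req_duration (and res.timings.duration) covers only sending, waiting, and receiving; connection setup is reported separately as http_req_connecting and http_req_tls_handshaking. A wall-clock timer around the request does include setup time, so feed business-logic metrics from res.timings.duration, and run tests long enough for connection pools to stabilize.
DNS resolution bottleneck: Thousands of VUs resolving hostnames can overload an internal DNS server. k6 caches DNS lookups for 5 minutes by default; for longer tests, raise the cache lifetime with --dns ttl=inf (cache for the whole run) or an explicit duration such as --dns ttl=30m, and avoid ttl=0, which disables caching.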
Memory leak revealed by long soak: A 5-minute stress test passes; a 2-hour soak exposes a slow memory leak in your service's connection pool. Always run a soak test before any major release.
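
Coordinated omission is easiest to see in a toy model. The sketch below is a deterministic plain-JavaScript simulation, not a k6 script: it assumes a service that answers in 100 ms except during a 10-second stall, and compares a single paced VU (closed model) with a fixed arrival schedule (open model). All constants and function names are illustrative assumptions.

```javascript
// Deterministic toy model of coordinated omission (all times in ms).
// Assumed service: 100 ms per request, except a full stall between t=10s and
// t=20s, during which every request waits until the stall ends.
const BASE = 100;
const STALL_START = 10_000;
const STALL_END = 20_000;
const TEST_END = 30_000;

function latencyAt(t) {
  const stalled = t >= STALL_START && t < STALL_END;
  return stalled ? STALL_END - t + BASE : BASE;
}

// Closed model: one VU paced at 1 iteration/s that cannot send the next
// request until the previous response has arrived.
function closedModel() {
  const samples = [];
  for (let t = 0; t < TEST_END; ) {
    const latency = latencyAt(t);
    samples.push(latency);
    t += Math.max(1000, latency); // a slow response delays the next send
  }
  return samples;
}

// Open model: a request starts every second regardless of earlier responses,
// which is how an arrival-rate executor schedules work while VUs are free.
function openModel() {
  const samples = [];
  for (let t = 0; t < TEST_END; t += 1000) samples.push(latencyAt(t));
  return samples;
}

const slowCount = (samples) => samples.filter((ms) => ms > 1000).length;
const closedSamples = closedModel();
const openSamples = openModel();
console.log(`closed: ${slowCount(closedSamples)} slow of ${closedSamples.length} samples`);
console.log(`open:   ${slowCount(openSamples)} slow of ${openSamples.length} samples`);
```

In this model the closed loop records the stall as 1 slow sample out of 21, while the open schedule records 10 slow samples out of 30. The outage is identical in both cases; only the measurement differs.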