Security & Performance

Load Testing and Benchmarking

20 min Lesson 22 of 35


Load testing simulates real-world traffic to identify performance bottlenecks before they affect users. It's essential for capacity planning and ensuring reliability.

Why Load Testing Matters

Load testing helps you:

  • Identify breaking points and maximum capacity
  • Find memory leaks and resource exhaustion
  • Validate auto-scaling configurations
  • Establish performance baselines
  • Prevent outages during traffic spikes
When to Test: Before major releases, after infrastructure changes, when expecting traffic surges (marketing campaigns, sales), and periodically to catch regressions.

Artillery - Modern Load Testing

Artillery is a powerful, developer-friendly load testing tool:

# Install Artillery
npm install -g artillery

# Quick test from the command line: 10 virtual users, 100 requests each
artillery quick --count 10 --num 100 https://example.com

# Create a test scenario (load-test.yml)
config:
  target: 'https://api.example.com'
  phases:
    - duration: 60
      arrivalRate: 5
      name: 'Warm-up'
    - duration: 300
      arrivalRate: 20
      name: 'Sustained load'
    - duration: 60
      arrivalRate: 50
      name: 'Spike test'

scenarios:
  - name: 'User journey'
    flow:
      - get:
          url: '/api/products'
      - think: 2
      - post:
          url: '/api/cart'
          json:
            product_id: 123
            quantity: 1
      - get:
          url: '/api/checkout'

# Run the test
artillery run load-test.yml

# Generate an HTML report
artillery run --output report.json load-test.yml
artillery report report.json
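The JSON report can also gate an automated run. A minimal sketch follows; note that the report layout assumed here (`aggregate.latency.p95`) varies between Artillery versions, so check the shape of your own report.json before relying on these field names:

```javascript
// check-report.js — fail a build when p95 latency exceeds a budget.
// The report shape { aggregate: { latency: { p95: <ms> } } } is an
// assumption; verify it against your Artillery version's output.
function checkP95(report, budgetMs) {
  const p95 = report.aggregate.latency.p95;
  return { p95, ok: p95 <= budgetMs };
}

// With a real run you would load the file first:
//   const report = JSON.parse(require('fs').readFileSync('report.json', 'utf8'));
const sample = { aggregate: { latency: { p95: 420 } } };
console.log(checkP95(sample, 500)); // { p95: 420, ok: true }
```

Wiring a script like this into CI turns a load test from a manual chore into a regression gate.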
Best Practice: Start with low load and gradually increase. Test individual endpoints before full user journeys. Always test against a staging environment, never production.

k6 - Performance Testing at Scale

k6 is a modern load testing tool built for CI/CD integration:

# Install k6
brew install k6       # macOS
sudo apt install k6   # Ubuntu/Debian (after adding Grafana's k6 apt repository)

// Create a test script (load-test.js)
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 20 },    // ramp up to 20 virtual users
    { duration: '1m30s', target: 100 }, // ramp up to 100
    { duration: '30s', target: 0 },     // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests under 500ms
    http_req_failed: ['rate<0.01'],   // error rate under 1%
  },
};

export default function () {
  const res = http.get('https://api.example.com/users');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time OK': (r) => r.timings.duration < 500,
  });
  sleep(1);
}

# Run the test
k6 run load-test.js

# Run with cloud output
k6 run --out cloud load-test.js
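The `p(95)<500` threshold above reads as "95% of requests must complete in under 500ms". k6 computes percentiles for you, but the arithmetic is worth seeing once; a sketch using the nearest-rank method:

```javascript
// p95 = the value below which 95% of samples fall. A handful of slow
// outliers barely moves it, which is why percentiles beat averages
// for latency budgets.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank method: take the ceil(p% * n)-th smallest sample.
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

// 100 response times: 95 fast ones plus 5 slow outliers.
const timings = [
  ...Array.from({ length: 95 }, (_, i) => 100 + i), // 100..194ms
  900, 950, 1000, 1100, 1200,                       // outliers
];
console.log(percentile(timings, 95)); // 194
```

Here p95 is 194ms, so a `p(95)<500` threshold passes even though the worst requests took over a second.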

Apache Benchmark (ab) - Quick Testing

ab is a simple command-line tool for basic load testing:

# Basic load test: 1000 requests, 10 concurrent
ab -n 1000 -c 10 https://example.com/

# With POST data
ab -n 100 -c 10 -p data.json -T application/json https://api.example.com/login

# With authentication
ab -n 500 -c 20 -H "Authorization: Bearer token123" https://api.example.com/users

# Key output metrics:
#   Requests per second: throughput capacity
#   Time per request: average latency
#   Transfer rate: bandwidth usage
#   Percentage served within X ms: latency distribution
Warning: ab doesn't support HTTP/2 or complex scenarios. Use it for quick tests only, not comprehensive load testing.
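ab's headline numbers are easy to sanity-check by hand. A sketch of the arithmetic behind them, using a hypothetical run of 1000 requests at concurrency 10 that finishes in 20 seconds:

```javascript
// Rough versions of ab's summary metrics, derived from the request
// count, concurrency level, and total wall-clock time.
function abSummary(requests, concurrency, totalSeconds) {
  return {
    // Throughput: completed requests per second
    requestsPerSecond: requests / totalSeconds,
    // Mean latency as a single client perceives it
    timePerRequestMs: (concurrency * totalSeconds * 1000) / requests,
    // Mean across all concurrent requests (= 1000 / throughput)
    timePerRequestAcrossAllMs: (totalSeconds * 1000) / requests,
  };
}

// Hypothetical run: 1000 requests, 10 concurrent, 20s total.
console.log(abSummary(1000, 10, 20));
// { requestsPerSecond: 50, timePerRequestMs: 200, timePerRequestAcrossAllMs: 20 }
```

If ab's reported numbers disagree wildly with this back-of-envelope math, suspect connection errors or client-side saturation.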

Stress Testing

Stress testing pushes your system beyond normal capacity to find breaking points:

// k6 stress test configuration
export const options = {
  stages: [
    { duration: '2m', target: 100 }, // ramp up to normal load
    { duration: '5m', target: 100 }, // stay at normal load
    { duration: '2m', target: 200 }, // stress: double the load
    { duration: '5m', target: 200 }, // maintain stress
    { duration: '2m', target: 300 }, // extreme stress
    { duration: '5m', target: 300 }, // breaking point
    { duration: '2m', target: 0 },   // recovery
  ],
};

// Monitor these during stress tests:
//   CPU usage: should not sit at 100% sustained
//   Memory usage: watch for leaks
//   Response times: when do they degrade?
//   Error rates: what's the failure threshold?
//   Database connections: are you hitting pool limits?
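Once you have error rates at each load level, finding the breaking point is a simple scan. A sketch, with hypothetical sample data standing in for real stress-run output:

```javascript
// Breaking point = the first load level where the error rate crosses
// an acceptable threshold. The samples below are illustrative, not
// real measurements.
function breakingPoint(samples, maxErrorRate) {
  const hit = samples.find((s) => s.errorRate > maxErrorRate);
  return hit ? hit.vus : null; // null = no breaking point observed
}

const samples = [
  { vus: 100, errorRate: 0.001 },
  { vus: 200, errorRate: 0.004 },
  { vus: 300, errorRate: 0.062 }, // errors spike here
];
console.log(breakingPoint(samples, 0.01)); // 300
```

Record the breaking point after every stress run; if it drifts downward release over release, you have a creeping regression.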

Capacity Planning

Use load testing data to plan infrastructure scaling:

// Calculate required capacity

// Current metrics from the load test:
//   Peak traffic: 10,000 req/min
//   Current servers: 2
//   Response time at peak: 800ms (acceptable: 300ms)

// Rough linear estimate:
//   Servers needed = (Current servers * Peak response time) / Target response time
//                  = (2 * 800) / 300 = 5.3 ≈ 6 servers

// Add a 20-30% buffer for safety:
//   Final capacity: 6 * 1.25 = 7.5 ≈ 8 servers
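The back-of-envelope calculation above can be packaged as a function. Keep in mind its built-in assumption: response time scales roughly linearly with per-server load, which is optimistic, so always validate the result with a follow-up load test:

```javascript
// Estimate server count from load test results. Assumes response time
// scales linearly with per-server load — an optimistic simplification.
function serversNeeded(currentServers, peakMs, targetMs, buffer = 1.25) {
  const base = Math.ceil((currentServers * peakMs) / targetMs);
  return Math.ceil(base * buffer); // apply the safety buffer, round up
}

// The worked example: 2 servers, 800ms at peak, 300ms target.
console.log(serversNeeded(2, 800, 300)); // 8  (6 before the 25% buffer)
```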

// Document your findings
/**
* Load Test Results - 2024-02-16
*
* Configuration: 2x t3.medium instances
* Breaking point: 8,000 req/min
* Bottleneck: Database connections (max 100)
* Recommendation: Add read replicas + connection pooling
* Expected improvement: 3x capacity (24,000 req/min)
*/
Remember: Performance degrades non-linearly. A system handling 1000 req/sec doesn't necessarily handle 2000 req/sec with 2x the hardware.

Bottleneck Identification

Combine load testing with monitoring to find bottlenecks:

// Monitor during load tests

// Application metrics (via New Relic, Datadog):
// - Endpoint response times
// - Database query times
// - Cache hit rates
// - Background job queue depth

// Infrastructure metrics (via CloudWatch, Grafana):
// - CPU utilization per service
// - Memory usage and swapping
// - Network I/O and bandwidth
// - Disk I/O and IOPS

// Database metrics:
// - Active connections
// - Slow queries
// - Lock waits
// - Replication lag

// Example: identify a slow endpoint
// Load test shows /api/reports taking 3000ms
// APM shows 2800ms in database query
// EXPLAIN shows missing index on reports.user_id
// Solution: Add index, retest → 200ms response time
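The diagnostic step above generalizes: given a per-component timing breakdown from an APM trace, optimize the dominant component first. A sketch, with numbers mirroring the `/api/reports` example (illustrative only):

```javascript
// Pick the component worth optimizing first: the one consuming the
// most time in the trace. Input is a hypothetical APM-style breakdown.
function dominantComponent(breakdownMs) {
  return Object.entries(breakdownMs).sort((a, b) => b[1] - a[1])[0];
}

const trace = { database: 2800, appLogic: 150, serialization: 50 };
console.log(dominantComponent(trace)); // [ 'database', 2800 ]
```

Fixing anything other than the dominant component here would shave at most 200ms off a 3000ms response.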

Performance Baselines

Establish baselines to detect regressions:

// Create a baseline test suite

// baseline-tests.js (k6)
export const options = {
  thresholds: {
    // Baselines from the last known-good run
    'http_req_duration{endpoint:home}': ['p(95)<200'],
    'http_req_duration{endpoint:api}': ['p(95)<500'],
    'http_req_duration{endpoint:search}': ['p(95)<1000'],
    'http_req_failed': ['rate<0.01'],
  },
};

# Run in the CI/CD pipeline
# .github/workflows/performance.yml
name: Performance Tests
on: [push, pull_request]
jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run k6 test
        uses: grafana/k6-action@v0.3.0
        with:
          filename: tests/baseline-tests.js
          cloud: true
          token: ${{ secrets.K6_CLOUD_TOKEN }}
Exercise: Set up Artillery or k6 and create a load test for your application's most critical endpoint. Run it at 10, 50, 100, and 200 concurrent users, and document the response times, error rates, and the point where performance degrades. Then identify and fix one bottleneck.