Continuous Integration Fundamentals

Pipeline Speed at Big-Tech Scale

Google, Meta, and Stripe all run thousands of CI pipeline executions every hour. At that volume, a two-minute improvement per run translates into engineering-years of saved developer time annually. But speed is not just about raw compute — it is about intelligent architecture: knowing what to cache, which tests to run, and how to sequence work so that humans are never blocked waiting for machines.

This lesson dissects the four levers that big-tech CI teams use to keep pipelines under ten minutes even as codebases grow into millions of lines: dependency caching, incremental builds, test selection, and merge queues.

Lever 1 — Dependency Caching

The single largest source of wasted CI time in most organizations is re-downloading and re-compiling dependencies that have not changed. A cold npm install on a large monorepo can take 4–8 minutes. A cache hit brings that to 15 seconds.

Every major CI platform exposes a key-value cache store. The key is typically derived from a hash of the lockfile (package-lock.json, Gemfile.lock, go.sum, Cargo.lock), usually combined with the runner OS. When the lockfile does not change, which is true for the vast majority of commits, the cache is restored verbatim and dependency installation can be skipped entirely.

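# GitHub Actions: dependency cache for Node.js
- name: Cache node_modules
  id: npm-cache            # lets later steps read outputs.cache-hit
  uses: actions/cache@v4
  with:
    path: |
      ~/.npm
      node_modules
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-

# cache-hit is 'true' only on an exact key match, so a partial
# restore via restore-keys still runs the install.
- name: Install dependencies (skipped on exact cache hit)
  if: steps.npm-cache.outputs.cache-hit != 'true'
  run: npm ci --prefer-offline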

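The restore-keys fallback is critical: if the lockfile changed (e.g., a new package was added), there is no exact key match. The restore-keys prefix then selects the most recently created cache for the same OS, giving you a partial hit that still avoids a full cold install. npm ci removes any restored node_modules and reinstalls from the lockfile, but with ~/.npm populated and --prefer-offline it takes most tarballs from the local cache, so in practice only new or updated packages are fetched from the registry.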
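The exact-match-then-prefix lookup can be sketched in a few lines. This is a minimal model under stated assumptions, not the actions/cache implementation: the in-memory store, the cache_key and restore names, and the integer timestamps are all hypothetical.

```python
import hashlib


def cache_key(os_name: str, lockfile_bytes: bytes) -> str:
    """Build a key shaped like npm-<os>-<sha256 of the lockfile contents>."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"npm-{os_name}-{digest}"


def restore(store: dict, key: str, restore_keys: list):
    """Look up a cache entry.

    store maps key -> (created_at, payload). Returns (matched_key, exact_hit).
    An exact match wins; otherwise the newest entry matching the first
    restore-key prefix that has any match is used as a partial hit.
    """
    if key in store:
        return key, True
    for prefix in restore_keys:
        candidates = [k for k in store if k.startswith(prefix)]
        if candidates:
            newest = max(candidates, key=lambda k: store[k][0])
            return newest, False
    return None, False


# Hypothetical store with two earlier lockfile versions cached.
old = cache_key("Linux", b"lock-v1")
recent = cache_key("Linux", b"lock-v2")
store = {old: (100, "node_modules v1"), recent: (200, "node_modules v2")}

# The lockfile changed again: no exact match, so the newest Linux cache is used.
matched, exact = restore(store, cache_key("Linux", b"lock-v3"), ["npm-Linux-"])
```

Here matched is the key for lock-v2 and exact is False, which is the partial-hit case: the install still runs, but against a mostly warm cache.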

Layer your caches. Cache build outputs (compiled TypeScript, Webpack chunks) with a key that hashes your source files plus the lockfile. This way, unchanged packages AND unchanged source produce zero rebuild work. Teams at Shopify report 60–70% cache hit rates in production, cutting median pipeline time nearly in half.

Lever 2 — Incremental Builds

Caching dependencies only eliminates one category of waste. The next is recompiling source code that has not changed. Incremental builds rely on a build graph: a directed acyclic graph of every source file and its compiled outputs. A correct build system only re-executes nodes whose inputs have changed or whose outputs are missing.
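As a rough illustration of the build-graph idea, the sketch below walks a dependency graph in topological order and marks a node for rebuild only when its input digest (its own source plus the digests of its dependencies) differs from the previous build. The function name, the data shapes, and the use of source strings in place of files are assumptions made for the example; real build systems track far more (tool versions, flags, environment).

```python
import hashlib
from graphlib import TopologicalSorter


def incremental_build(deps, sources, previous):
    """Decide which nodes need rebuilding.

    deps:     node -> list of nodes it depends on
    sources:  node -> source content (a string standing in for file contents)
    previous: node -> input digest recorded by the last build (may be empty)
    Returns (rebuilt_nodes, new_digests).
    """
    digests = {}
    rebuilt = []
    # static_order() yields dependencies before the nodes that use them.
    for node in TopologicalSorter(deps).static_order():
        h = hashlib.sha256(sources[node].encode())
        for dep in sorted(deps.get(node, ())):
            h.update(digests[dep].encode())  # a changed dependency changes this digest
        digests[node] = h.hexdigest()
        # A missing previous digest (no prior output) also triggers a rebuild.
        if previous.get(node) != digests[node]:
            rebuilt.append(node)
    return rebuilt, digests


deps = {"app": ["lib"], "lib": ["util"], "util": [], "docs": []}
sources = {"app": "v1", "lib": "v1", "util": "v1", "docs": "v1"}

first, state = incremental_build(deps, sources, {})         # cold: everything builds
second, state = incremental_build(deps, sources, state)     # no changes: nothing builds
edited = dict(sources, lib="v2")
third, state = incremental_build(deps, edited, state)       # only lib and its dependents
```

After the edit to lib, only lib and app are rebuilt; util and docs keep their previous outputs.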

Nx (JavaScript/TypeScript) and Bazel (polyglot) are the two tools you will encounter most often at big-tech scale. Nx uses a computation cache keyed on the hash of source files and configuration; Bazel uses hermetic sandboxes and content-addressed storage so that identical inputs always produce identical outputs regardless of machine state.

# Nx — run only affected projects since the last merge to main
npx nx affected --target=build --base=origin/main --head=HEAD

# Run affected tests in parallel; with Nx Cloud configured, results are
# served from a remote cache shared across all runners and developers
npx nx affected --target=test --base=origin/main \
  --head=HEAD \
  --parallel=4 \
  --configuration=ci

The affected command is the key primitive. Nx builds a dependency graph of your workspace and determines which projects are transitively affected by the diff between HEAD and the merge base. If you changed a shared utility library, every downstream app is marked affected. If you only changed a leaf service, only that service is rebuilt and retested. At Meta, internal tooling (Buck2) applies the same principle across a monorepo with hundreds of thousands of targets.
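The transitive "affected" computation is essentially a reverse-dependency traversal. The sketch below shows the idea on a toy workspace; it is not how Nx is implemented, and the project names and the affected_projects helper are made up for illustration.

```python
from collections import deque


def affected_projects(deps, changed):
    """Return the changed projects plus everything that transitively depends on them.

    deps: project -> list of projects it depends on
    changed: iterable of projects touched by the diff
    """
    # Invert the graph: for each project, who depends on it?
    dependents = {}
    for project, needs in deps.items():
        for dep in needs:
            dependents.setdefault(dep, set()).add(project)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical workspace: two apps, a UI library, a shared utility, a leaf service.
deps = {
    "web": ["ui", "utils"],
    "api": ["utils"],
    "ui": ["utils"],
    "utils": [],
    "billing": [],
}

shared_change = affected_projects(deps, ["utils"])   # shared library: wide blast radius
leaf_change = affected_projects(deps, ["billing"])   # leaf service: only itself
```

Changing utils marks web, api, ui, and utils as affected, while changing billing affects only billing.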

Hermeticity is the prerequisite. Incremental builds are only safe if your build is hermetic — identical inputs produce identical outputs every time. If your build reads the system clock, a random seed, or an unversioned system library, the output is non-deterministic and caching it produces flaky results. Bazel enforces hermeticity via sandboxing; Nx relies on you configuring inputs correctly.
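One simple way to probe for non-hermetic steps is to run the same build several times on the same inputs and compare output digests. The sketch below assumes a build can be modeled as a function from declared inputs to bytes; is_reproducible, hermetic_build, and leaky_build are hypothetical names, and the random nonce stands in for any undeclared input such as a clock or seed.

```python
import hashlib
import os


def is_reproducible(build, inputs, runs=3):
    """Run the build several times on identical inputs and compare output digests."""
    digests = {hashlib.sha256(build(inputs)).hexdigest() for _ in range(runs)}
    return len(digests) == 1


def hermetic_build(inputs):
    """Output depends only on the declared inputs, in a stable (sorted) order."""
    return "\n".join(f"{name}:{inputs[name]}" for name in sorted(inputs)).encode()


def leaky_build(inputs):
    """Same as above, but embeds an undeclared input (a random nonce)."""
    nonce = os.urandom(8).hex()
    return hermetic_build(inputs) + f"\nnonce:{nonce}".encode()


inputs = {"a.ts": "export const a = 1", "b.ts": "export const b = 2"}
safe_to_cache = is_reproducible(hermetic_build, inputs)
unsafe_to_cache = not is_reproducible(leaky_build, inputs)
```

A check like this only detects non-determinism that shows up between runs on one machine; it does not catch dependence on machine state that happens to be stable locally, which is the gap sandboxing is meant to close.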

Lever 3 — Test Selection

Running the full test suite on every commit is expensive and, more importantly, slow. Test selection narrows the set of tests executed to those that are likely to be affected by the code change. There are two dominant approaches:

Static dependency analysis: Map every test file to the source files it imports. If commit X touches src/billing/invoice.ts, only run tests that directly or transitively import that file. Tools like jest --changedSince, nx affected --target=test, and Bazel reverse-dependency queries (rdeps) do this.
ML-based flakiness and relevance ranking: Google's TAP system uses historical test failure data to rank tests by their probability of catching the current diff. High-probability tests run in the first wave; low-probability tests run in a later shard or are deferred to nightly.

# Jest — only run tests related to changed files (uses git diff internally)
jest --changedSince=origin/main --passWithNoTests

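# Pytest: no built-in change-based selection (pytest-cov only measures coverage;
# plugins such as pytest-testmon add coverage-based selection via --testmon)
pytest --co -q  # collect only: lists every test ID without running anything
pytest tests/ -k "invoice or billing"  # manual keyword filter during triage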
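The history-based ranking idea can be illustrated with a deliberately simple heuristic: score each test by how often it failed in past commits that touched any of the currently changed files. This is a toy stand-in, not Google's system or any ML model; the record format, rank_tests, and split_waves are assumptions for the example.

```python
from collections import Counter


def rank_tests(history, changed_files):
    """Rank tests by past failures that co-occurred with the changed files.

    history: list of (files_changed_in_commit, tests_that_failed) records
    Returns test names, highest score first (ties broken alphabetically).
    Tests with no matching history are not ranked and would run in a later wave.
    """
    changed = set(changed_files)
    scores = Counter()
    for files, failed_tests in history:
        overlap = len(changed & set(files))
        if overlap:
            for test in failed_tests:
                scores[test] += overlap
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def split_waves(ranked, first_wave_size):
    """First wave runs immediately; the rest is deferred to a later shard."""
    return ranked[:first_wave_size], ranked[first_wave_size:]


# Hypothetical failure history.
history = [
    (["src/billing/invoice.ts"], ["invoice.test.ts"]),
    (["src/billing/invoice.ts", "src/billing/tax.ts"], ["invoice.test.ts", "tax.test.ts"]),
    (["src/auth/login.ts"], ["login.test.ts"]),
]

ranked = rank_tests(history, ["src/billing/invoice.ts"])
first_wave, later = split_waves(ranked, 1)
```

With this history, invoice.test.ts ranks first and tax.test.ts is deferred; login.test.ts never co-failed with the changed file, so it is not ranked at all.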

Never skip the full suite on protected branches. Test selection is safe on feature branches and pull requests. Before merging to main, run the complete suite, ideally inside a merge queue (see below), so that pre-merge tests run against the post-merge state rather than the branch state. Skipping the full suite on trunk is how regressions that test selection missed silently accumulate.

Lever 4 — Merge Queues

Merge queues solve the semantic conflict problem: two PRs that each pass CI on their own, and merge without textual conflicts, can still break the build once combined. For example, one PR renames a function while another adds a new call to the old name. Without a merge queue, both are approved, the second one to merge breaks main, and the on-call engineer gets paged.

A merge queue serializes (or optimistically batches) merges. When a PR is approved and added to the queue, the queue system creates a temporary "merge candidate" branch: it stacks the PR on top of the current main plus any PRs ahead of it in the queue, runs CI against that combined state, and only performs the actual merge if CI passes. If CI fails, only that PR is ejected; PRs ahead of it in the queue are unaffected, and PRs behind it are re-tested without the failed change.

A merge queue runs CI against stacked "candidate" branches. Passing candidates are merged to main in order; failing candidates are ejected without blocking others.
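The stacking behavior can be modeled as a short simulation. This sketch assumes a strictly serial queue and treats CI as a function over the combined list of changes; process_queue, the change names, and the conflict rule are hypothetical, and real queues add batching and speculative parallel runs on top.

```python
def process_queue(main, queue, ci_passes):
    """Simulate a serial merge queue.

    main:      changes already on main
    queue:     PR identifiers in queue order
    ci_passes: callable taking the combined list of changes, returns True/False
    Returns (new_main, ejected_prs).
    """
    merged = list(main)
    ejected = []
    for pr in queue:
        # Candidate = main + every PR ahead that already passed + this PR.
        candidate = merged + [pr]
        if ci_passes(candidate):
            merged = candidate
        else:
            # Only this PR is removed; later PRs are tested without it.
            ejected.append(pr)
    return merged, ejected


def ci(changes):
    """Toy CI: fails only when the rename and the call to the old name are combined."""
    return not {"rename-fn", "call-old-fn"} <= set(changes)


# Each PR passes alone against main, but two of them conflict semantically.
passes_alone = ci(["base", "rename-fn"]) and ci(["base", "call-old-fn"])
new_main, ejected = process_queue(["base"], ["rename-fn", "call-old-fn", "docs"], ci)
```

Both conflicting PRs pass in isolation, yet the queue merges rename-fn and docs and ejects call-old-fn, so main never sees the broken combination.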

GitHub's native merge queue (available since 2023), Mergify, and internal tools at Google (Submit Queue) and Stripe all implement this pattern. At Stripe, the submit queue processes thousands of merges per day, batching compatible PRs into groups to multiply CI throughput without sacrificing correctness.

# GitHub Actions — trigger CI from the merge_group event (required for merge queues)
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  merge_group:          # <-- fires when the PR enters the queue

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build & Test
        run: make ci

The merge_group event is mandatory. Required status checks are reported by workflow runs, and a workflow that triggers only on pull_request never runs for the queue's candidate branch. The required check is then never reported for the candidate, and the queued PR cannot merge. Always add merge_group to your workflow triggers when enabling GitHub's merge queue.

Putting It Together — A Speed-Optimized Pipeline

A pipeline that applies all four levers runs in this order: restore the dependency cache first, so that later steps can branch on whether the key matched; run the Nx or Bazel incremental build; on pull requests, run only the affected tests; inside the merge queue candidate, run the full suite; and save a new cache entry only on a cache miss. Observed median times at scale: cold run under 8 minutes, warm run under 90 seconds.

The discipline that separates big-tech CI from average CI is treating pipeline time as a product metric. Every slowdown gets filed as a bug. Cache hit rates are dashboarded. Test durations are tracked per file over time. When a test starts taking 30 seconds instead of 3, it gets flagged for optimization before it compounds across thousands of daily runs.
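Tracking durations per test makes this kind of flagging easy to automate. The sketch below compares each test's latest run against the median of its earlier runs; the threshold factor, the minimum history length, and the flag_slow_tests name are illustrative assumptions rather than a recommended policy.

```python
from statistics import median


def flag_slow_tests(durations, factor=3.0, min_runs=5):
    """Flag tests whose latest run is much slower than their own baseline.

    durations: test -> list of run times in seconds, oldest first
    A test is flagged when its latest run exceeds factor x the median of
    its earlier runs. Tests with fewer than min_runs earlier runs are skipped.
    Returns test -> (baseline_seconds, latest_seconds).
    """
    flagged = {}
    for test, runs in durations.items():
        if len(runs) < min_runs + 1:
            continue  # not enough history for a stable baseline
        *earlier, latest = runs
        baseline = median(earlier)
        if baseline > 0 and latest > factor * baseline:
            flagged[test] = (baseline, latest)
    return flagged


# Hypothetical per-test history pulled from CI logs.
durations = {
    "invoice.test.ts": [3.0, 3.1, 2.9, 3.0, 3.2, 30.0],   # jumped from ~3s to 30s
    "login.test.ts": [1.0, 1.1, 0.9, 1.0, 1.0, 1.2],      # normal variation
    "new.test.ts": [10.0, 50.0],                          # too little history
}

slow = flag_slow_tests(durations)
```

Using the median of earlier runs rather than the mean keeps a single noisy run from shifting the baseline.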