Matrix Builds & Parallelism
At Google, Meta, and Amazon, CI pipelines routinely test the same code against a dozen Node.js versions, three operating systems, and two build flags — simultaneously. GitHub Actions makes this possible with the matrix strategy: a single job definition that expands at runtime into a fan of parallel jobs. Understanding matrix builds is the difference between a 40-minute sequential pipeline and a 6-minute parallel one.
What Is a Matrix Strategy?
A matrix is a map of variables defined under strategy.matrix. GitHub Actions computes the Cartesian product of every dimension you define and spawns one job per combination. Each job receives its row's values through the matrix context — for example ${{ matrix.os }} or ${{ matrix.node }}.
The example below creates six parallel jobs (3 OS × 2 Node versions):
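A minimal sketch of such a workflow, assuming a Node.js project whose tests run with npm test (the workflow name, job name, Node versions, and npm commands are illustrative assumptions, not taken from a specific project):

```yaml
name: CI
on: [push, pull_request]

jobs:
  test:
    # Each matrix job runs on the OS from its own combination.
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]  # 3 values
        node: [18, 20]                                     # 2 values -> 3 x 2 = 6 jobs
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test
```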
Each element of os is paired with every element of node. With 3 OS values and 2 Node versions you get 3 × 2 = 6 jobs, which run concurrently as long as runners are available. Adding a third dimension of 4 build flags would produce 3 × 2 × 4 = 24 jobs. Respect the usage limits: a matrix can generate at most 256 jobs per workflow run, on GitHub-hosted and self-hosted runners alike.
Includes and Excludes
The Cartesian product is rarely what you want exactly. GitHub Actions provides two escape hatches:
- include: injects extra jobs or augments existing combinations with additional variables.
- exclude: drops specific combinations from the product.
The include block can attach any extra key to a combination, for example experimental: true, which you can then reference in expressions such as continue-on-error: ${{ matrix.experimental }}. This is a common way to let an experimental or canary combination fail without failing the whole workflow run.
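The sketch below illustrates both keys together. The specific versions, the excluded pair, and the experimental flag are assumptions chosen for illustration. The product yields six combinations, exclude removes one, and include adds one new combination, because its node value does not match any existing combination.

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    # Only the combination flagged experimental is allowed to fail.
    continue-on-error: ${{ matrix.experimental }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        node: [18, 20]
        experimental: [false]        # default for every generated combination
        exclude:
          # Drop one combination from the product (illustrative choice).
          - os: windows-latest
            node: 18
        include:
          # Add a combination that is not part of the product.
          - os: ubuntu-latest
            node: 22
            experimental: true
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test
```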
fail-fast
By default, fail-fast is true for every matrix. As soon as any combination fails, GitHub Actions cancels all in-progress and queued jobs in that matrix. This is the right default for developer feedback loops: you want to know quickly whether something is broken rather than waiting for the other 23 of 24 jobs to finish.
Set fail-fast: false when you need the full picture — for example in release pipelines that test against every supported platform and must report the exact set of failing environments before cutting a release.
Use fail-fast: true on feature-branch CI (fast developer feedback) and fail-fast: false on release-branch or nightly builds (complete regression coverage).
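As a sketch of the release-pipeline case, the fragment below disables fail-fast so that every combination runs to completion. The job name, matrix values, and test command are hypothetical.

```yaml
jobs:
  release-tests:
    runs-on: ${{ matrix.os }}
    strategy:
      # Let every combination finish so the full set of failures is reported.
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        node: [18, 20]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test
```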
Concurrency Groups
Parallelism within a workflow is great — but you also need to control parallelism across workflow runs. Without concurrency control, every push to a PR triggers a new run, and ten stale runs from previous commits pile up and consume minutes needlessly.
The concurrency key defines a group. GitHub Actions ensures that at most one run or job in a group is running at a time, with at most one more waiting behind it.
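A minimal workflow-level sketch, assuming you want one active run per workflow and branch (the group name format is a common convention, not a requirement):

```yaml
# Placed at the top level of the workflow file, next to "on:" and "jobs:".
concurrency:
  # One group per workflow and ref, so runs on different branches do not interfere.
  group: ${{ github.workflow }}-${{ github.ref }}
  # A newer run in the same group cancels the one still in progress.
  cancel-in-progress: true
```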
With cancel-in-progress: true, a new run in the same group cancels whatever is still running in that group. When the group key includes the branch, each new push to a branch cancels the stale run from the previous commit. For repositories that receive frequent pushes, this is often one of the largest savings in CI minutes.
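Set cancel-in-progress: false (or use a separate concurrency group) for jobs that write to production infrastructure. Cancelling a deployment mid-flight can leave resources in a partially updated state. A common pattern is group: deploy-${{ github.ref }} with cancel-in-progress: false, which lets the running deploy finish while a newer one waits. Do not add ${{ github.sha }} to the group name: that gives every commit its own group, so nothing is serialized. Note that a group holds only one pending run, so if several deploys queue up, only the newest pending one is kept.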
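A job-level sketch of a non-cancelling deploy group follows. It keys the group on the ref only, since a group name that also contained the commit SHA would be unique per commit and would never serialize anything. The job name and the deploy script path are hypothetical.

```yaml
jobs:
  deploy:
    runs-on: ubuntu-24.04
    concurrency:
      # All deploys for the same ref share one group and run one at a time.
      group: deploy-${{ github.ref }}
      # Never interrupt a deploy that has already started.
      cancel-in-progress: false
    steps:
      - uses: actions/checkout@v4
      # Hypothetical deploy script; replace with your own deployment step.
      - run: ./scripts/deploy.sh
```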
How It All Fits Together
The diagram below shows a typical matrix CI run: a single workflow triggers six parallel test jobs (3 OS × 2 versions), all gated by a concurrency group that cancels stale runs on new pushes.
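In workflow form, the same arrangement might look like the sketch below. It assumes a Node.js project; the names, versions, and commands are illustrative only.

```yaml
name: CI
on: [push, pull_request]

# Gate: a new push to the same ref cancels the stale run.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: true                # the default, written out for clarity
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        node: [18, 20]               # 3 OS x 2 versions = 6 parallel jobs
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test
```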
max-parallel: Throttling the Fan-Out
Sometimes you do not want all combinations to run at once — for instance, if each job hammers a shared staging database or a rate-limited third-party API. Use max-parallel to cap concurrency:
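A sketch with seven environments and a cap of four concurrent jobs is shown here. The environment names and the test script are hypothetical placeholders.

```yaml
jobs:
  integration:
    runs-on: ubuntu-latest
    strategy:
      # At most four of the seven matrix jobs run at the same time.
      max-parallel: 4
      matrix:
        environment: [env-1, env-2, env-3, env-4, env-5, env-6, env-7]
    steps:
      - uses: actions/checkout@v4
      # Hypothetical script that runs integration tests against one environment.
      - run: ./scripts/integration-test.sh "${{ matrix.environment }}"
```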
With seven environments and max-parallel: 4, the first four start immediately; the remaining three are queued and start as slots free. This is critical for integration tests that share infra.
Real-World Production Patterns
- Keep matrix dimensions small. A 5 × 5 × 5 matrix expands to 125 jobs, which can exceed the concurrent-job limit of your plan or runner pool, so most of them sit in the queue. Reduce to the minimum set that actually catches bugs: usually 2 OS variants and 2 runtime versions.
- Pin runner versions. ubuntu-latest changes silently. In production pipelines, pin to ubuntu-24.04 so a runner image upgrade does not surprise you mid-sprint.
- Use outputs from matrix jobs carefully. Only the last-completing job's output survives. Aggregate results via an artifact or a downstream job with needs.
- Concurrency on default branch. Never set cancel-in-progress: true on main or master pushes. Cancelling a main-branch build midway can leave a deployment half-finished. Scope cancellation to PR branches only.
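Two of these patterns can be combined in one workflow, as in the sketch below: a pinned runner image, and cancellation that applies only to pull request runs. It assumes the default branch is named main and that tests run with npm; both are illustrative assumptions.

```yaml
name: CI
on:
  push:
    branches: [main]
  pull_request:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  # True only for pull_request events, so pushes to main are never cancelled.
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

jobs:
  test:
    # Pinned image instead of ubuntu-latest.
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
```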