GitHub Actions in Depth

Matrix Builds & Parallelism

Large engineering organizations routinely test the same code against many Node.js versions, several operating systems, and multiple build flags at the same time. GitHub Actions makes this possible with the matrix strategy: a single job definition that expands at runtime into a fan of parallel jobs. Understanding matrix builds can be the difference between a 40-minute sequential pipeline and a 6-minute parallel one.

What Is a Matrix Strategy?

A matrix is a map of variables defined under strategy.matrix. GitHub Actions computes the Cartesian product of every dimension you define and spawns one job per combination. Each job receives its row's values through the matrix context — for example ${{ matrix.os }} or ${{ matrix.node }}.

The example below creates six parallel jobs (3 OS × 2 Node versions):

name: Multi-version CI

on: [push, pull_request]

jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-24.04, windows-latest, macos-14]
        node: ['20', '22']
    runs-on: ${{ matrix.os }}

    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js ${{ matrix.node }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: 'npm'

      - run: npm ci
      - run: npm test

Cartesian product: every element of os is paired with every element of node. With 3 OS values and 2 Node versions you get 3 × 2 = 6 jobs, which run concurrently as long as runners are available. Adding a third dimension of 4 build flags would produce 3 × 2 × 4 = 24 jobs. A single matrix can generate at most 256 jobs per workflow run, and that limit applies to both GitHub-hosted and self-hosted runners.
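As a rough sketch of the three-dimensional case, the fragment below adds a hypothetical flavor dimension. The four flavor values and the --mode flag passed to the build script are assumptions made for illustration, not part of the lesson's project. It should expand to 3 × 2 × 4 = 24 jobs:

```yaml
jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu-24.04, windows-latest, macos-14]
        node: ['20', '22']
        # Hypothetical third dimension standing in for "build flags".
        flavor: [debug, release, minified, coverage]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      # Assumes package.json defines a "build" script that accepts --mode.
      - run: npm run build -- --mode=${{ matrix.flavor }}
```

Each new dimension multiplies the job count, so a dimension is worth adding only when its values can actually change the outcome of the build.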

Includes and Excludes

The Cartesian product is rarely what you want exactly. GitHub Actions provides two escape hatches:

include — injects extra jobs or augments existing combinations with additional variables.
exclude — drops specific combinations from the product.

jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu-24.04, windows-latest]
        node: ['18', '20', '22']
        # Exclude a combination that is known broken
        exclude:
          - os: windows-latest
            node: '18'
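        # include entries are applied after exclude. The first entry matches an
        # existing combination and only attaches an extra variable to it; the
        # second fits no existing combination, so it is added as a one-off job.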
        include:
          - os: ubuntu-24.04
            node: '22'
            experimental: true
          - os: macos-14
            node: '22'
    runs-on: ${{ matrix.os }}
    continue-on-error: ${{ matrix.experimental == true }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci && npm test

The include block can attach any extra key to a combination (here experimental: true), which you can then reference in expressions such as the one on continue-on-error. This is a common way to let canary builds fail without failing the overall workflow run.
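To make the include and exclude rules concrete, here is how the build matrix above should resolve, written out as a plain YAML list rather than workflow syntax. The listing is a hand-worked expansion, assuming GitHub's documented order of evaluation (exclude first, then include):

```yaml
# Product of os x node: 2 x 3 = 6 combinations.
# exclude removes windows-latest / '18', leaving 5.
# include then augments one combination and appends one new one: 6 jobs.
- { os: ubuntu-24.04,   node: '18' }
- { os: ubuntu-24.04,   node: '20' }
- { os: ubuntu-24.04,   node: '22', experimental: true }  # augmented by include
- { os: windows-latest, node: '20' }
- { os: windows-latest, node: '22' }
- { os: macos-14,       node: '22' }                      # added by include
```

Only the ubuntu-24.04 / '22' job carries matrix.experimental. In every other job the key is unset, so matrix.experimental == true evaluates to false and a failure there fails the run as usual.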

fail-fast

By default, fail-fast: true is set on every matrix. As soon as any combination fails, GitHub Actions cancels all in-progress and queued jobs in that matrix. This is the right default for developer feedback loops: you want to know quickly whether something is broken rather than waiting for the other 23 of 24 jobs to finish.

Set fail-fast: false when you need the full picture — for example in release pipelines that test against every supported platform and must report the exact set of failing environments before cutting a release.

strategy:
  fail-fast: false   # run all combinations; collect the complete failure set
  matrix:
    os: [ubuntu-24.04, windows-latest, macos-14]
    python: ['3.10', '3.11', '3.12', '3.13']

Production pattern: Use fail-fast: true on feature-branch CI (fast developer feedback) and fail-fast: false on release-branch or nightly builds (complete regression coverage).
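One way to get both behaviours from a single workflow is to compute fail-fast with an expression. The sketch below assumes release branches are named release/* and that nightly builds arrive through a schedule trigger. Both are illustrative conventions, not something the lesson prescribes:

```yaml
name: CI

on:
  push:
  pull_request:
  schedule:
    - cron: '0 3 * * *'   # hypothetical nightly run at 03:00 UTC

jobs:
  test:
    strategy:
      # true for ordinary pushes and PRs; false for nightly and release/* runs.
      fail-fast: ${{ github.event_name != 'schedule' && !startsWith(github.ref, 'refs/heads/release/') }}
      matrix:
        os: [ubuntu-24.04, windows-latest, macos-14]
        node: ['20', '22']
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test
```

Keeping the rule in one expression avoids maintaining two nearly identical workflow files. If the branch naming differs in your repository, only the startsWith argument needs to change.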

Concurrency Groups

Parallelism within a workflow is great — but you also need to control parallelism across workflow runs. Without concurrency control, every push to a PR triggers a new run, and ten stale runs from previous commits pile up and consume minutes needlessly.

The concurrency key defines a group. GitHub Actions allows at most one running and one pending run per group at a time:

name: PR CI

on:
  push:
    branches: ['**']
  pull_request:

# One active run per branch. Cancel the in-progress run when a new commit arrives.
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    strategy:
      matrix:
        node: ['20', '22']
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci && npm test

With cancel-in-progress: true, as soon as a new push arrives on the same branch, the run already in progress for that group is cancelled, along with all of its jobs. For teams that push frequently, this is often one of the largest savings in CI minutes.

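Do not cancel deployment jobs. Set cancel-in-progress: false (or use a separate concurrency group) for jobs that write to production infrastructure. Cancelling a deployment mid-flight can leave resources in a partially-updated state. A common pattern is group: deploy-${{ github.ref }} with cancel-in-progress: false, which lets the running deploy finish while a newer one waits its turn. Keep github.sha out of the group name: it would give every commit its own group, so deploys would run concurrently instead of queuing. Note also that a group holds only one pending run, so if several deploys queue up behind a running one, only the newest is kept.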
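A minimal sketch of a deploy job that serializes instead of cancelling is shown below. The workflow name, the main branch filter, and the ./scripts/deploy.sh script are hypothetical placeholders. The concurrency key is set at job level here, so it applies only to the deploy job:

```yaml
name: Deploy

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-24.04
    # Job-level group: at most one deploy for this ref runs at a time.
    concurrency:
      group: deploy-${{ github.ref }}
      cancel-in-progress: false   # let the running deploy finish
    steps:
      - uses: actions/checkout@v4
      # Hypothetical deploy script; replace with the real deployment step.
      - run: ./scripts/deploy.sh
```

Because the group name depends only on the ref, two pushes to main map to the same group and the second deploy waits for the first rather than racing it.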

How It All Fits Together

The diagram below shows a typical matrix CI run: a single workflow triggers six parallel test jobs (3 OS × 2 versions), all gated by a concurrency group that cancels stale runs on new pushes.

Figure: A matrix of 3 OS × 2 Node versions fans out to 6 parallel jobs across independent runners. A concurrency group cancels stale runs on new pushes.

max-parallel: Throttling the Fan-Out

Sometimes you do not want all combinations to run at once — for instance, if each job hammers a shared staging database or a rate-limited third-party API. Use max-parallel to cap concurrency:

strategy:
  max-parallel: 4    # at most 4 jobs running simultaneously
  matrix:
    env: [dev, staging, qa, perf, canary, prod-eu, prod-us]

With seven environments and max-parallel: 4, the first four start immediately; the remaining three are queued and start as slots free. This is critical for integration tests that share infra.

Real-World Production Patterns

Keep matrix dimensions small. A 5 × 5 × 5 matrix produces 125 jobs, which is more than the concurrent-job limit on many plans, so the excess jobs queue behind the others and the run still pays for all 125. Reduce to the minimum set that actually catches bugs: usually 2 OS variants and 2 runtime versions.
Pin runner versions. The ubuntu-latest label is moved to a newer image over time, and the same goes for windows-latest and macos-latest. In production pipelines, pin to a specific image such as ubuntu-24.04 so a runner image upgrade does not surprise you mid-sprint.
Use outputs from matrix jobs carefully. Every combination writes to the same job outputs, so only the last-completing job's values survive. Aggregate results via an artifact or a downstream job with needs.
Concurrency on default branch. Avoid cancel-in-progress: true for pushes to main or master. Cancelling a main-branch build midway can interrupt a deployment or leave a merged commit without a completed build. Scope cancellation to PR branches only.
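The last two points can be sketched in one workflow. This is an illustration under stated assumptions: the default branch is taken to be main, and the artifact and file names (results-node…, test-output-node….txt) are made up for the example. It uses the pattern and merge-multiple inputs of actions/download-artifact@v4 to collect one artifact per combination:

```yaml
name: CI with aggregated results

on:
  push:
  pull_request:

concurrency:
  group: ci-${{ github.ref }}
  # Cancel stale runs everywhere except the default branch (assumed: main).
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        node: ['20', '22']
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      # shell: bash enables pipefail, so a failing npm test still fails the step.
      - name: Run tests and keep the log
        shell: bash
        run: npm test 2>&1 | tee test-output-node${{ matrix.node }}.txt
      # One uniquely named artifact per combination.
      - uses: actions/upload-artifact@v4
        if: ${{ !cancelled() }}
        with:
          name: results-node${{ matrix.node }}
          path: test-output-node${{ matrix.node }}.txt

  report:
    needs: test
    if: ${{ !cancelled() }}   # run even when some combinations failed
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: results-*
          merge-multiple: true
          path: results
      - run: ls -l results
```

Artifacts are used instead of job outputs because each combination gets its own uniquely named artifact, so nothing is overwritten when several matrix jobs finish.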

Cost awareness: On GitHub-hosted runners, macOS minutes are billed at 10× the rate of Linux. A 3 OS × 10 version matrix with 30-minute jobs runs 10 macOS jobs, which is 300 macOS minutes (5 hours) per run and counts as 3,000 billed minutes at the 10× rate. Prefer Linux-only matrices unless you genuinely need OS-specific coverage.
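When some macOS coverage is still needed, one option is to keep the version sweep on Linux and add a single macOS job through include. The sketch below assumes that testing macOS against only the newest Node version is acceptable for the project:

```yaml
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-24.04]
        node: ['18', '20', '22']
        include:
          # Single macOS job on the newest version only.
          - os: macos-14
            node: '22'
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test
```

This yields four jobs (three on Linux, one on macOS) instead of six, and the macOS share of the bill scales with one job rather than with the number of Node versions.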