GitHub Actions in Depth

Caching & Artifacts

18 min Lesson 5 of 30

A Node.js project with a fat node_modules can spend three minutes on npm ci every single run, even when nothing in package-lock.json changed. Multiply that across 50 pull requests a day and you are burning at least 150 minutes of runner time (and money) per day on network I/O alone. GitHub Actions ships two complementary mechanisms to fix this: caching (reuse file trees such as dependency downloads across workflow runs) and artifacts (pass build outputs between jobs inside a single workflow run). Understanding when to use each, and how they fail, is a foundational production skill.

Caching with actions/cache

The actions/cache action stores a compressed archive on GitHub's cache backend and restores it on subsequent runs. The critical concept is the cache key: a string that identifies a cache entry. If an entry with that exact key exists, it is restored (cache hit). If not (cache miss), nothing is restored unless a restore-keys prefix matches, and the remaining steps run as usual. When the job finishes successfully, the action's post-job step saves the cached paths as a new entry under that key.

A good cache key has two parts: a stable prefix (identifying the OS and purpose) and a content hash of the lockfile or manifest. The hash ensures the cache is invalidated exactly when dependencies change — no more, no less.

name: CI with Dependency Cache

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '22'

      # Restore npm's download cache before installing dependencies
      - name: Cache npm download cache
        uses: actions/cache@v4
        id: npm-cache
        with:
          path: ~/.npm          # npm's global cache dir, not node_modules
          key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-npm-

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test
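
The id: npm-cache on the cache step is what lets later steps react to the restore result. As a minimal sketch (the step name and message below are illustrative, not part of the lesson's workflow), the cache-hit output is 'true' only on an exact primary-key match, so comparing against 'true' also covers a restore-keys fallback and a complete miss:

```yaml
      # Runs only when the exact key was NOT restored
      # (restore-keys fallback or complete miss).
      - name: Report cache status
        if: steps.npm-cache.outputs.cache-hit != 'true'
        run: echo "No exact cache hit; npm ci will download any missing packages."
```

The same condition can gate any step that is only worth running when the cache was not an exact match.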

restore-keys: The Fallback Chain

The restore-keys field is a newline-separated list of key prefixes tried in order when the primary key misses. For each prefix, GitHub finds the most recently created cache whose key starts with it and restores that entry as a partial hit; the cache-hit output stays false in this case. npm ci then downloads only what the restored cache lacks, and after the job succeeds a new cache entry is saved under the exact primary key. Combined with the scope rules below, this pattern means a newly created branch usually gets a warm start from the default branch's cache rather than a cold install.
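
A fallback chain with more than one level might look like the sketch below. The node22 segment is an assumption added for illustration (it ties the cache to a Node major version); the prefixes go from most specific to least specific, and each stays scoped to npm so an unrelated cache is never restored:

```yaml
      - uses: actions/cache@v4
        with:
          path: ~/.npm
          # Primary key: OS family, purpose, Node major version, lockfile hash
          key: ${{ runner.os }}-npm-node22-${{ hashFiles('**/package-lock.json') }}
          # Tried in order when the primary key misses
          restore-keys: |
            ${{ runner.os }}-npm-node22-
            ${{ runner.os }}-npm-
```

Avoid a final catch-all prefix such as the bare OS name: it can match caches that hold something else entirely.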

Cache scope rules: Caches are scoped to a branch. A run can restore caches created on its own branch or on the default branch (main), and pull request runs can also restore caches from the PR's base branch. A run on main cannot read caches from feature branches, and sibling feature branches cannot read each other's. Plan your restore-keys accordingly: a prefix that matches main caches is a solid fallback.

What to Cache (and What Not To)

Cache the package manager's download cache, not the extracted node_modules directory itself. Caching node_modules is a common mistake: npm ci deletes any existing node_modules before installing, so a restored copy is thrown away; the directory is also OS- and Node-version-specific, it contains compiled native addons, and it is slow to archive because it contains tens of thousands of small files. The correct path for each ecosystem:

npm: ~/.npm — the global tarball cache
pip / poetry: ~/.cache/pip on Linux (run pip cache dir to confirm on other platforms); for Poetry, ~/.cache/pypoetry or the virtualenv directory
Maven: ~/.m2/repository
Gradle: ~/.gradle/caches
Go modules: ~/go/pkg/mod
Cargo (Rust): ~/.cargo/registry and target/ (with caution — target/ is large)
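
The same key pattern carries over to the other ecosystems in this list. The steps below are a sketch for Go and Maven, assuming Linux or macOS runners (the home-directory paths differ on Windows) and assuming go.sum and pom.xml are the files that pin dependencies in your project:

```yaml
      # Go module download cache, invalidated when go.sum changes
      - uses: actions/cache@v4
        with:
          path: ~/go/pkg/mod
          key: ${{ runner.os }}-gomod-${{ hashFiles('**/go.sum') }}
          restore-keys: |
            ${{ runner.os }}-gomod-

      # Maven local repository, invalidated when any pom.xml changes
      - uses: actions/cache@v4
        with:
          path: ~/.m2/repository
          key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
          restore-keys: |
            ${{ runner.os }}-maven-
```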

Built-in caching in setup actions: actions/setup-node@v4, actions/setup-python@v5, actions/setup-java@v4, and similar actions all accept a cache: input (e.g., cache: 'npm') that handles the cache paths and a lockfile-based key for you. Prefer these over rolling your own cache steps: they handle edge cases you will forget.
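
With the built-in option, the explicit cache step from the earlier workflow can be dropped. A minimal sketch, assuming package-lock.json sits at the repository root:

```yaml
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '22'
          cache: 'npm'          # caches npm's download cache, keyed on the lockfile

      - run: npm ci
```

If the lockfile lives in a subdirectory, setup-node's cache-dependency-path input can point at it.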

Cache Limits and Eviction

GitHub imposes a default 10 GB total cache limit per repository. Entries not accessed in seven days are evicted automatically; when the limit is reached, the least recently used entries are pruned first. In practice this means you should keep keys tight and avoid caching build outputs that change every run (those belong in artifacts). For monorepos with many distinct lock files, namespace your keys by workspace: ${{ runner.os }}-monorepo-frontend-${{ hashFiles('apps/frontend/package-lock.json') }}.
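
As a sketch of workspace-namespaced keys, the step below caches npm downloads for the frontend workspace only. The apps/frontend path comes from the example key above; the working-directory layout is an assumption about how such a monorepo is organized:

```yaml
      - name: Cache npm downloads (frontend workspace)
        uses: actions/cache@v4
        with:
          path: ~/.npm
          # Only the frontend lockfile feeds the hash, so backend changes
          # do not invalidate this entry.
          key: ${{ runner.os }}-monorepo-frontend-${{ hashFiles('apps/frontend/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-monorepo-frontend-

      - name: Install frontend dependencies
        working-directory: apps/frontend
        run: npm ci
```

A second workspace would repeat the step with its own namespace segment and lockfile path.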

Figure: cache hit path versus cache miss path. A hit restores the saved archive, so installation is skipped or shortened; a miss runs fresh and saves a new entry for next time.

Artifacts: Passing Data Between Jobs

Unlike caches, artifacts are scoped to a single workflow run. Their purpose is different: move a compiled binary, a test report, or a Docker image tarball from a build job to a downstream test or deploy job. Jobs run on separate ephemeral runners, so without artifacts there is no shared filesystem between them.

name: Build then Deploy

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4

      - name: Build application
        run: |
          npm ci
          npm run build          # outputs to dist/

      - name: Upload dist artifact
        uses: actions/upload-artifact@v4
        with:
          name: dist-${{ github.sha }}
          path: dist/
          retention-days: 7      # keep for 7 days; default is 90

  deploy:
    runs-on: ubuntu-24.04
    needs: build                 # waits for build job to succeed
    steps:
      - name: Download dist artifact
        uses: actions/download-artifact@v4
        with:
          name: dist-${{ github.sha }}
          path: dist/

      - name: Deploy to server
        run: rsync -az dist/ user@prod:/var/www/app/

Artifact Naming, Retention, and Size Limits

Artifact names must be unique within a workflow run: with upload-artifact@v4, a second upload under an existing name fails, so matrix and parallel jobs each need their own name. Embedding ${{ github.sha }} or the run ID is not needed to separate runs (artifacts are already scoped to the run), but it makes a downloaded archive traceable to the commit that produced it. The default retention is 90 days and can be changed in repository or organization settings; set a tight retention-days for ephemeral build outputs to avoid filling the storage quota. Artifact storage counts against a plan-dependent quota, so check the limits for your plan rather than assuming fixed numbers. For large Docker images, push to a registry instead and pass only the tag string downstream, as a small artifact or a job output.
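
Passing an image tag instead of the image itself could be sketched as follows. The registry host registry.example.com, the artifact name image-tag, and the file name image-tag.txt are all hypothetical, and the build-and-push steps are left out:

```yaml
jobs:
  build:
    runs-on: ubuntu-24.04
    steps:
      # ... build the image and push it to your registry here ...
      - name: Record the pushed image tag
        run: echo "registry.example.com/app:${{ github.sha }}" > image-tag.txt

      - uses: actions/upload-artifact@v4
        with:
          name: image-tag
          path: image-tag.txt
          retention-days: 1      # a few bytes, only needed within this run

  deploy:
    runs-on: ubuntu-24.04
    needs: build
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: image-tag        # extracted into the workspace root

      - name: Read the tag
        run: echo "Deploying $(cat image-tag.txt)"
```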

Never store secrets in artifacts. Artifact archives are downloadable by anyone with read access to the repository. This includes temporary credentials, API keys embedded in build outputs, and .env files. Strip secrets before uploading; for genuinely sensitive binaries, sign them and push to a private registry.

Matrix Builds + Artifacts: Gathering Results

When a matrix job produces per-platform outputs, each matrix cell should upload with a unique name embedding the matrix variable, then a downstream job downloads all of them and assembles the final result — a release bundle, a coverage merge, or a multi-platform Docker manifest.

jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-24.04, windows-latest, macos-14]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
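      # shell: bash keeps && and > behaving the same on the Windows runner
      - run: npm ci && npm test -- --reporter=json > results-${{ matrix.os }}.json
        shell: bash

      - uses: actions/upload-artifact@v4
        with:
          name: test-results-${{ matrix.os }}
          path: results-${{ matrix.os }}.json   # unique file name per matrix cell

  report:
    runs-on: ubuntu-24.04
    needs: test
    steps:
      - uses: actions/checkout@v4   # merge-reports.js must exist on this runner

      - uses: actions/download-artifact@v4
        with:
          pattern: test-results-*   # downloads ALL matching artifacts
          merge-multiple: true      # extracts all of them directly into path,
                                    # so file names must not collide
          path: all-results/

      - name: Merge and publish report
        run: node merge-reports.js all-results/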

Production pattern — separate cache from artifact concerns: Use actions/cache for anything that is reproducible (dependencies, compiler caches, Docker layer caches) and actions/upload-artifact for anything that is the product of a specific run (binaries, test reports, coverage HTML). Treat the cache as an optimization you can safely delete; treat artifacts as correctness-critical handoffs between jobs.

Common Failure Modes

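Stale cache after a runner image upgrade: runner.os reports only the OS family (Linux, Windows, macOS), so it is identical on ubuntu-22.04 and ubuntu-24.04. Prefixing keys with ${{ runner.os }} prevents cross-OS restores, but if you cache compiled or binary content and move to a new image, the old entry still matches and may be incompatible. Add the image label or a manually bumped version segment to the key when that matters.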
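Cache poisoning via PR from fork: A pull request run, including one from a fork, can restore caches from the base branch, but any cache it saves is scoped to the PR's merge ref and is invisible to the base branch and to other PRs. This is a security boundary, not a bug: a fork cannot plant a poisoned entry for main, and a PR's first run cold-installs whenever the base branch has no matching cache. Accept this; do not try to work around it with explicit tokens.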
Artifact download before upload: If job B downloads an artifact produced by job A, but you forgot needs: [job-a], the download step fails with a cryptic "artifact not found." Always declare explicit needs for artifact consumers.
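Uploading the wrong path: For a single directory, path: dist and path: dist/ behave the same: the artifact contains the contents of dist, not a dist folder. With multiple paths or wildcards, the least common ancestor of the matches becomes the artifact root and the directory structure below it is preserved. Download the artifact once and inspect its layout before you write the deployment step.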
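
For the image-upgrade failure mode, one option is to key the cache on the runner image label rather than on the OS family. The sketch below uses a single-entry matrix purely as a way to reuse the label in both runs-on and the key; that arrangement is an assumption, not the only way to do it:

```yaml
jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu-24.04]       # changing this label also changes the cache key
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4

      - uses: actions/cache@v4
        with:
          path: ~/.npm
          # matrix.os is "ubuntu-24.04" here; runner.os would only be "Linux"
          key: ${{ matrix.os }}-npm-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ matrix.os }}-npm-

      - run: npm ci
```

After an image change the first run misses and rebuilds the cache, which is the intended cost of avoiding an incompatible restore.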