Continuous Integration Fundamentals

Build Automation & Reproducibility

18 min Lesson 3 of 28

A CI pipeline is only as trustworthy as the build it runs. Lesson 2 showed you the anatomy of a pipeline — stages, runners, triggers. This lesson goes one level deeper: how do you write the build scripts that those stages execute, pin dependencies so they never drift, and structure a build so it produces the exact same binary on a developer laptop, a GitHub Actions runner, and a production server? These three properties — automated builds, pinned dependencies, and hermetic builds — are the foundation that separates a hobby CI setup from a Google-grade one.

Why reproducibility matters at scale: Large engineering organizations such as Google invest heavily in hermetic, reproducible builds. When a build is bit-for-bit reproducible, a security team can rebuild any artifact that ever shipped from its source revision, confirm that it matches what was deployed, and audit it for CVEs disclosed after the fact. Without reproducibility, you cannot reliably roll back, audit, or reason about what is running in production.

Build Scripts: The Contract Between Code and Pipeline

Every CI stage calls a script. That script is the authoritative, executable description of how your software is built. The golden rule: if a human runs it once manually, it must be expressible as a script that the pipeline runs automatically — with no interactive prompts, no ambient environment variables, and no implicit system-level dependencies.

Write your build scripts with these properties:

  • Exit on first failure: Use set -euo pipefail in Bash. Without it, a failing npm install is silently ignored and the pipeline reports green. -e exits on error, -u treats unset variables as errors, -o pipefail catches failures inside pipes.
  • Explicit tool versions: Never call node and hope for the best. Pin the version with .nvmrc, .tool-versions (asdf), or a Docker image tag. The runner that installs Node 18 today may upgrade to Node 22 next quarter.
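  • Offline-first installs: Use npm ci instead of npm install so the install comes strictly from the lockfile. Use pip install --no-index together with --find-links when a local wheel directory is available, or point --index-url at an internal mirror. Use go mod download with a module proxy. Network failures in CI are a top source of flaky builds.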
  • Idempotent steps: Each step should be safe to re-run. Avoid side effects like appending to files or mutating shared state between steps.
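The effect of each flag in set -euo pipefail is easy to check directly. The following is a minimal sketch, assuming bash is on the PATH; each probe runs in a child shell so a failure cannot abort the demonstration itself, and NO_SUCH_VARIABLE is a hypothetical name assumed to be unset.

```bash
#!/usr/bin/env bash
# Each probe runs in a child bash so its failure cannot abort this script.

# -e: once a command fails, the commands after it never run.
without_e=$(bash -c 'false; echo reached')
with_e=$(bash -c 'set -e; false; echo reached' || true)

# -u: expanding an unset variable is an error instead of an empty string.
# NO_SUCH_VARIABLE is a hypothetical name assumed to be unset.
if bash -c 'set -u; echo "${NO_SUCH_VARIABLE}"' >/dev/null 2>&1; then
  unset_is_error=no
else
  unset_is_error=yes
fi

# pipefail: a pipeline reports the failure of any stage, not only the last one.
without_pipefail=$(bash -c 'false | true; echo $?')
with_pipefail=$(bash -c 'set -o pipefail; false | true; echo $?')

echo "-e off: '${without_e}'  -e on: '${with_e}'"
echo "-u makes unset variables fatal: ${unset_is_error}"
echo "pipeline status without pipefail: ${without_pipefail}, with: ${with_pipefail}"
```

The last line shows the case that most often hides CI failures: without pipefail the pipeline status is 0 even though its first stage failed.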
```bash
#!/usr/bin/env bash
# build.sh: production-grade build script for a Node.js service
set -euo pipefail

readonly ARTIFACT_DIR="dist"
readonly REQUIRED_NODE="20"

# 1. Verify the environment is correct.
#    `node --version` prints e.g. "v20.11.1"; keep only the major version.
node_version="$(node --version)"
node_version="${node_version#v}"
readonly ACTUAL_NODE="${node_version%%.*}"
if [[ "$ACTUAL_NODE" != "$REQUIRED_NODE" ]]; then
  echo "ERROR: Expected Node ${REQUIRED_NODE}, got ${ACTUAL_NODE}" >&2
  exit 1
fi

# 2. Install dependencies exactly from the lockfile (no version drift)
npm ci --prefer-offline

# 3. Generate code (protobuf, OpenAPI, i18n) before compiling
npm run codegen

# 4. Type-check and lint in parallel, then run unit tests.
#    A bare `wait` returns 0 even if a background job failed, so wait on
#    each PID; with `set -e`, a non-zero status then aborts the script.
npm run type-check &
typecheck_pid=$!
npm run lint &
lint_pid=$!
wait "$typecheck_pid"
wait "$lint_pid"
npm run test:unit -- --ci --runInBand

# 5. Production build
NODE_ENV=production npm run build

# 6. Capture artifact size (fail if bundle exceeds 500 KB)
BUNDLE_KB=$(du -sk "$ARTIFACT_DIR" | cut -f1)
if (( BUNDLE_KB > 500 )); then
  echo "ERROR: Bundle too large: ${BUNDLE_KB} KB (limit: 500 KB)" >&2
  exit 1
fi

echo "Build complete. Artifact: ${ARTIFACT_DIR}/ (${BUNDLE_KB} KB)"
```
Put the build script in the repo, not in the CI YAML. A 200-line shell script in a GitHub Actions run: block is untestable and cannot be run locally. Put it in scripts/build.sh and call it from the YAML with a single line. Engineers can now run the exact same build locally with bash scripts/build.sh.

Pinned Dependencies: No Surprises in Production

Unpinned dependencies are a leading cause of "it worked last week" failures. The pattern repeats across every ecosystem: a transitive package releases a patch, your lockfile is absent or stale, and your build suddenly fails on an unchanged codebase. The remedy is to pin every dependency, direct and transitive, to an exact version.

What pinning means per ecosystem:

  • Node.js: Commit package-lock.json or yarn.lock. Install with npm ci (errors if lockfile is out of sync with package.json). Never commit node_modules/.
  • Python: Commit requirements.txt generated by pip-compile (from pip-tools), which resolves and pins all transitive dependencies. Or use poetry.lock. Never use requirements.txt with open version ranges like requests>=2.0.
  • Go: Commit both go.mod and go.sum. go.mod records the selected module versions; go.sum records cryptographic checksums, and the Go toolchain fails the build if a downloaded module does not match them. Run go mod tidy before committing to keep both files clean.
  • Docker base images: Never use FROM ubuntu:latest. Pin to a digest: FROM ubuntu:24.04@sha256:abc123.... Image tags are mutable — the same tag can point to a different layer set tomorrow.
  • CI actions: In GitHub Actions, pin third-party actions to a commit SHA, not a tag. uses: actions/checkout@v4 is mutable; uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 is pinned forever.
```yaml
# .github/workflows/ci.yml: pinned dependencies at every layer
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  build:
    runs-on: ubuntu-24.04  # pin the runner OS version
    steps:
      # Pin to a specific commit SHA, not a mutable tag
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
      - uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af
        with:
          node-version-file: ".nvmrc"  # reads Node version from repo file
          cache: "npm"
      - name: Install exact dependencies
        run: npm ci  # fails if package-lock.json is stale
      - name: Build
        run: bash scripts/build.sh
      - name: Upload artifact
        uses: actions/upload-artifact@65c4c4a1ddee5b72f698fdd19549f0f0e0d8da78
        with:
          name: dist
          path: dist/
          retention-days: 7
```
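SHA pinning is easy to lose over time, so it helps to check it mechanically. The following is a minimal sketch of such a check, not an official GitHub tool: unpinned_actions is a hypothetical helper that scans workflow files line by line (it does not parse YAML) and reports every uses: reference whose ref is not a full 40-character hex commit SHA. Local actions that start with ./ are skipped.

```bash
#!/usr/bin/env bash

# unpinned_actions FILE...   (hypothetical helper)
# Prints "file: reference" for each `uses:` entry whose ref is not a full
# 40-character lowercase hex commit SHA. Line-based, so it assumes each
# `uses:` entry sits on a single line.
unpinned_actions() {
  awk '
    {
      line = $0
      sub(/[ \t]+#.*$/, "", line)                # drop trailing YAML comment
      if (line !~ /^[ \t]*(-[ \t]*)?uses:[ \t]*/) next
      sub(/^[ \t]*(-[ \t]*)?uses:[ \t]*/, "", line)
      gsub(/[" \t]/, "", line)                   # strip quotes and whitespace
      if (line ~ /^\.\//) next                   # local action, nothing to pin
      n = split(line, parts, "@")
      ref = parts[n]
      if (n < 2 || length(ref) != 40 || ref ~ /[^0-9a-f]/) {
        print FILENAME ": " line
      }
    }
  ' "$@"
}
```

A CI step could call unpinned_actions .github/workflows/*.yml and fail the job when the output is non-empty.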

Hermetic Builds: Isolating Every Variable

A hermetic build is one that is fully isolated from the ambient environment. Given the same source code and the same inputs, a hermetic build produces bit-for-bit identical output regardless of which machine runs it, what is installed on that machine, or what time of day it is. This is the gold standard. Google's Bazel and Meta's Buck2 are purpose-built hermetic build systems that enforce this at the toolchain level.

You do not need Bazel to achieve most of the benefit. The practical hermetic build checklist:

  • Run inside a container: Build inside a Docker image that contains every tool at a pinned version. The host OS becomes irrelevant.
  • No network during build: All dependencies must be present in the image or in a cache layer before the compile step. A build that fetches from the internet mid-compile is non-hermetic — the internet changes.
  • Eliminate timestamps and random seeds: Many compilers embed build timestamps. Use SOURCE_DATE_EPOCH (a standard environment variable) to freeze the timestamp. Use a fixed random seed in any generation step.
  • Cache inputs, not outputs: Cache the downloaded dependency layer keyed on the lockfile hash. Never cache built artifacts across branches — stale cache is a common source of ghost failures.
[Figure: Hermetic Build Pipeline. Inside a pinned build container, source code (git checkout) and a dependency cache keyed on the lockfile hash feed a build step (compile + link), then a test step (unit + integration), which produces the artifact (binary / image). SOURCE_DATE_EPOCH is fixed (no timestamps, no RNG) and the network is disabled (all deps pre-fetched).]
A hermetic build container: source code and a pre-fetched dependency cache are the only inputs; network is disabled during compile; a fixed epoch removes timestamp non-determinism.
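The timestamp item in the checklist can be illustrated with an archive step, a common place where file modification times leak into artifacts. This is a minimal sketch, assuming GNU tar 1.28 or newer (for --sort) and GNU coreutils; pack_deterministic is a hypothetical helper, and the epoch value is arbitrary as long as every build uses the same one.

```bash
#!/usr/bin/env bash
# Assumes GNU tar 1.28+ (for --sort=name).

# Frozen timestamp; any fixed value works if every build uses the same one.
: "${SOURCE_DATE_EPOCH:=1700000000}"
export SOURCE_DATE_EPOCH

# pack_deterministic SRC_DIR OUT_TAR   (hypothetical helper)
# Archives SRC_DIR so the bytes of OUT_TAR depend only on file names,
# contents, and permissions, not on mtimes, ownership, or directory order.
pack_deterministic() {
  local src_dir="$1" out_tar="$2"
  tar --format=gnu \
      --sort=name \
      --mtime="@${SOURCE_DATE_EPOCH}" \
      --owner=0 --group=0 --numeric-owner \
      -C "$src_dir" -cf "$out_tar" .
}
```

Packing the same directory twice, even after the files' modification times change, should yield archives with the same SHA-256 digest.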
```dockerfile
# Dockerfile: hermetic build image (multi-stage for a Go service)
# Every tool is pinned; no package manager runs at compile time

# Replace <digest> with the image's real 64-hex-character sha256 digest,
# e.g. as reported by `docker buildx imagetools inspect golang:1.23.4-bookworm`.
FROM golang:1.23.4-bookworm@sha256:<digest> AS builder

WORKDIR /app

# Copy go.mod and go.sum first: this layer is cached until they change
COPY go.mod go.sum ./
RUN go mod download -x   # download all modules into the module cache

# Copy source code (cache miss only when source changes)
COPY . .

# Reproducible build: pin timestamp, strip debug info, disable CGO
RUN SOURCE_DATE_EPOCH=1700000000 \
    CGO_ENABLED=0 \
    GOOS=linux GOARCH=amd64 \
    go build -trimpath \
      -ldflags="-s -w -buildid=" \
      -o /bin/api ./cmd/api

# ---- Final runtime image (scratch = no shell, no OS, minimal attack surface) ----
FROM scratch
COPY --from=builder /bin/api /api
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
ENTRYPOINT ["/api"]
```
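A claim of bit-for-bit reproducibility is only worth something if it is checked. One simple way, sketched here under the assumption that sha256sum is available, is to run the build twice and compare artifact digests. builds_identically is a hypothetical helper; the build command and artifact path are whatever your project uses.

```bash
#!/usr/bin/env bash

# builds_identically ARTIFACT CMD [ARGS...]   (hypothetical helper)
# Runs the build command twice, removing the artifact in between, and
# succeeds only if both runs produce an artifact with the same SHA-256.
# Returns 2 if the build command itself fails.
builds_identically() {
  local artifact="$1"
  shift
  local first second
  "$@" >/dev/null || return 2
  first=$(sha256sum "$artifact" | cut -d' ' -f1)
  rm -f "$artifact"
  "$@" >/dev/null || return 2
  second=$(sha256sum "$artifact" | cut -d' ' -f1)
  [ "$first" = "$second" ]
}
```

For the Go service above, a call might look like builds_identically bin/api go build -trimpath -o bin/api ./cmd/api. A stricter check would run the two builds on different machines or in fresh containers, since two builds on one host share the same ambient state.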

Caching: The Speed Multiplier That Must Not Corrupt

Build caches are essential for pipeline speed — a 3-minute Node.js install becomes 5 seconds when the cache hits. But a corrupt cache can mask real failures for weeks. Apply these rules:

  • Key on inputs, not time: The cache key must include the lockfile hash (hashFiles('**/package-lock.json')), the OS, and the Node version. A cache keyed only on a branch name will serve a stale hit after dependency changes.
  • Restore but never trust fully: After restoring a cache, still run npm ci --prefer-offline. It installs from the restored cache where possible, while npm still checks each package against the integrity hashes in package-lock.json. Never skip the install step just because the cache hit.
  • Separate the build-output cache from the dependency cache: Store compiled output separately from downloaded packages. A bad compile artifact in cache causes ghost failures that are extremely hard to diagnose.
  • Bust aggressively on major changes: Prefix cache keys with a manually bumped version (v2-deps-...). Bump the prefix whenever you suspect cache corruption — one cache bust is faster than hours of investigation.
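The keying rules above can be sketched as a small function. This is an illustration rather than how any particular CI service computes keys: cache_key is a hypothetical helper, and it assumes sha256sum is available on the runner. The key combines a manually bumped prefix, the OS, the Node version, and a hash of the lockfile.

```bash
#!/usr/bin/env bash

# cache_key PREFIX LOCKFILE NODE_VERSION   (hypothetical helper)
# Builds a key from the inputs that determine what gets installed.
# Changing any one of them produces a different key, so a stale cache
# is never restored after a dependency or toolchain change.
cache_key() {
  local prefix="$1" lockfile="$2" node_version="$3"
  local os lock_hash
  os=$(uname -s | tr '[:upper:]' '[:lower:]')
  lock_hash=$(sha256sum "$lockfile" | cut -c1-16)
  echo "${prefix}-deps-${os}-node${node_version}-${lock_hash}"
}
```

Calling cache_key v2 package-lock.json 20 yields a key of the form v2-deps-<os>-node20-<hash>. Bumping v2 to v3 invalidates every existing entry at once, which is the manual cache bust described in the last rule.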
The "works on CI but not locally" failure mode: This often means the CI runner has a cached artifact from a previous build that the developer does not have locally. Do not start by debugging locally. Trigger a CI run with caching disabled (for Docker builds, docker build --no-cache) and check whether the failure reproduces from a clean state. If it does, a stale or poisoned cache was masking a real failure. If the clean run still passes, CI and the developer machine differ in some ambient way, which means the build is not hermetic. Both are critical to fix.

Makefile and Task Runners: The Local-CI Bridge

The highest-leverage pattern for reproducibility is having a single command that does exactly what CI does. A Makefile (or Taskfile.yml for the Go/YAML ecosystem) serves as the standard interface: make build, make test, make lint — same targets, same commands, whether run by a human or a pipeline.

```makefile
# Makefile: local/CI unified interface
.PHONY: deps build test lint docker-build ci

REGISTRY  := ghcr.io/acme/api
IMAGE_TAG := $(shell git rev-parse --short HEAD)

deps:
	npm ci --prefer-offline

build: deps
	bash scripts/build.sh

# lint needs node_modules too, so it depends on deps like build and test
lint: deps
	npx eslint src/ --max-warnings 0
	npx tsc --noEmit

test: deps
	npx jest --ci --coverage --runInBand

docker-build:
	docker build \
		--build-arg SOURCE_DATE_EPOCH=1700000000 \
		--tag $(REGISTRY):$(IMAGE_TAG) \
		--tag $(REGISTRY):latest \
		.

# The single target CI runs, identical to what a human runs locally
ci: lint test build docker-build
	@echo "CI complete: $(REGISTRY):$(IMAGE_TAG)"
```
The two-minute onboarding test: After setting up your build scripts, give your repo to a colleague who has never seen the project. If they can run git clone ... && make ci and get a green build within two minutes, your build is reproducible. If they hit environment-specific errors, those errors will eventually appear on a CI runner too.

Common Failure Modes in Production

Even well-intentioned teams encounter these recurring problems:

  • Floating base image tags: FROM node:lts pointed at Node 18 last year; today it points at Node 22. All your tests now run on a different runtime than production.
  • Implicit system tools: A build script calls jq or yq that happens to be installed on the developer's laptop but is absent on a fresh runner. The build fails in CI with a cryptic "command not found."
  • Timezone-sensitive tests: A test that asserts new Date().toLocaleDateString() === "1/1/2025" depends on both the machine's timezone and its locale, so it can pass in UTC and fail in UTC+3. CI runners typically run in UTC; developer machines usually do not.
  • Race conditions in parallel steps: Two build steps write to the same output directory. On a fast runner they collide; on a slow one they do not. The failure is non-deterministic and intermittent.

Each of these is a reproducibility violation. The solution in every case is the same: eliminate the ambient assumption that caused it — pin the image, declare the tool in the Dockerfile, freeze the timezone with TZ=UTC, and serialize conflicting steps.
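Two of these fixes can be made explicit at the top of a build script. The following is a minimal sketch, not a complete preflight: missing_tools is a hypothetical helper that reports tools absent from the PATH, so the build fails with a clear message instead of a "command not found" halfway through, and TZ is exported so date-dependent steps behave the same on every machine.

```bash
#!/usr/bin/env bash

# missing_tools TOOL...   (hypothetical helper)
# Prints each tool that is not on PATH, one per line, and returns
# non-zero if at least one is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Freeze the timezone so date-dependent steps behave the same everywhere.
export TZ=UTC

# Example use near the top of a build script (tool names are illustrative):
#   if ! missing=$(missing_tools jq yq); then
#     echo "ERROR: missing required tools: ${missing}" >&2
#     exit 1
#   fi
```

The check only turns an implicit dependency into a visible one; the durable fix is still to install those tools at pinned versions in the build image.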