Reproducible & Hermetic Builds
At big-tech scale, "it works on my machine" is not a defense — it is an incident report waiting to happen. A reproducible build is one where the same source inputs always produce bit-for-bit (or semantically equivalent) outputs. A hermetic build goes further: the build process is sealed from the outside world — no downloading from the internet at build time, no ambient environment variables, no host filesystem leakage. Together these two properties are the foundation of secure, auditable software supply chains.
Google, Meta, and Uber all run hermetic build systems (Bazel, Buck) precisely because at their scale a non-deterministic build is a security and reliability liability. The SLSA framework (Supply-chain Levels for Software Artifacts) formalises these requirements into leveled compliance tiers that regulators and enterprise customers now demand.
Why Reproducibility Is Hard
Most build systems are not reproducible by default. Common sources of non-determinism include:
- Timestamps: file modification times and __DATE__/__TIME__ macros embedded in binaries.
- Filesystem ordering: directory listings are not sorted on most filesystems, so build tools that iterate them produce different archive member orders.
- Floating-point and CPU variation: compilers that tune for the build host (for example with -march=native) can emit different machine code on different CPU generations.
- Random UUIDs or salts embedded into build outputs (some bundlers do this).
- Unpinned dependencies: a pip install requests today may fetch 2.31.0; tomorrow it may fetch 2.32.0.
- Network fetches at build time: anything resolved over the network during the build makes the output depend on remote state.
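The timestamp and ordering problems above can be illustrated with a small sketch. It packs a directory into a tar archive while sorting entries and normalising metadata, so the bytes depend only on paths and contents. The function name deterministic_tar and the choice of epoch 0 as the default mtime are assumptions made for illustration, and the sketch only considers regular files.

```python
import io
import os
import tarfile


def deterministic_tar(src_dir: str, mtime: int = 0) -> bytes:
    """Pack src_dir into an uncompressed tar with normalised order and metadata."""
    paths = []
    for root, dirs, files in os.walk(src_dir):
        dirs.sort()  # fix the traversal order instead of trusting the filesystem
        for name in files:
            paths.append(os.path.join(root, name))
    paths.sort()  # fix the archive member order

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for path in paths:
            info = tar.gettarinfo(path, arcname=os.path.relpath(path, src_dir))
            info.mtime = mtime              # drop the real modification time
            info.uid = info.gid = 0         # drop host user and group ids
            info.uname = info.gname = ""    # drop host user and group names
            # Keep only "executable or not"; other permission bits vary by host.
            info.mode = 0o755 if info.mode & 0o100 else 0o644
            with open(path, "rb") as fh:
                tar.addfile(info, fh)
    return buf.getvalue()
```

Two checkouts with the same file contents should then yield identical archive bytes, regardless of when or in which order the files were written to disk.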
Dependency Pinning and Lockfiles
The first and most impactful control is pinning every dependency to an exact version via a lockfile. Every major ecosystem has a lockfile mechanism. Commit the lockfile to git and treat divergence as a failing CI check.
Some project templates add package-lock.json or Pipfile.lock to .gitignore by default. This can be reasonable for reusable libraries (where you want to test across a range of dependency versions) but is wrong for deployed applications. Every application that runs in production must have its exact dependency graph locked and committed.
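As a rough illustration of a CI check for pinning, the sketch below scans the text of a pip requirements file and reports entries that are not pinned with ==. The function name unpinned_requirements and the regular expression are assumptions for this example. It deliberately handles only simple name==version lines, not URLs or editable installs.

```python
import re

# "name==1.2.3", optionally with extras and an environment marker.
# Wildcards such as "2.*" are rejected because they are not exact pins.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^\s;=*]+(\s*;.*)?$"
)


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    bad = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if not line or line.startswith("-"):     # skip blanks and pip options
            continue
        line = line.split(" --hash=")[0]          # ignore inline hash options
        line = line.rstrip("\\").strip()          # ignore line continuations
        if not PINNED.match(line):
            bad.append(line)
    return bad
```

A CI job could fail the build whenever the returned list is non-empty, which turns lockfile drift into a visible check rather than a surprise at deploy time.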
Hash-Verified Downloads
A version pin is not enough — a package registry can be compromised, or a version tag can be moved (mutable tags are a real attack vector). The fix is to pin content hashes (SHA-256) alongside version strings.
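A minimal sketch of hash verification, assuming the expected SHA-256 digest was recorded when the dependency was first reviewed. The helper names sha256_file and verify_artifact are hypothetical. Package managers such as pip offer their own hash-checking modes, which are preferable to hand-rolled checks where available.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_sha256: str) -> None:
    """Raise ValueError if the file does not match the pinned digest."""
    actual = sha256_file(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(
            f"hash mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```

Because the check is against content rather than a version label, a moved tag or a tampered registry entry causes a failure instead of a silent substitution.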
Build Metadata and Provenance
Reproducibility answers "can I rebuild this?" Provenance answers "who built this, from what source, on what machine, at what time?" Both are required for a mature supply chain.
Build metadata is structured information stamped into artifacts at build time:
- git.commit: the exact commit SHA that produced this build.
- git.branch / git.tag: branch or semver tag.
- build.timestamp: RFC 3339 UTC timestamp (store as metadata, not embedded in the binary, to preserve reproducibility).
- build.runner: CI system, runner ID, pipeline URL.
- build.builder_image: the exact Docker image digest used to compile.
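One way to keep this metadata out of the binary is to write it to a sidecar JSON document next to the artifact. The sketch below assumes the caller already knows the values (for example from CI environment variables). The function name build_metadata and its parameters are illustrative, not a standard API.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_metadata(
    commit: str,
    branch: Optional[str],
    tag: Optional[str],
    runner: str,
    builder_image: str,
    epoch_seconds: int,
) -> str:
    """Return a JSON sidecar document describing one build."""
    timestamp = (
        datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
        .isoformat()
        .replace("+00:00", "Z")  # RFC 3339 UTC form
    )
    doc = {
        "git.commit": commit,
        "git.branch": branch,
        "git.tag": tag,
        "build.timestamp": timestamp,
        "build.runner": runner,
        "build.builder_image": builder_image,
    }
    # Sorted keys keep the sidecar itself stable across runs.
    return json.dumps(doc, indent=2, sort_keys=True)
```

Storing the document alongside the artifact, rather than compiling it in, means two builds of the same commit can still produce identical binaries even though their timestamps differ.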
SLSA Provenance Attestations
SLSA (Supply-chain Levels for Software Artifacts, pronounced "salsa") defines four trust levels. SLSA 2 — achievable by any team with a modern CI system — requires a signed provenance attestation: a machine-verifiable document stating what source produced this artifact, on what build platform, with what inputs.
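For GitHub Actions, the SLSA project maintains slsa-github-generator, a set of reusable workflows designed to produce provenance that meets SLSA Build Level 3. Consumers verify the attestation before installing.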
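To make the shape of a provenance attestation concrete, here is an unsigned sketch loosely following the in-toto Statement and SLSA provenance v1 layouts. The field selection is simplified, and the builder ID and build type passed in are placeholders chosen by the caller. A real attestation is generated and signed by the build platform, not by the build script, so treat this only as an illustration of the data involved.

```python
import hashlib


def provenance_statement(
    artifact_name: str,
    artifact_bytes: bytes,
    repo_uri: str,
    commit_sha: str,
    builder_id: str,   # placeholder URI identifying the build platform
    build_type: str,   # placeholder URI describing how the build ran
) -> dict:
    """Build an unsigned, simplified provenance statement for one artifact."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {
                "name": artifact_name,
                "digest": {"sha256": hashlib.sha256(artifact_bytes).hexdigest()},
            }
        ],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                "buildType": build_type,
                "externalParameters": {"repository": repo_uri, "ref": commit_sha},
                "resolvedDependencies": [
                    {"uri": repo_uri, "digest": {"gitCommit": commit_sha}}
                ],
            },
            "runDetails": {"builder": {"id": builder_id}},
        },
    }


def subject_matches(statement: dict, artifact_bytes: bytes) -> bool:
    """Check that the artifact's SHA-256 appears among the statement subjects."""
    actual = hashlib.sha256(artifact_bytes).hexdigest()
    return any(
        s.get("digest", {}).get("sha256") == actual
        for s in statement.get("subject", [])
    )
```

The digest comparison is only one part of verification. A consumer must also validate the signature and confirm that the builder identity and source repository are the expected ones.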
Hermetic Builds in Practice
True hermeticity means the build sandbox has no outbound network access and uses only pre-fetched, hash-verified inputs. Bazel enforces this via sandboxing. For teams not on Bazel, approximate it:
- Pre-fetch all dependencies into a registry mirror or vendor directory before the build step begins.
- Use --network=none in Docker build stages that compile code (a separate fetch stage downloads deps).
- Run builds inside a pinned builder image (e.g., golang:1.22.3@sha256:...) so the compiler version is fixed.
- Set SOURCE_DATE_EPOCH to the git commit timestamp to eliminate timestamp non-determinism in archive tools.
The -trimpath flag (Go) strips absolute host filesystem paths from the binary, eliminating a major source of non-determinism between developers on different machines. Most toolchains have an equivalent or related control: Rust accepts --remap-path-prefix, Python wheel builds honor SOURCE_DATE_EPOCH, and some JavaScript bundlers accept a reproducible: true setting.
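For teams wrapping an existing build command, the ambient-environment part of hermeticity can be approximated by launching the build with an explicit environment rather than inheriting the caller's. This is a sketch assuming a POSIX host. The names sealed_env and run_sealed and the default PATH value are assumptions for illustration, and this does not block network or filesystem access by itself.

```python
import subprocess


def sealed_env(source_date_epoch: int, path: str = "/usr/bin:/bin") -> dict:
    """Build an environment from scratch instead of copying os.environ."""
    return {
        "PATH": path,                                # fixed tool lookup path
        "LC_ALL": "C",                               # fixed locale and sort order
        "TZ": "UTC",                                 # fixed time zone
        "SOURCE_DATE_EPOCH": str(source_date_epoch),  # fixed build timestamp
    }


def run_sealed(cmd: list, source_date_epoch: int) -> subprocess.CompletedProcess:
    """Run a build command with only the variables listed in sealed_env."""
    return subprocess.run(
        cmd,
        env=sealed_env(source_date_epoch),
        check=True,            # fail loudly if the build fails
        capture_output=True,
        text=True,
    )
```

The commit timestamp for source_date_epoch can be read with git log -1 --format=%ct, so every rebuild of the same commit sees the same value.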
Verifying Reproducibility
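To confirm a build is truly reproducible, rebuild from the same inputs on a different machine and compare artifact hashes. The Reproducible Builds project publishes tools for this, including diffoscope, which shows where two artifacts differ, and reprotest, which rebuilds under deliberately varied environments.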
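The hash comparison itself can be sketched in a few lines. The example below assumes two output directories produced by independent builds of the same commit. The function names tree_hashes and differing_files are hypothetical. It only reports which files differ, so a tool like diffoscope is still needed to explain why.

```python
import hashlib
import os


def tree_hashes(root: str) -> dict[str, str]:
    """Map each file's relative path under root to its SHA-256 digest."""
    hashes = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                hashes[rel] = hashlib.sha256(fh.read()).hexdigest()
    return hashes


def differing_files(build_a: str, build_b: str) -> list[str]:
    """Return paths that are missing from one build or differ in content."""
    a, b = tree_hashes(build_a), tree_hashes(build_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty result means the two builds are bit-for-bit identical at the file level. Any listed path is a starting point for hunting down the source of non-determinism.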
Reproducible and hermetic builds are not optional niceties at production scale — they are the foundation on which software supply chain security, binary transparency, and efficient caching are built. SLSA compliance increasingly appears in enterprise procurement requirements and government security mandates (NIST SP 800-218, EO 14028). Teams that invest in these practices early avoid painful retrofits when auditors arrive.