Artifacts & Build Outputs
Every CI pipeline produces outputs: compiled binaries, container images, test reports, coverage files, signed packages. These outputs are called artifacts. Treating artifacts as first-class citizens — with deterministic versioning, secure storage, and reliable inter-stage handoff — is what separates a hobbyist pipeline from a production-grade system at big-tech scale.
What Counts as an Artifact?
An artifact is any file produced by one stage of a pipeline that is either (a) consumed by a later stage, or (b) published for external use. Common examples include:
- Compiled binaries: Go, Rust, Java JARs/WARs, .NET DLLs
- Container images: OCI-format layers pushed to a registry
- Language packages: npm tarballs, Python wheels, Ruby gems, Maven artifacts
- Static sites: the output of next build, hugo, or vite build
- Test & coverage reports: JUnit XML, lcov HTML, SARIF files
- Infrastructure plans: terraform plan JSON saved before apply
- Release archives: tarballs or ZIPs attached to a GitHub Release
Artifact Storage: Where and Why It Matters
Artifacts must be stored outside the ephemeral runner. When a runner VM is recycled — or when a parallel job on a different machine needs the artifact — it must be retrievable from a stable, authenticated location. The main tiers are:
- CI-native artifact stores: GitHub Actions Artifacts (backed by Azure Blob), GitLab Job Artifacts (S3-compatible). Zero-config, but subject to account storage quotas (on GitHub, artifact storage counts against the plan's storage allowance) and a default retention of 90 days.
- Package registries: GitHub Packages, GitLab Package Registry, JFrog Artifactory, Sonatype Nexus. The right home for versioned, publishable artifacts (JARs, npm packages, Docker images). Most refuse to overwrite a version that has already been published, such as 1.2.3.
- Object storage: AWS S3, GCS, Azure Blob. Used by large teams for everything that does not fit a package registry: Terraform plans, large ML model checkpoints, browser test videos.
Once an artifact has been published under a version (for example v2.4.1), it must never be overwritten. Any change, even a single byte, must produce a new version. This is why package registries reject re-uploads of the same version by default: a mutable artifact makes the entire supply chain untrustworthy.
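The write-once rule can be modeled in a few lines. The sketch below is a hypothetical in-memory stand-in, not any real registry's API: `ImmutableRegistry`, `publish`, and `fetch` are illustrative names. It assumes the strictest policy, refusing even a byte-identical re-upload, so the only way to ship a change is a new version.

```python
from __future__ import annotations


class ImmutableRegistry:
    """Hypothetical in-memory registry that treats (package, version) as write-once."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], bytes] = {}

    def publish(self, package: str, version: str, data: bytes) -> None:
        key = (package, version)
        if key in self._store:
            # Any re-upload is refused, even with identical bytes: a change needs a new version.
            raise PermissionError(f"{package}@{version} is already published")
        self._store[key] = data

    def fetch(self, package: str, version: str) -> bytes:
        return self._store[(package, version)]
```

A patched build therefore has to go out as 2.4.2; consumers pinned to 2.4.1 keep receiving exactly the bytes they originally tested.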
Deterministic Versioning Schemes
Artifact versions must be unique, traceable back to a commit, and sortable. The three patterns used in production are:
- SemVer from git tag: v2.4.1, built when the git tag v2.4.1 is pushed. Standard for public packages. Tools: git describe --tags, semantic-release.
- Commit SHA suffix: 2.4.1-abc1234. Every merge to main produces an artifact, and the SHA makes the version traceable without a tag. Used heavily for internal services.
- CalVer + build number: 2025.06.1042. Common in monorepos and mobile releases (the App Store requires numeric build numbers). The build number is the CI run ID.
GitHub Actions exposes the full 40-character commit SHA as ${{ github.sha }} (also available as the GITHUB_SHA environment variable). To get a short SHA, slice it in bash: SHA=$(echo $GITHUB_SHA | head -c8) keeps the first 8 characters.
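The three schemes above reduce to small string functions. This is a minimal sketch; the helper names (`semver_from_tag`, `semver_key`, `sha_suffixed`, `calver`) are hypothetical, and it assumes release tags always have the plain form vMAJOR.MINOR.PATCH. It also shows why "sortable" needs care: SemVer strings must be compared numerically, not as text.

```python
from __future__ import annotations

import re

# Assumption: only plain vMAJOR.MINOR.PATCH tags count as release tags.
RELEASE_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def semver_from_tag(tag: str) -> str:
    """'v2.4.1' -> '2.4.1'; raise for tags that are not release tags."""
    match = RELEASE_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    return ".".join(match.groups())


def semver_key(version: str) -> tuple[int, ...]:
    """Numeric sort key, so '2.10.0' sorts after '2.9.3' (a plain string sort gets this wrong)."""
    return tuple(int(part) for part in version.split("."))


def sha_suffixed(base: str, commit_sha: str, length: int = 7) -> str:
    """'2.4.1' plus a full 40-character commit SHA -> '2.4.1-abc1234'."""
    return f"{base}-{commit_sha[:length]}"


def calver(year: int, month: int, run_id: int) -> str:
    """(2025, 6, 1042) -> '2025.06.1042'; run_id is the CI run identifier."""
    return f"{year}.{month:02d}.{run_id}"
```

In a GitHub Actions job, the inputs would typically come from the GITHUB_REF_NAME, GITHUB_SHA, and GITHUB_RUN_ID environment variables.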
Passing Artifacts Between Stages (GitHub Actions)
Runners are isolated VMs. Files written to disk by one job are gone when the job ends. The canonical pattern: upload at the end of the build job, download at the start of every job that needs it. GitHub Actions provides actions/upload-artifact and actions/download-artifact for this.
Set retention-days explicitly on every upload. The default is 90 days. In a busy repository, leaving retention unset can exhaust the storage quota within weeks, because every PR produces artifacts. Set short TTLs (3 to 7 days) for intermediate build artifacts and longer TTLs (90 to 365 days) only for release artifacts that may need investigation months later.
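The upload/download handoff and the retention rule can be simulated locally. In this sketch a shared directory stands in for the CI artifact store; `upload_artifact` and `download_artifact` are hypothetical helpers that only mimic the pattern and say nothing about how actions/upload-artifact works internally. The build job records checksums and an expiry time, and a later job refuses expired or corrupted copies.

```python
from __future__ import annotations

import hashlib
import json
import shutil
import time
from pathlib import Path

DAY = 86_400  # seconds


def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def upload_artifact(store: Path, name: str, src: Path, retention_days: int) -> None:
    """End of the build job: copy src/ into the store with checksums and an expiry."""
    root = store / name
    if root.exists():
        raise FileExistsError(f"artifact {name!r} already exists")  # no silent overwrite
    shutil.copytree(src, root / "files")
    checksums = {
        p.relative_to(src).as_posix(): _sha256(p)
        for p in sorted(src.rglob("*"))
        if p.is_file()
    }
    meta = {"checksums": checksums, "expires_at": time.time() + retention_days * DAY}
    (root / "meta.json").write_text(json.dumps(meta, sort_keys=True))


def download_artifact(store: Path, name: str, dest: Path, now: float | None = None) -> None:
    """Start of a later job: fetch the artifact, refusing expired or corrupted copies."""
    root = store / name
    meta = json.loads((root / "meta.json").read_text())
    if (time.time() if now is None else now) > meta["expires_at"]:
        raise FileNotFoundError(f"artifact {name!r} expired")
    shutil.copytree(root / "files", dest, dirs_exist_ok=True)
    for rel, digest in meta["checksums"].items():
        if _sha256(dest / rel) != digest:
            raise ValueError(f"checksum mismatch: {rel}")
```

The `now` parameter exists only so the expiry path can be exercised without waiting; in a real pipeline the CI platform enforces retention for you.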
Publishing to a Package Registry
Container images are the most common production artifact. The publish step authenticates to a registry, tags the image with both a commit-specific tag and latest (for convenience), then pushes both. Docker BuildKit with layer caching can substantially speed up repeat builds, because unchanged layers are reused instead of rebuilt.
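The tag-and-push step amounts to one build with two tags followed by two pushes. This sketch only assembles the docker CLI argument lists (registry login is assumed to have happened already); `publish_commands` and the image name ghcr.io/example/myapp are hypothetical, and the 7-character short SHA is an assumed convention.

```python
from __future__ import annotations


def publish_commands(image: str, commit_sha: str) -> list[list[str]]:
    """Return docker CLI invocations that build once and push two tags."""
    immutable = f"{image}:{commit_sha[:7]}"  # traceable to one commit, never re-pushed
    latest = f"{image}:latest"               # convenience alias that moves on every push
    return [
        ["docker", "build", "-t", immutable, "-t", latest, "."],
        ["docker", "push", immutable],
        ["docker", "push", latest],
    ]
```

Each list can be run with `subprocess.run(cmd, check=True)`. Building once and tagging twice guarantees both tags point at the same image.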
Build Reproducibility
A build is reproducible when the same source commit always produces byte-for-byte identical artifacts. This matters for security (you can verify a binary matches its source) and debugging (you can rebuild a six-month-old release to investigate a CVE). Achieving it requires:
- Lock files committed (go.sum, package-lock.json, Pipfile.lock, Gemfile.lock): install exactly what the lock file records (for example with npm ci) and never re-resolve version ranges in CI.
- Pinned base images: use a digest (FROM node:20@sha256:abc...), not a mutable tag (FROM node:latest).
- Deterministic timestamps: set SOURCE_DATE_EPOCH to the Unix timestamp of the last commit (git log -1 --format=%ct) so that compilers, archivers, and compressors that honor it embed that fixed time instead of the wall-clock time.
- Fixed tool versions: specify go-version: '1.23.4' and node-version: '20.11.0' exactly, not '20'.
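The timestamp and ordering rules can be seen in a reproducible archive. This is a minimal sketch using only the standard library; `reproducible_tar_gz` is a hypothetical helper. It sorts entries, pins every mtime to SOURCE_DATE_EPOCH, strips builder-specific ownership, and pins the gzip header timestamp too. It assumes the same Python and zlib versions on every builder, since different zlib builds can compress the same input differently.

```python
from __future__ import annotations

import gzip
import io
import tarfile


def reproducible_tar_gz(files: dict[str, bytes], source_date_epoch: int) -> bytes:
    """Pack in-memory files into a .tar.gz whose bytes depend only on the inputs."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in sorted(files):  # fixed entry order, independent of input order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = source_date_epoch  # no wall-clock timestamps
            info.mode = 0o644
            info.uid = info.gid = 0         # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    # gzip writes its own timestamp into the header; pin it to the same epoch.
    return gzip.compress(raw.getvalue(), mtime=source_date_epoch)
```

Building twice from the same inputs, in any order, yields identical bytes, so the SHA-256 of the archive can serve as a verifiable fingerprint of the source.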
If you deploy myapp:latest and someone pushes a broken image under the same tag, every subsequent deployment pulls the broken image. Always deploy by digest (myapp@sha256:...) or by the immutable commit-SHA tag, and treat latest as a convenience alias for local development only.
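The difference between a tag and a digest is that a tag is a mutable pointer, while a digest is derived from the content itself. The toy model below is hypothetical and is not a registry client. As a simplification, it hashes the pushed bytes directly, whereas real OCI registries compute the digest over the image manifest.

```python
from __future__ import annotations

import hashlib


class ToyRegistry:
    """Hypothetical in-memory model of tags vs digests; not a real registry client."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # digest -> content (content-addressed)
        self.tags: dict[str, str] = {}     # "name:tag" -> digest (mutable pointer)

    def push(self, name: str, tag: str, content: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self.blobs[digest] = content
        self.tags[f"{name}:{tag}"] = digest  # re-pushing a tag silently moves it
        return digest

    def pull(self, ref: str) -> bytes:
        """Resolve 'name@sha256:...' by content, or 'name:tag' via the tag pointer."""
        if "@" in ref:
            return self.blobs[ref.split("@", 1)[1]]
        return self.blobs[self.tags[ref]]
```

After a broken image is pushed as latest, `pull("myapp:latest")` returns the broken build, while a reference pinned to the earlier digest still resolves to the good one.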
Artifact Signing and Attestation
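At companies like Google, build artifacts carry signed provenance: a cryptographically signed record of the pipeline run and commit that produced them, the model that SLSA formalizes. This lets downstream consumers verify that an artifact was produced by a specific pipeline run from a specific commit, not injected by an attacker who compromised the registry. GitHub Actions supports this natively via actions/attest-build-provenance, which generates a Sigstore-signed SLSA build provenance attestation for the artifact. On its own this meets SLSA v1.0 Build Level 2; GitHub documents reaching Build Level 3 by running the build in a reusable workflow that isolates it from the calling repository. This is increasingly required by enterprise compliance frameworks (NIST SSDF, EU Cyber Resilience Act).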
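One step of verifying provenance is confirming that the attestation actually covers the artifact you hold. The sketch below checks an in-toto v1 statement with an SLSA provenance predicate against the artifact's SHA-256. It deliberately assumes that signature verification has already happened: it omits the Sigstore signature check, which in practice the GitHub CLI's `gh attestation verify` command performs. `subject_matches` is a hypothetical helper name.

```python
from __future__ import annotations

import hashlib

STATEMENT_TYPE = "https://in-toto.io/Statement/v1"
SLSA_PROVENANCE = "https://slsa.dev/provenance/v1"


def subject_matches(statement: dict, name: str, artifact: bytes) -> bool:
    """True if this (already signature-verified) statement covers these exact bytes."""
    if statement.get("_type") != STATEMENT_TYPE:
        return False
    if statement.get("predicateType") != SLSA_PROVENANCE:
        return False
    want = hashlib.sha256(artifact).hexdigest()
    return any(
        subject.get("name") == name and subject.get("digest", {}).get("sha256") == want
        for subject in statement.get("subject", [])
    )
```

A single changed byte alters the SHA-256, so a tampered binary no longer matches any subject even when the signed statement itself is genuine.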
Common Failure Modes
- Artifact name collision: two parallel jobs upload an artifact with the same name. With older versions of actions/upload-artifact the uploads are merged and the second can silently overwrite files from the first; since v4, the duplicate upload fails. Always include the matrix variable or branch name in the artifact name.
- Missing needs dependency: a downstream job starts before the artifact is uploaded, cannot find it, and fails intermittently. Always declare explicit needs.
- Uploading the entire repo: a misconfigured path: . uploads gigabytes and inflates storage costs. Be explicit and upload only the dist/ or build/ directory.
- No retention policy: artifact storage bills compound. Automate cleanup with retention policies or a nightly purge job.
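The name-collision fix can be checked mechanically before a workflow ever runs. This sketch derives one artifact name per matrix combination and reports duplicates; `artifact_name` and `collisions` are hypothetical helpers. In a workflow file, the equivalent is a name built from expressions such as `dist-${{ matrix.os }}-${{ matrix.node }}`.

```python
from __future__ import annotations

from collections import Counter


def artifact_name(base: str, matrix: dict[str, str]) -> str:
    """Build a name like 'dist-node-20-os-ubuntu-latest' from every matrix key."""
    parts = [base] + [f"{key}-{matrix[key]}" for key in sorted(matrix)]
    return "-".join(parts)


def collisions(names: list[str]) -> list[str]:
    """Return every artifact name that more than one job would upload."""
    return sorted(name for name, count in Counter(names).items() if count > 1)
```

Sorting the matrix keys keeps names stable regardless of the order in which the matrix is declared.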