Release Pipelines & Promotion
A release pipeline is the automated path an artifact travels from the moment CI produces it to the moment it serves production traffic. Artifact promotion is the discipline of moving that artifact through a sequence of environments — each with stricter gates — without ever rebuilding it. If you rebuild between dev and prod, you have proven the dev binary, not the prod binary. This is the foundational insight that separates a professional release process from ad-hoc scripting.
The Immutability Principle
An artifact is immutable when its content is fixed at creation time and can never be overwritten. In container terms this means pinning to a digest: sha256:a3f7c9... rather than a mutable tag like :latest or :v1.2.3. In package registry terms it means a registry that refuses to allow a second push to the same version coordinate (Nexus, Artifactory, and AWS ECR all support immutable image tags per repository).
Immutability matters because:
- A docker pull myapp:v1.2.3 issued on two different days can silently return different bytes if the tag is mutable. Your staging test and your production deploy are no longer on the same artifact.
- Post-incident forensics require you to recover the exact binary. A mutable tag may have been overwritten.
- Supply-chain attack surface: a mutable tag allows a compromised registry push to silently upgrade a running fleet on next pod restart.
Never use image: myapp:v1.2.3 in production manifests. Always deploy image: 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp@sha256:a3f7c9.... Tools like crane digest or docker inspect --format '{{.RepoDigests}}' resolve a tag to its current digest so your GitOps commit captures an immutable reference.
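A pre-commit or CI check can enforce the digest-pinning rule mechanically. The following is a minimal sketch, not a definitive implementation: the function names is_digest_pinned and pin are hypothetical, and the digest is assumed to have been resolved elsewhere (for example with crane digest).

```python
import re

# A digest-pinned reference ends in "@sha256:" followed by 64 lowercase hex characters.
DIGEST_REF = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image: str) -> bool:
    """Return True when the image reference is pinned to a sha256 digest."""
    return DIGEST_REF.match(image) is not None


def pin(image: str, digest: str) -> str:
    """Replace any tag on `image` with the given digest (hypothetical helper)."""
    if not re.fullmatch(r"sha256:[0-9a-f]{64}", digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    name = image.split("@", 1)[0]
    # Strip a trailing ":tag", but keep a registry port such as "localhost:5000/app".
    if name.rfind(":") > name.rfind("/"):
        name = name[: name.rfind(":")]
    return f"{name}@{digest}"
```

A manifest linter could call is_digest_pinned on every image field and fail the commit when it returns False.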
Pipeline Stages & Promotion Gates
A canonical big-tech release pipeline has four named environments. Each environment acts as a promotion gate: a set of automated quality signals that must pass before the artifact advances. The artifact is built exactly once (in CI) and then promoted by updating which version each environment's configuration declares.
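The "build once, promote by declaration" model can be sketched in a few lines. This is an illustrative model only: the Pipeline class and the environment names passed to it are assumptions, and each gate is a plain callable standing in for a real quality signal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A gate takes the artifact digest and returns True when its signal passes.
Gate = Callable[[str], bool]


@dataclass
class Pipeline:
    order: List[str]                     # environments, earliest first
    gates: Dict[str, List[Gate]]         # gates guarding entry to each environment
    declared: Dict[str, str] = field(default_factory=dict)  # env -> digest

    def admit(self, digest: str) -> None:
        """Record the CI-built artifact in the first environment."""
        self.declared[self.order[0]] = digest

    def promote(self, source: str) -> str:
        """Advance the digest declared in `source` to the next environment."""
        idx = self.order.index(source)
        if idx + 1 >= len(self.order):
            raise ValueError(f"{source} is the final environment")
        digest = self.declared[source]
        target = self.order[idx + 1]
        failed = [g for g in self.gates.get(target, []) if not g(digest)]
        if failed:
            raise RuntimeError(f"{len(failed)} gate(s) blocked promotion to {target}")
        # Promotion only changes what the target declares; nothing is rebuilt.
        self.declared[target] = digest
        return target
```

Because promote copies the same digest forward, every environment that has accepted the artifact declares byte-identical content.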
Repos-per-Tier: Dev, Release Candidate, and Release
Mature organisations split their artifact repository into logical tiers. In Nexus or Artifactory this is a repository group strategy; in ECR it is separate repositories with different lifecycle policies. A typical three-tier layout:
- dev / snapshots — every CI build lands here automatically. Retention is short (7 days). No human approval. Artifacts here are candidates only.
- rc (release-candidate) — promoted by CI after integration tests pass. Human review is optional here but security scans are required. Retention is 30 days.
- release — promoted only after staging sign-off and explicit human approval. Retention is indefinite (or policy-governed). This is the only repo production is allowed to pull from.
The promotion between repos is not a rebuild — it is a copy. In Artifactory this is jf rt copy; in ECR it is aws ecr batch-get-image + aws ecr put-image with the same manifest. The immutable digest is preserved end-to-end.
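The ECR variant of this copy might look roughly like the sketch below. It assumes ecr is a boto3 ECR client (boto3.client("ecr")) passed in by the caller, that the image layers are already available to the target repository (ECR may otherwise reject the manifest), and that promote_image is a hypothetical helper name.

```python
def promote_image(ecr, source_repo: str, target_repo: str, digest: str) -> str:
    """Copy an image manifest between ECR repositories without rebuilding.

    `ecr` is assumed to be a boto3 ECR client. Returns the digest reported
    by the target repository, which must equal the source digest.
    """
    response = ecr.batch_get_image(
        repositoryName=source_repo,
        imageIds=[{"imageDigest": digest}],
    )
    if not response.get("images"):
        raise LookupError(f"{digest} not found in {source_repo}")
    image = response["images"][0]

    put_args = {
        "repositoryName": target_repo,
        "imageManifest": image["imageManifest"],
    }
    # Pass the media type through when ECR reported one, so the manifest
    # is stored in the same format it was read in.
    media_type = image.get("imageManifestMediaType")
    if media_type:
        put_args["imageManifestMediaType"] = media_type

    result = ecr.put_image(**put_args)
    promoted = result["image"]["imageId"]["imageDigest"]
    if promoted != digest:
        raise RuntimeError(f"digest changed during promotion: {promoted}")
    return promoted
```

The final comparison is the point of the exercise: if the digest reported by the target differs from the one that was tested, the promotion is treated as failed.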
Automating Promotion with a Promotion Script
The promotion step is triggered by a pipeline job that runs only when the previous stage's gate passes. In GitHub Actions this is typically a workflow with a needs chain between jobs and an environment declaration on the promotion job, which wires in environment protection rules (such as required reviewers) for human approval.
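Apply least privilege to the promotion roles: the role used by the staging promotion job should hold ecr:PutImage on the staging repo only, not the release repo. The production-promoter is a separate role requiring an additional approval and is audited separately in CloudTrail.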
Production Failure Modes to Design Against
Real promotion pipelines fail in predictable ways. Designing your pipeline to catch these before they reach production is the engineering work:
- Digest drift: the GitOps overlay in staging was committed with one digest, but someone manually patched the production deployment to a different image. Solution: reconciliation jobs (Argo CD diff alerts, or a nightly CI job that asserts the deployed digest equals the GitOps-declared digest for every environment).
- Promotion bypass: an engineer with direct kubectl access deploys a hotfix image to production without going through the pipeline. Solution: admission webhooks that reject images not sourced from the myapp-release ECR repo, combined with break-glass logging.
- Gate flapping: integration tests are flaky, so the team disables the gate temporarily. A broken build promotes. Solution: treat flaky tests as P1 bugs. Never skip gates; instead add a circuit-breaker that blocks promotion if the flake rate exceeds a threshold, forcing the team to fix the tests first.
- Config/artifact mismatch: the artifact is correctly promoted but the Kubernetes ConfigMap for the new feature flag was not updated in the production overlay. The app starts and immediately errors. Solution: atomic promotion commits that update both the image digest and any associated config in a single GitOps PR, reviewed together.
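Two of the mitigations above, the digest reconciliation job and the flake-rate circuit-breaker, reduce to small checks. The sketch below is illustrative: the function names, the "pass" / "fail" / "flaky" outcome labels, and the 10% default threshold are all assumptions, and fetching the declared and deployed digests from Git and the cluster is left to the caller.

```python
from typing import Dict, Optional, Sequence, Tuple


def find_digest_drift(
    declared: Dict[str, str], deployed: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {env: (declared_digest, deployed_digest)} for every mismatch."""
    drift = {}
    for env, want in declared.items():
        have = deployed.get(env)  # None when nothing is deployed in that env
        if have != want:
            drift[env] = (want, have)
    return drift


def gate_open(history: Sequence[str], max_flake_rate: float = 0.1) -> bool:
    """Allow promotion only if the latest run passed cleanly and the recent
    flake rate is within the threshold.

    `history` holds recent run outcomes, oldest first. "flaky" is assumed to
    mean the run passed only after a retry.
    """
    if not history:
        return False  # no evidence, no promotion
    flake_rate = history.count("flaky") / len(history)
    return history[-1] == "pass" and flake_rate <= max_flake_rate
```

A nightly job could alert whenever find_digest_drift returns a non-empty result, and the promotion job could refuse to run while gate_open is False.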
Measuring Pipeline Health
DORA metrics directly reflect pipeline quality. Track these per service:
- Deployment Frequency: how often a build reaches production. Blocked promotions show as gaps.
- Lead Time for Changes: time from commit to production. Long lead times often mean manual gates that could be automated, or slow integration test suites.
- Change Failure Rate: percentage of deployments that require a hotfix or rollback. A high rate means your pre-production gates are not catching real failures.
- Mean Time to Restore: how long after a bad deploy the service is restored. Directly improved by fast auto-rollback on SLO breach.
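As a rough illustration of how these four numbers can be derived from deployment records, consider the sketch below. The Deployment record shape and the dora_summary function are hypothetical, and real tooling would pull these timestamps from the CI/CD system and incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # needed a hotfix or rollback
    restored_at: Optional[datetime] = None  # when service was restored


def dora_summary(deploys: List[Deployment], window_days: int) -> dict:
    """Compute the four metrics for production deployments in a time window."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        # None when no failed deployment has a recorded restore time.
        "mean_time_to_restore": (
            sum(restores, timedelta()) / len(restores) if restores else None
        ),
    }
```

Computing the summary per service, as suggested above, keeps one slow or fragile pipeline from hiding behind a healthy average.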
Elite performers (per the DORA State of DevOps report) deploy multiple times per day with a change failure rate below 5% and MTTR under one hour. Every design decision in your promotion pipeline — immutability, digest pinning, atomic config+image commits, separate repos per tier — is a lever that moves these numbers.