Artifact Management & Release Engineering

Release Pipelines & Promotion

A release pipeline is the automated path an artifact travels from the moment CI produces it to the moment it serves production traffic. Artifact promotion is the discipline of moving that artifact through a sequence of environments — each with stricter gates — without ever rebuilding it. If you rebuild between dev and prod, you have proven the dev binary, not the prod binary. This is the foundational insight that separates a professional release process from ad-hoc scripting.

The Immutability Principle

An artifact is immutable when its content is fixed at creation time and can never be overwritten. In container terms this means pinning to a digest (sha256:a3f7c9...) rather than a mutable tag like :latest or :v1.2.3. In package registry terms it means a registry that refuses a second push to the same version coordinate. Nexus, Artifactory, and AWS ECR can all be configured per repository to block such overwrites (ECR calls this tag immutability).

Immutability matters because:

A docker pull myapp:v1.2.3 issued on two different days can silently return different bytes if the tag is mutable. Your staging test and your production deploy are no longer on the same artifact.
Post-incident forensics require you to recover the exact binary. A mutable tag may have been overwritten.
Supply-chain attack surface: a mutable tag allows a compromised registry push to silently upgrade a running fleet on next pod restart.

Digest pinning in Kubernetes: Never deploy image: myapp:v1.2.3 in production manifests. Always deploy image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp@sha256:a3f7c9.... Tools like crane digest (against the registry) or docker inspect --format '{{.RepoDigests}}' (against an image already pulled locally) resolve a tag to its current digest so your GitOps commit captures an immutable reference.
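
The pinning rule above can be checked mechanically before a manifest is committed. The following is a minimal sketch, not a definitive implementation: the helper names is_digest_pinned and check_refs are hypothetical, and the check only validates the shape of a reference (an @sha256: suffix followed by 64 hex characters), not that the digest exists in any registry.

```shell
# Return 0 if the image reference ends in @sha256:<64 hex chars>, else 1.
# Hypothetical helper: it checks the form of the reference only.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Read image references from stdin (one per line) and report every
# reference that is not digest-pinned. Returns non-zero if any is found.
check_refs() {
  rc=0
  while IFS= read -r ref || [ -n "$ref" ]; do
    [ -z "$ref" ] && continue
    if ! is_digest_pinned "$ref"; then
      echo "not pinned: $ref"
      rc=1
    fi
  done
  return "$rc"
}
```

A CI step could pipe the image fields of the rendered production manifests into check_refs and fail the job on a non-zero return.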

Pipeline Stages & Promotion Gates

A canonical big-tech release pipeline has four named environments. Each environment sits behind a promotion gate: a set of automated quality signals that must pass before the artifact advances. The artifact is built exactly once (in CI) and then promoted by updating which version each environment's configuration declares.

Figure: Artifact promotion pipeline (one build, four environments, escalating human and automated gates).

Repos-per-Tier: Dev, Release Candidate, and Release

Mature organisations split their artifact repository into logical tiers. In Nexus or Artifactory this is a repository group strategy; in ECR it is separate repositories with different lifecycle policies. A typical three-tier layout:

dev / snapshots — every CI build lands here automatically. Retention is short (7 days). No human approval. Artifacts here are candidates only.
rc (release-candidate) — promoted by CI after integration tests pass. Human review is optional here but security scans are required. Retention is 30 days.
release — promoted only after staging sign-off and explicit human approval. Retention is indefinite (or policy-governed). This is the only repo production is allowed to pull from.

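The promotion between repos is not a rebuild; it is a copy. In Artifactory this is jf rt copy; in ECR it is aws ecr batch-get-image + aws ecr put-image with the same manifest. Because put-image only registers a manifest, the image layers must already be present in the destination repository; a registry-to-registry copy tool such as crane copy or skopeo copy transfers layers and manifest together. Either way, the immutable digest is preserved end-to-end.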

# Promote an ECR image from the dev repo to the release repo without rebuilding.
# Both repos are in the same account; the digest is preserved.
# Caveat: put-image registers only the manifest. ECR rejects it with
# LayersNotFoundException unless the layers already exist in myapp-release,
# so a first-time cross-repo promotion must also copy the layers
# (for example with crane copy or skopeo copy).
set -euo pipefail

DEV_REPO_NAME="myapp-dev"
REL_REPO_NAME="myapp-release"
DIGEST="sha256:a3f7c9d2e811b..."

MANIFEST=$(aws ecr batch-get-image \
  --repository-name "${DEV_REPO_NAME}" \
  --image-ids "imageDigest=${DIGEST}" \
  --query 'images[0].imageManifest' \
  --output text)

# --image-digest makes ECR reject the push if the manifest does not hash to DIGEST
aws ecr put-image \
  --repository-name "${REL_REPO_NAME}" \
  --image-tag v1.4.0 \
  --image-digest "${DIGEST}" \
  --image-manifest "${MANIFEST}"

# Verify the digest is identical in the destination
aws ecr describe-images \
  --repository-name "${REL_REPO_NAME}" \
  --image-ids imageTag=v1.4.0 \
  --query 'imageDetails[0].imageDigest'
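
The three-tier layout implies an ordering: an artifact should move dev to rc to release, one step at a time. A promotion script can refuse anything else before it touches the registry. This is a minimal sketch under that assumption; the tier names come from the list above, while the function names next_tier and can_promote are hypothetical.

```shell
# Tiers in promotion order, as listed in the three-tier layout.
TIERS="dev rc release"

# Print the tier that follows the given one; fail if there is none.
next_tier() {
  [ -n "$1" ] || return 1
  prev=""
  for t in $TIERS; do
    if [ "$prev" = "$1" ]; then
      echo "$t"
      return 0
    fi
    prev="$t"
  done
  return 1
}

# Return 0 only when the promotion moves exactly one tier forward.
can_promote() {
  expected=$(next_tier "$1") || return 1
  [ "$expected" = "$2" ]
}
```

For example, can_promote dev rc succeeds, while can_promote dev release fails because it skips the rc tier and its security scans.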

Automating Promotion with a Promotion Script

The promotion step is triggered by a pipeline job that runs only when the previous stage's gate passes. In GitHub Actions this looks like a workflow with a needs chain and an environment declaration (which applies that environment's protection rules, such as required reviewers, for human approval):

# .github/workflows/promote-staging.yml
name: Promote to Staging

on:
  workflow_dispatch:
    inputs:
      image_digest:
        description: "Image digest from dev (sha256:...)"
        required: true

jobs:
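  integration-gate:
    runs-on: ubuntu-latest
    permissions:
      id-token: write           # OIDC is needed here too, to pull from ECR
      contents: read
    env:
      IMAGE_DIGEST: ${{ github.event.inputs.image_digest }}   # passed via env to avoid shell injection
    steps:
      - uses: actions/checkout@v4   # the test script lives in this repository
      - name: Configure AWS via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-dev-reader   # assumed read-only role; name is illustrative
          aws-region: us-east-1
      - name: Log in to ECR
        uses: aws-actions/amazon-ecr-login@v2
      - name: Run integration test suite against dev image
        run: |
          docker pull "123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp-dev@${IMAGE_DIGEST}"
          ./scripts/run-integration-tests.sh "${IMAGE_DIGEST}"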

  promote:
    needs: integration-gate
    runs-on: ubuntu-latest
    environment: staging        # enforces required reviewers configured in GitHub repo settings
    permissions:
      id-token: write           # OIDC — no long-lived AWS keys
      contents: read
    steps:
      - name: Configure AWS via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-staging-promoter
          aws-region: us-east-1

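      - name: Copy image to staging repo
        env:
          DIGEST: ${{ github.event.inputs.image_digest }}
        run: |
          # put-image registers only the manifest; the layers must already exist in myapp-staging
          MANIFEST=$(aws ecr batch-get-image \
            --repository-name myapp-dev \
            --image-ids "imageDigest=${DIGEST}" \
            --query 'images[0].imageManifest' --output text)
          aws ecr put-image \
            --repository-name myapp-staging \
            --image-tag "staging-$(date +%Y%m%d%H%M%S)" \
            --image-manifest "${MANIFEST}"

      - name: Update GitOps overlay
        env:
          DIGEST: ${{ github.event.inputs.image_digest }}   # shell variables do not persist between steps
          GITOPS_TOKEN: ${{ secrets.GITOPS_TOKEN }}
        run: |
          git clone "https://x-access-token:${GITOPS_TOKEN}@github.com/myorg/infra-config.git"
          cd infra-config
          # A fresh runner has no git identity, so commit would fail without these
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          yq e ".spec.template.spec.containers[0].image = \"123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp-staging@${DIGEST}\"" \
            -i apps/myapp/overlays/staging/deployment-patch.yaml
          git commit -am "promote myapp to staging @ ${DIGEST}"
          git push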

Never pass long-lived credentials between promotion stages. Use OIDC (GitHub Actions OIDC → AWS IAM role) scoped to minimum privilege per environment. The staging-promoter IAM role should have ecr:PutImage on the staging repo only, plus read access (ecr:BatchGetImage) on the dev repo it copies from, and no access to the release repo. The production-promoter is a separate role that requires an additional approval and is audited separately in CloudTrail.

Production Failure Modes to Design Against

Real promotion pipelines fail in predictable ways. Designing your pipeline to catch these before they reach production is the engineering work:

Digest drift: the GitOps overlay for an environment declares one digest, but someone manually patched the live deployment to a different image. Solution: reconciliation jobs (Argo CD diff alerts, or a nightly CI job that asserts the deployed digest equals the GitOps-declared digest for every environment).
Promotion bypass: an engineer with direct kubectl access deploys a hotfix image to production without going through the pipeline. Solution: admission webhooks that reject images not sourced from the myapp-release ECR repo, combined with break-glass logging.
Gate flapping: integration tests are flaky, so the team disables the gate temporarily. A broken build promotes. Solution: treat flaky tests as P1 bugs. Never skip gates; instead add a circuit-breaker that blocks promotion if the flake rate exceeds a threshold, forcing the team to fix the tests first.
Config/artifact mismatch: the artifact is correctly promoted but the Kubernetes ConfigMap for the new feature flag was not updated in the production overlay. The app starts and immediately errors. Solution: atomic promotion commits that update both the image digest and any associated config in a single GitOps PR, reviewed together.
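
Two of the checks above reduce to small comparisons that a nightly job or a pre-promotion step could run. This is a rough sketch under stated assumptions: the function names assert_no_drift and flake_gate_open are hypothetical, the digests and run counts are passed in as plain arguments, and how they are collected (from the overlay file, the cluster, or the CI history) is deliberately left out.

```shell
# Digest drift: compare the digest declared in GitOps with the digest that
# is actually deployed. Arguments: environment name, declared, deployed.
assert_no_drift() {
  env_name=$1
  declared=$2
  deployed=$3
  if [ "$declared" != "$deployed" ]; then
    echo "DRIFT in $env_name: declared $declared, deployed $deployed"
    return 1
  fi
  echo "OK $env_name"
}

# Gate flapping: keep the gate open only while the flake rate over recent
# runs stays at or below a threshold.
# Arguments: flaky run count, total run count, maximum allowed percent.
flake_gate_open() {
  flaky=$1
  total=$2
  max_pct=$3
  [ "$total" -gt 0 ] || return 1   # no data: keep the gate closed
  # Integer form of flaky/total <= max_pct/100, avoiding division
  [ $((flaky * 100)) -le $((max_pct * total)) ]
}
```

With a 5% threshold, flake_gate_open 5 100 5 succeeds and flake_gate_open 6 100 5 fails, which blocks promotion until the flaky tests are fixed rather than skipped.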

The most dangerous pipeline pattern: a CI job that has write access to the production GitOps overlay and can self-approve its own PR. This means any compromise of the CI system (a malicious dependency, a leaked token) results in direct production deployment without human review. Production overlays must always require a human approval that cannot be bypassed by the bot that created the PR.

Measuring Pipeline Health

DORA metrics directly reflect pipeline quality. Track these per service:

Deployment Frequency: how often a build reaches production. Blocked promotions show as gaps.
Lead Time for Changes: time from commit to production. Long lead times often mean manual gates that could be automated, or slow integration test suites.
Change Failure Rate: percentage of deployments that require a hotfix or rollback. A high rate means your pre-production gates are not catching real failures.
Mean Time to Restore: how long after a bad deploy the service is restored. Directly improved by fast auto-rollback on SLO breach.
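
Three of these numbers can be derived from a plain deployment log. The sketch below assumes a record format invented for illustration (commit time, deploy time, and outcome per line) and a hypothetical function name dora_summary; it reports the deployment count, change failure rate, and mean lead time. Time to restore is omitted because it needs incident timestamps that this format does not carry.

```shell
# Input on stdin: one deployment per line as
#   <commit_epoch_seconds> <deploy_epoch_seconds> <ok|failed>
# This record format is an assumption made for the sketch.
dora_summary() {
  awk '
    NF == 3 {
      n++
      lead += $2 - $1                  # lead time for this change, in seconds
      if ($3 == "failed") failed++
    }
    END {
      if (n == 0) { print "no deployments"; exit }
      printf "deployments=%d change_failure_rate=%.1f%% mean_lead_time_hours=%.1f\n",
             n, 100 * failed / n, lead / n / 3600
    }'
}
```

Feeding the function one service's log for a fixed window (say, the last 30 days) turns the deployment count into a deployment frequency for that window.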

Elite performers (per the DORA State of DevOps report) deploy multiple times per day with a change failure rate of roughly 5% and a time to restore under one hour; the exact thresholds shift between report years. Every design decision in your promotion pipeline (immutability, digest pinning, atomic config+image commits, separate repos per tier) is a lever that moves these numbers.