Every lesson in this tutorial has been a building block. Now you assemble them into a production-grade CI/CD pipeline — the kind that ships code at companies like Shopify, Stripe, and GitHub itself. This lesson walks you through designing, writing, and operating a complete workflow that builds, tests, packages, and deploys a containerised Node.js API to a cloud environment, gate-keeping each stage with automated quality checks and approval controls.
The Target Architecture
The app is a REST API packaged as a Docker image, pushed to a container registry, and deployed to a Kubernetes cluster. The pipeline enforces this progression: code must pass static analysis and unit tests before an image is built; the image must pass a CVE scan before it is deployed anywhere; a staging deployment must succeed and a smoke test must pass before production is unlocked; and production requires a named approver.
Figure: End-to-end CI/CD pipeline. Every stage is a gated job; production requires a manual approval.
The Complete Workflow File
All five stages live in a single workflow file. The needs key enforces the dependency chain, and the environment: production block adds the approval gate. Notice that IMAGE_TAG is derived from the Git SHA, so each tag is unique to one commit, traceable back to the source, and never reused by a later build.
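As a rough sketch of the shape described above, the skeleton below wires the five jobs together with needs. The job names follow the ones used in this lesson; the workflow name, the lint script, and the placeholder echo steps are assumptions for illustration, not the lesson's actual file.

```yaml
# Skeleton only: step bodies are placeholders, not a complete pipeline.
name: ci-cd   # hypothetical workflow name

on:
  push:
    branches: [main]
  release:
    types: [published]

env:
  # Tag derived from the commit SHA, so every build gets its own tag
  IMAGE_TAG: ${{ github.sha }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run lint   # assumes a "lint" script exists in package.json
      - run: npm test

  build-image:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - run: echo "build and push the image tagged $IMAGE_TAG"   # placeholder

  scan:
    needs: build-image
    runs-on: ubuntu-latest
    steps:
      - run: echo "scan the pushed image for CVEs"   # placeholder

  deploy-staging:
    # build-image is listed directly so its outputs can be read here
    needs: [build-image, scan]
    runs-on: ubuntu-latest
    steps:
      - run: echo "deploy to staging, then run the smoke test"   # placeholder

  deploy-production:
    needs: [build-image, deploy-staging]
    if: github.event_name == 'release'
    runs-on: ubuntu-latest
    environment: production   # pauses here until a required reviewer approves
    steps:
      - run: echo "deploy to production"   # placeholder
```

A job can only read the outputs of jobs it names in its own needs list, which is why the deploy jobs list build-image explicitly rather than relying on the transitive chain.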
The pipeline is only as good as the Dockerfile it builds. Use a multi-stage build to keep the final image lean, and never run as root in production. The --chown flags and the USER nodeuser directive below ensure the application files are owned by, and the process runs as, an unprivileged user.
# Dockerfile
# Stage 1 — install dependencies (layer cache-friendly)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Stage 2 — build (only if your app has a compile step)
FROM node:20-alpine AS builder
WORKDIR /app
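# Build tools are usually devDependencies, so install the full set here;
# the deps stage keeps the production-only node_modules for the final image.
COPY package*.json ./
RUN npm ci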
COPY . .
RUN npm run build
# Stage 3 — minimal production image
FROM node:20-alpine AS runner
WORKDIR /app
# Security: non-root user
RUN addgroup --system --gid 1001 nodejs \
    && adduser --system --uid 1001 nodeuser
COPY --from=builder --chown=nodeuser:nodejs /app/dist ./dist
COPY --from=deps --chown=nodeuser:nodejs /app/node_modules ./node_modules
USER nodeuser
EXPOSE 3000
HEALTHCHECK --interval=15s --timeout=5s --start-period=10s --retries=3 \
    CMD wget -qO- http://localhost:3000/healthz || exit 1
CMD ["node", "dist/server.js"]
Key Design Decisions Explained
Image identity via digest, not tag
The build-image job surfaces the image digest (a SHA-256 content hash) as an output. Every downstream job references the image by that digest, not by a mutable tag like :latest. This guarantees that the exact binary deployed to staging is the exact binary that gets to production — no tag-overwrite races, no "works on my machine" drift.
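One way the digest hand-off could look is sketched below, assuming the image is built with docker/build-push-action, which exposes a digest output. The action versions, the image name built from github.repository, and the omission of cluster credential setup are assumptions for illustration.

```yaml
# Fragment: exposing the pushed image digest as a job output.
jobs:
  build-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # needed to push to ghcr.io with GITHUB_TOKEN
    outputs:
      digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          # Assumes the repository name is lowercase, as ghcr.io requires
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}

  deploy-staging:
    needs: build-image   # the full pipeline would also need the scan job
    runs-on: ubuntu-latest
    steps:
      # Cluster credential setup is omitted from this fragment
      - env:
          IMAGE: ghcr.io/${{ github.repository }}@${{ needs.build-image.outputs.digest }}
        run: kubectl set image deployment/api api="$IMAGE" -n staging
```

Referencing the image as name@sha256:... means a later push to the same tag cannot change what staging or production pulls.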
CVE scanning as a hard gate
Trivy runs with exit-code: 1 and a severity filter of CRITICAL,HIGH, so any CRITICAL or HIGH CVE fails the scan job and prevents both the staging and production deployments from starting. The SARIF results are uploaded to GitHub's Security tab so engineers can triage without leaving GitHub.
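A scan job along these lines might look like the following sketch, assuming aquasecurity/trivy-action and github/codeql-action/upload-sarif. The action versions, the report file name, and the build-image digest output are assumptions; check the inputs against the trivy-action README for the version you pin.

```yaml
# Fragment: fail the job on CRITICAL/HIGH findings and still upload the report.
jobs:
  scan:
    needs: build-image
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: read
      security-events: write   # required to upload SARIF to the Security tab
    steps:
      - uses: aquasecurity/trivy-action@0.28.0   # version is an assumption
        env:
          # Lets Trivy pull a private image from ghcr.io
          TRIVY_USERNAME: ${{ github.actor }}
          TRIVY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
        with:
          image-ref: ghcr.io/${{ github.repository }}@${{ needs.build-image.outputs.digest }}
          format: sarif
          output: trivy-results.sarif
          severity: CRITICAL,HIGH
          # Without this, SARIF output is documented to include all severities
          limit-severities-for-sarif: true
          exit-code: '1'
      - uses: github/codeql-action/upload-sarif@v3
        if: always()   # upload even when the scan step failed the job
        with:
          sarif_file: trivy-results.sarif
```

The if: always() on the upload step matters: without it, a failing scan would skip the upload and engineers would have nothing to triage.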
The approval environment
The environment: production declaration links to a GitHub Environment configured with Required Reviewers. When the workflow reaches that job it pauses and GitHub sends a notification to the listed reviewers. No code changes are needed to add or remove approvers — it is all managed in the repository Settings UI and is fully audited.
The if condition on deploy-production is critical. Without if: github.event_name == 'release', every merge to main would queue a production deployment waiting for approval. Only a published GitHub Release should unlock the production job. Staging deploys on every main merge; production deploys only on a release event.
Rollback Strategy
Every pushed image is tagged by commit SHA and addressable by its immutable digest. To roll back, find the previous successful run in the Actions UI and copy its image digest, then either re-run that run's deploy-production job or set the image directly with kubectl:
# Rollback: set the deployment image back to the previous known-good digest
kubectl set image deployment/api \
    api=ghcr.io/your-org/your-repo@sha256:<previous-digest> \
    -n production
kubectl rollout status deployment/api -n production --timeout=300s
# Verify
kubectl get pods -n production -l app=api -o wide
Pin Actions to a commit SHA in production pipelines. Using actions/checkout@v4 is convenient, but actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 is immutable. An attacker who compromises an action's repository can move a tag such as v4 to malicious code without the version string in your workflow changing; a pinned SHA always resolves to the same commit. Tools like Dependabot and Renovate keep pins up to date automatically.
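As an illustration, the two YAML documents below show a pinned step and a minimal Dependabot configuration that proposes updates for it. The workflow file name and the weekly schedule are assumptions; the SHA is the one quoted above.

```yaml
# .github/workflows/ci.yml (fragment, hypothetical file name):
# pin by commit SHA and keep the human-readable version as a comment.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
---
# .github/dependabot.yml: let Dependabot open pull requests that bump pinned actions.
version: 2
updates:
  - package-ecosystem: github-actions
    directory: /
    schedule:
      interval: weekly
```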
Common Production Failure Modes
Deployment timeout before Pods are healthy — your readiness probe is failing; check the app logs with kubectl logs -n staging -l app=api --since=2m before blaming the pipeline.
Smoke test flaps — the retry loop in the example handles transient load-balancer warm-up; adjust the sleep interval for your infrastructure cold-start time.
Trivy blocks on a false positive — use .trivyignore to suppress specific CVE IDs with a comment explaining the decision and a review-by date.
GITHUB_TOKEN lacks packages: write — the permission must be declared at the job level, not just assumed. It was added explicitly in build-image.
Stale kubeconfig — rotate STAGING_KUBECONFIG and PROD_KUBECONFIG secrets when service-account tokens expire. Set a calendar reminder or use OIDC federation instead (covered in lesson 8).
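For the smoke-test case above, a retry loop could be sketched as follows. The STAGING_URL repository variable, the attempt count, and the 10-second sleep are hypothetical values to tune for your infrastructure; the /healthz path matches the Dockerfile health check.

```yaml
# Fragment: smoke test that tolerates a short load-balancer warm-up.
jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    steps:
      - name: Smoke test with retries
        env:
          STAGING_URL: ${{ vars.STAGING_URL }}   # hypothetical repository variable
        run: |
          for attempt in 1 2 3 4 5; do
            if curl --fail --silent --show-error --max-time 5 "$STAGING_URL/healthz"; then
              echo "smoke test passed on attempt $attempt"
              exit 0
            fi
            echo "attempt $attempt failed; retrying in 10s"
            sleep 10
          done
          echo "smoke test failed after 5 attempts"
          exit 1
```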
Never put cluster credentials in workflow environment variables visible in logs. Always decode from a base64 secret directly to a file (echo "$SECRET" | base64 -d > ~/.kube/config) and set strict permissions (chmod 600). GitHub masks exact secret values in logs, but transformed values, such as the decoded kubeconfig, are not masked, so never print the secret or anything derived from it. The pattern in this lesson is the safe approach.
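A minimal sketch of that pattern as workflow steps is shown below, assuming STAGING_KUBECONFIG holds a base64-encoded kubeconfig. The intermediate variable name KUBECONFIG_B64 is hypothetical.

```yaml
# Fragment: write the kubeconfig to a private file without echoing it to the log.
jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    steps:
      - name: Write kubeconfig from secret
        env:
          KUBECONFIG_B64: ${{ secrets.STAGING_KUBECONFIG }}
        run: |
          mkdir -p "$HOME/.kube"
          # The umask keeps the file private from the moment it is created
          (umask 077 && printf '%s' "$KUBECONFIG_B64" | base64 -d > "$HOME/.kube/config")
          chmod 600 "$HOME/.kube/config"
      - name: Check cluster access
        run: kubectl rollout status deployment/api -n staging --timeout=300s
```

Passing the secret through env rather than interpolating it into the script text keeps it out of the rendered command that the runner prints.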
What to Extend Next
This pipeline is a solid foundation. In a real company codebase you would add: integration and end-to-end tests as parallel jobs between test and build-image; database migration jobs with a dry-run gate; Slack / PagerDuty notifications on failure using if: failure() steps; and DORA metric emission (deployment frequency, lead time) to your observability platform. The architecture scales because every concern is its own job — adding a new gate is a matter of inserting a job with the right needs chain.
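As one example of such an extension, a failure notification step might look like the sketch below, assuming a Slack incoming webhook stored in a hypothetical SLACK_WEBHOOK_URL secret. The message format is illustrative only.

```yaml
# Fragment: last step of a job, runs only if an earlier step in the job failed.
jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    steps:
      - run: echo "deploy steps go here"   # placeholder
      - name: Notify Slack on failure
        if: failure()
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}   # hypothetical secret
          RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
        run: |
          curl --fail --silent --show-error \
            -X POST -H 'Content-Type: application/json' \
            --data "{\"text\": \"Pipeline failed: $RUN_URL\"}" \
            "$SLACK_WEBHOOK_URL"
```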