GitHub Actions in Depth

Project: A Complete CI/CD Workflow

18 min Lesson 10 of 30


Every lesson in this tutorial has been a building block. Now you assemble them into a production-grade CI/CD pipeline. This lesson walks you through designing, writing, and operating a complete workflow that builds, tests, packages, and deploys a containerised Node.js API to a Kubernetes cluster, gating each stage with automated quality checks and approval controls.

The Target Architecture

The app is a REST API packaged as a Docker image, pushed to a container registry, and deployed to a Kubernetes cluster. The pipeline enforces this progression: code must pass static analysis and unit tests before an image is built; the pushed image must pass a CVE scan before it can be deployed anywhere; a staging deployment must succeed and a smoke test must pass before production is unlocked; and production requires a named approver.

End-to-end CI/CD pipeline: every stage is a gated job; production requires a manual approval.

The Complete Workflow File

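All five stages live in a single workflow file. The needs key enforces the dependency chain; the environment: production block adds the approval gate. Notice how images are identified: docker/metadata-action derives a tag from the Git SHA (type=sha), which makes every image traceable to a commit, and the downstream jobs reference the image by its content digest, which, unlike a tag, cannot be overwritten.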

# .github/workflows/cicd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  release:
    types: [published]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # ── 1. LINT & TEST ──────────────────────────────────────────────
  test:
    name: Lint & Unit Tests
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"

      - run: npm ci

      - name: Lint
        run: npm run lint

      - name: Unit tests with coverage
        run: npm test -- --coverage --coverageThreshold='{"global":{"lines":80}}'

      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: coverage-report
          path: coverage/
          retention-days: 7

  # ── 2. BUILD IMAGE ────────────────────────────────────────────────
  build-image:
    name: Build Docker Image
    runs-on: ubuntu-24.04
    needs: test
    outputs:
      image-digest: ${{ steps.build.outputs.digest }}
      image-tag: ${{ steps.meta.outputs.tags }}
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4

      - uses: docker/setup-buildx-action@v3

      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=sha-,format=short
            type=ref,event=branch
            type=semver,pattern={{version}}

      - name: Build & push (cache-optimised)
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          provenance: true        # SLSA level-1 attestation
          sbom: true              # Software Bill of Materials

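  # ── 3. CVE SCAN ─────────────────────────────────────────────────
  scan:
    name: CVE Scan
    runs-on: ubuntu-24.04
    needs: build-image
    # Pull-request builds are not pushed, so there is no registry image to scan
    if: github.event_name != 'pull_request'
    permissions:
      contents: read
      packages: read
      security-events: write
    steps:
      # Checkout so Trivy can pick up a repository-level .trivyignore
      - uses: actions/checkout@v4

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@0.28.0   # a release tag, not @master (better still, a commit SHA)
        env:
          # Registry credentials, needed when the GHCR package is private
          TRIVY_USERNAME: ${{ github.actor }}
          TRIVY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-image.outputs.image-digest }}
          format: sarif
          output: trivy-results.sarif
          severity: CRITICAL,HIGH
          limit-severities-for-sarif: true  # otherwise SARIF mode reports (and exit-code fails on) every severity
          exit-code: "1"          # fail the job on any CRITICAL or HIGH

      - uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: trivy-results.sarif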

  # ── 4. DEPLOY STAGING ────────────────────────────────────────────
  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-24.04
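    needs: [build-image, scan]   # build-image listed so its digest output is readable here
    environment: staging
    # Pushes to main and releases; a skipped staging job would also skip production
    if: github.event_name != 'pull_request'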
    steps:
      - uses: actions/checkout@v4

      - name: Install kubectl
        uses: azure/setup-kubectl@v4
        with:
          version: "v1.30.0"

      - name: Authenticate to cluster
        env:
          KUBECONFIG_B64: ${{ secrets.STAGING_KUBECONFIG }}   # mapped via env, not inlined into the script
        run: |
          mkdir -p ~/.kube
          echo "$KUBECONFIG_B64" | base64 -d > ~/.kube/config
          chmod 600 ~/.kube/config

      - name: Rolling update
        run: |
          IMAGE="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-image.outputs.image-digest }}"
          kubectl set image deployment/api api="$IMAGE" -n staging
          kubectl rollout status deployment/api -n staging --timeout=120s

      - name: Smoke test
        run: |
          STAGING_URL="https://staging.api.example.com"
          for i in 1 2 3 4 5; do
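            # "|| true": under the runner's bash -e, a refused connection would end the step before any retry
            STATUS=$(curl -s -o /dev/null --max-time 5 -w "%{http_code}" "$STAGING_URL/healthz" || true)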
            [ "$STATUS" = "200" ] && echo "Smoke test passed" && exit 0
            echo "Attempt $i: got $STATUS, retrying in 10s..."
            sleep 10
          done
          echo "Smoke test failed after 5 attempts" && exit 1

  # ── 5. DEPLOY PRODUCTION ─────────────────────────────────────────
  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-24.04
    needs: [build-image, deploy-staging]   # build-image listed so its digest output is readable here
    environment: production         # <-- manual approval gate
    if: github.event_name == 'release'
    steps:
      - uses: actions/checkout@v4

      - uses: azure/setup-kubectl@v4
        with:
          version: "v1.30.0"

      - name: Authenticate to production cluster
        env:
          KUBECONFIG_B64: ${{ secrets.PROD_KUBECONFIG }}   # mapped via env, not inlined into the script
        run: |
          mkdir -p ~/.kube
          echo "$KUBECONFIG_B64" | base64 -d > ~/.kube/config
          chmod 600 ~/.kube/config

      - name: Rolling update (production)
        run: |
          IMAGE="${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-image.outputs.image-digest }}"
          kubectl set image deployment/api api="$IMAGE" -n production
          kubectl rollout status deployment/api -n production --timeout=300s

      - name: Tag deployment in Datadog
        env:
          DD_API_KEY: ${{ secrets.DATADOG_API_KEY }}
        run: |
          curl -s -X POST "https://api.datadoghq.com/api/v1/events" \
            -H "DD-API-KEY: $DD_API_KEY" \
            -H "Content-Type: application/json" \
            -d "{\"title\":\"Deployed ${{ github.ref_name }}\",\"text\":\"SHA ${{ github.sha }}\",\"tags\":[\"env:production\"]}"
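To see what the workflow's image references look like, the two naming schemes can be reproduced by hand. The helper names below are hypothetical, and the sketch assumes docker/metadata-action's default short-SHA length of seven characters for type=sha,format=short.

```shell
# Hypothetical helpers mirroring how the pipeline names its images.

# Traceable (but mutable) tag: "sha-" plus the first 7 characters of the
# commit SHA, matching type=sha,prefix=sha-,format=short.
sha_tag() {
  printf 'sha-%s\n' "$(printf '%s' "$1" | cut -c1-7)"
}

# Immutable reference: <registry>/<name>@<digest>. OCI repository names must
# be lower-case, so the name is lower-cased (github.repository may contain capitals).
digest_ref() {
  name=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  printf '%s/%s@%s\n' "$1" "$name" "$3"
}
```

For example, sha_tag "$GITHUB_SHA" yields the tag a push build receives, while digest_ref ghcr.io "$GITHUB_REPOSITORY" "$DIGEST" builds the kind of reference the deploy jobs pass to kubectl.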

The Dockerfile That Makes It Work

The pipeline is only as good as the Dockerfile it builds. Use a multi-stage build to keep the final image lean, and never run as root in production. The --chown flags and the USER nodeuser directive below are a widely recommended baseline for production containers.

# Dockerfile
# Stage 1: production dependencies only (layer cache-friendly)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

# Stage 2: build (only if your app has a compile step).
# Installs the full dependency set: build tools such as TypeScript are
# usually devDependencies, which Stage 1 deliberately omits.
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 3 — minimal production image
FROM node:20-alpine AS runner
WORKDIR /app

# Security: non-root user
RUN addgroup --system --gid 1001 nodejs \
 && adduser  --system --uid 1001 nodeuser

COPY --from=builder --chown=nodeuser:nodejs /app/dist ./dist
COPY --from=deps    --chown=nodeuser:nodejs /app/node_modules ./node_modules

USER nodeuser
EXPOSE 3000

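# Honoured by docker run and Compose; Kubernetes ignores HEALTHCHECK and
# relies on the Pod's readiness and liveness probes instead.
HEALTHCHECK --interval=15s --timeout=5s --start-period=10s --retries=3 \
  CMD wget -qO- http://localhost:3000/healthz || exit 1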

CMD ["node", "dist/server.js"]
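The non-root rule is easy to regress when someone edits the final stage, so a cheap pre-build check can enforce it. This is a hypothetical sketch, not a Docker feature: it uses awk to find the last USER directive after the final FROM, and treats a missing USER as a failure because the official node images run as root unless told otherwise. It does not handle ARG substitution or parser directives.

```shell
# Print the USER set in the final build stage (empty if none).
# Each FROM starts a new stage, so it resets the remembered user.
final_stage_user() {
  awk 'toupper($1) == "FROM" { user = "" }
       toupper($1) == "USER" { user = $2 }
       END { print user }' "$1"
}

# Fail (non-zero exit) if the final stage would run as root.
check_nonroot_user() {
  user=$(final_stage_user "$1")
  case "$user" in
    "")
      echo "final stage sets no USER; the official node images default to root" >&2
      return 1 ;;
    root|0|root:*|0:*)
      echo "final stage USER is root" >&2
      return 1 ;;
  esac
  echo "ok: final stage runs as $user"
}
```

Run as check_nonroot_user Dockerfile in a step before docker/build-push-action, it fails the job whenever the final stage would run as root.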

Key Design Decisions Explained

Image identity via digest, not tag

The build-image job surfaces the image digest (a content-addressed SHA-256 hash) as an output. Every downstream job in the same workflow run references the image by that digest, not by a mutable tag like :latest. This guarantees that the image deployed to staging is byte-for-byte the image that reaches production in that run: no tag-overwrite races and no drift between environments.
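A deploy script can enforce this rule mechanically by refusing anything that is not digest-pinned. The function below is a hypothetical guard; its regular expression accepts only a sha256 digest written as 64 lower-case hex characters, the form the OCI image spec defines.

```shell
# Hypothetical guard: accept only digest-pinned references of the form
# <name>@sha256:<64 lower-case hex characters>; reject tags such as :latest.
require_digest_ref() {
  if printf '%s\n' "$1" | grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'; then
    return 0
  fi
  echo "refusing mutable or malformed image reference: $1" >&2
  return 1
}
```

In a deploy step it would sit directly in front of kubectl as require_digest_ref "$IMAGE" || exit 1. Using || exit 1 rather than && matters: under bash -e, a failed command on the left of && does not stop the script.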

CVE scanning as a hard gate

Trivy runs with exit-code: 1, which means any CRITICAL or HIGH CVE will fail the scan job and prevent both the staging and production deployments from starting. The SARIF results are uploaded to GitHub's Security tab so engineers can triage without leaving GitHub.
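Stripped of the Trivy specifics, the gate is a simple rule: block if at least one finding is CRITICAL or HIGH. The function below is a sketch of that rule only, reading one severity per line; the jq pipeline in its comment shows one way to produce that list from Trivy's JSON report and assumes jq is installed.

```shell
# Sketch of the gate's decision rule: read one severity per line on stdin
# and fail if any is CRITICAL or HIGH. One way to produce that list from a
# Trivy JSON report (assumes jq is installed):
#   trivy image --format json "$IMAGE" \
#     | jq -r '.Results[]?.Vulnerabilities[]?.Severity'
cve_gate() {
  # grep -c prints 0 and exits 1 when nothing matches; || true keeps that
  # from aborting callers that run with set -e.
  blocking=$(grep -cE '^(CRITICAL|HIGH)$' || true)
  if [ "${blocking:-0}" -gt 0 ]; then
    echo "blocked: $blocking CRITICAL/HIGH finding(s)" >&2
    return 1
  fi
  echo "passed: no CRITICAL/HIGH findings"
}
```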

The approval environment

The environment: production declaration links to a GitHub Environment configured with Required Reviewers. When the workflow reaches that job it pauses and GitHub sends a notification to the listed reviewers. No code changes are needed to add or remove approvers — it is all managed in the repository Settings UI and is fully audited.

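The if condition on deploy-production is critical. Without if: github.event_name == 'release', every merge to main would queue a production deployment waiting for approval. Only a published GitHub Release should unlock the production job. Staging deploys on every push to main; production deploys only on a release event. Because deploy-production needs deploy-staging, the staging job's own if condition must also let release events through: a release's github.ref is refs/tags/<tag>, so a condition like github.ref == 'refs/heads/main' would skip staging, and GitHub skips any job whose dependency was skipped.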
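The interaction between if: and needs: is easy to get wrong, so it can help to model it. The function below is a hypothetical, simplified model (it ignores failures, approvals, and status functions such as always()) that compares two possible conditions for deploy-staging: ref-is-main (github.ref == 'refs/heads/main') and not-pr (github.event_name != 'pull_request').

```shell
# Hypothetical model of how the deploy jobs' `if:` conditions combine with
# `needs:`. Rule modelled: a job is skipped when a job it needs was skipped.
#   $1 = event name   (push | pull_request | release)
#   $2 = github.ref   (e.g. refs/heads/main or refs/tags/v1.2.0)
#   $3 = staging condition: ref-is-main | not-pr
deploy_plan() {
  event=$1 ref=$2 cond=$3
  staging=skip
  case "$cond" in
    ref-is-main) if [ "$ref" = "refs/heads/main" ]; then staging=run; fi ;;
    not-pr)      if [ "$event" != "pull_request" ]; then staging=run; fi ;;
  esac
  # deploy-production: its own condition (release only) AND a dependency that ran.
  production=skip
  if [ "$staging" = run ] && [ "$event" = release ]; then
    production=run
  fi
  echo "staging=$staging production=$production"
}
```

Running deploy_plan release refs/tags/v1.2.0 ref-is-main prints staging=skip production=skip: the release never reaches the approval gate. With not-pr, the same release deploys to staging first and then waits for approval before production.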

Rollback Strategy

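Every pushed image is addressed by an immutable digest and tagged with its commit SHA, so rolling back is quick. Either find the previous successful run in the Actions UI and re-run its deploy-production job (a re-run reuses that run's job outputs, so it redeploys that run's digest), or point the deployment back at the previous digest directly: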

# Rollback: set the deployment image back to the previous known-good digest
kubectl set image deployment/api \
  api=ghcr.io/your-org/your-repo@sha256:<previous-digest> \
  -n production

kubectl rollout status deployment/api -n production --timeout=300s

# Verify
kubectl get pods -n production -l app=api -o wide
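Kubernetes also keeps earlier ReplicaSets for a Deployment (up to revisionHistoryLimit, 10 by default), so you can roll back without looking up a digest at all. The helpers below are a hypothetical wrapper around kubectl rollout undo; they assume the Deployment is named api with a container named api, as in the workflow.

```shell
# Hypothetical rollback helper built on kubectl's revision history.
#   $1 = namespace
#   $2 = optional revision number (see: kubectl rollout history deployment/api -n <ns>)
rollback_api() {
  ns=$1
  if [ -n "${2:-}" ]; then
    kubectl rollout undo deployment/api -n "$ns" --to-revision="$2"
  else
    # No revision given: return to the immediately previous one.
    kubectl rollout undo deployment/api -n "$ns"
  fi
  kubectl rollout status deployment/api -n "$ns" --timeout=300s
}

# Print the image reference the "api" container is currently configured with.
current_image() {
  kubectl get deployment api -n "$1" \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="api")].image}'
}
```

An undo changes the cluster without a new workflow run, so the Actions history no longer shows what is live; current_image production reports it.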

Pin Actions to a commit SHA in production pipelines. actions/checkout@v4 is convenient, but v4 is a Git tag, and a tag can be moved to point at different code (for example by an attacker who compromises the action's repository) without the version string in your workflow changing. A full commit SHA such as actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 cannot be repointed. Tools like Dependabot and Renovate keep pins up to date automatically.
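To find references that still use a movable tag, a grep over the workflow directory is enough for a first pass. This is a hypothetical audit script and a line-based heuristic rather than a YAML parser: it skips local (./) and docker:// references and accepts a 40-character hex SHA optionally followed by a comment such as # v4.2.2.

```shell
# Print `uses:` lines (file:line:text) whose reference is not a full
# 40-character commit SHA. Local and docker:// references are skipped.
unpinned_actions() {
  grep -HnE '^[[:space:]]*(-[[:space:]]+)?uses:' "$@" \
    | grep -vE 'uses:[[:space:]]*(\./|docker://)' \
    | grep -vE '@[0-9a-f]{40}([[:space:]"]|$)'
}

# Exit non-zero (failing a CI step) if any unpinned reference is found.
audit_pins() {
  found=$(unpinned_actions "$@" || true)
  if [ -n "$found" ]; then
    printf 'unpinned action references:\n%s\n' "$found" >&2
    return 1
  fi
  echo "all action references are pinned to commit SHAs"
}
```

audit_pins .github/workflows/*.yml lists every unpinned reference and exits non-zero, which makes it usable as a CI step of its own.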

Common Production Failure Modes

Deployment timeout before Pods are healthy: usually a failing readiness probe. Check the app logs with kubectl logs -n staging -l app=api --since=2m before blaming the pipeline.
Smoke test flaps: the retry loop in the workflow absorbs transient load-balancer warm-up; adjust the attempt count and sleep interval to your infrastructure's cold-start time.
Trivy blocks on a false positive: suppress the specific CVE ID in .trivyignore, with a comment explaining the decision and a review-by date (Trivy also accepts an exp:YYYY-MM-DD suffix so the suppression lapses automatically). Trivy reads .trivyignore from its working directory, so the scan job needs the repository checked out.
GITHUB_TOKEN lacks packages: write: the permission must be declared explicitly (at the workflow or job level) rather than assumed, because the default token may be read-only. build-image declares it at the job level.
Stale kubeconfig: rotate the STAGING_KUBECONFIG and PROD_KUBECONFIG secrets before the service-account tokens inside them expire. Set a calendar reminder, or use OIDC federation instead (covered in lesson 8).
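For the smoke-test item, it helps to make the retry budget explicit. The function below restates the workflow's loop with the attempt count and interval as parameters (defaulting to the workflow's 5 attempts, 10 seconds apart); the function name and the 5-second --max-time are assumptions. The || true keeps a refused connection from aborting the loop when the caller runs under bash -e, as Actions steps do.

```shell
# Retrying health check.
#   $1 = base URL, $2 = attempts (default 5), $3 = seconds between attempts (default 10)
smoke_test() {
  url="$1/healthz"
  attempts=${2:-5}
  interval=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --max-time stops a hung connection from stalling the loop;
    # curl still prints "000" via -w when it cannot connect.
    status=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url" || true)
    if [ "$status" = "200" ]; then
      echo "Smoke test passed"
      return 0
    fi
    echo "Attempt $i/$attempts: got $status"
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$interval"
    fi
  done
  echo "Smoke test failed after $attempts attempts" >&2
  return 1
}
```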

Never expose cluster credentials where the logs can capture them. Map the base64 secret into the step through env: rather than inlining ${{ secrets.STAGING_KUBECONFIG }} into the script text, decode it straight to a file (echo "$KUBECONFIG_B64" | base64 -d > ~/.kube/config), and set strict permissions (chmod 600). GitHub redacts the configured secret value, and some common encodings of it, from logs, but a value derived from it, such as the decoded kubeconfig, is not guaranteed to be masked, so never print it.
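The same pattern as a reusable function: the decoded file is created under umask 077, so it is owner-only (0600) from the moment it exists rather than being tightened by chmod afterwards. KUBECONFIG_B64 is a hypothetical variable name that the step would map from the secret with env:.

```shell
# Decode a base64 kubeconfig held in KUBECONFIG_B64 into an owner-only file.
# Nothing is echoed, so the decoded credentials never reach the log.
#   $1 = destination (default: ~/.kube/config)
write_kubeconfig() {
  dest=${1:-"$HOME/.kube/config"}
  if [ -z "${KUBECONFIG_B64:-}" ]; then
    echo "KUBECONFIG_B64 is empty; refusing to write $dest" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  # umask 077 in a subshell: a newly created file gets mode 0600 immediately.
  ( umask 077 && printf '%s' "$KUBECONFIG_B64" | base64 -d > "$dest" ) || return 1
  # Also tighten a file that already existed (umask only affects new files).
  chmod 600 "$dest"
}
```

In a step, set KUBECONFIG_B64 from the secret under env: and call write_kubeconfig with no argument to write ~/.kube/config.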

What to Extend Next

This pipeline is a solid foundation. In a real company codebase you would add: integration and end-to-end tests as parallel jobs between test and build-image; database migration jobs with a dry-run gate; Slack / PagerDuty notifications on failure using if: failure() steps; and DORA metric emission (deployment frequency, lead time) to your observability platform. The architecture scales because every concern is its own job — adding a new gate is a matter of inserting a job with the right needs chain.
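As an example of the notification extension, a failure alert is a short script run from a step guarded by if: failure(). The sketch assumes a Slack incoming-webhook URL stored as a secret and exposed as SLACK_WEBHOOK_URL (a hypothetical name); the GITHUB_* variables are defaults that Actions sets on every runner. It does not JSON-escape its inputs, which is fine for ordinary branch names but not for arbitrary text.

```shell
# Hypothetical failure notifier: posts the branch, commit, and run link
# to a Slack incoming webhook. curl -f makes HTTP errors fail the step.
notify_failure() {
  run_url="$GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID"
  payload=$(printf '{"text":"CI/CD failed on %s (%s): %s"}' \
    "$GITHUB_REF_NAME" "$GITHUB_SHA" "$run_url")
  curl -sf -X POST -H 'Content-Type: application/json' \
    -d "$payload" "$SLACK_WEBHOOK_URL"
}
```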