DevSecOps & Supply Chain Security

Container & IaC Scanning

A Dockerfile is not just a build script; it is a security boundary specification. Every FROM line inherits an attack surface. Every RUN apt-get install freezes whatever package versions were current at build time, and those versions drift into CVE territory the day you stop rebuilding. Infrastructure-as-Code files (Terraform, Kubernetes manifests, Helm charts, CloudFormation) carry a parallel risk: a privileged: true field or a publicly exposed S3 bucket defined in HCL is just as dangerous as a vulnerable library. This lesson teaches you to catch both classes of defect in CI, before the image or the resource ever reaches a real environment.

Image CVE Scanning: How It Actually Works

Container image scanners extract the software inventory from a layered filesystem (OS packages from /var/lib/dpkg/status or /var/lib/rpm, language packages from package-lock.json, go.sum, Pipfile.lock, and so on) and match that inventory against vulnerability databases: NVD, GitHub Advisory Database, Red Hat advisories, and distro-specific feeds. The two dominant open-source tools are Trivy (by Aqua Security) and Grype (by Anchore). Both can scan an image by name, from a tarball, or directly from a filesystem, which matters because you want to scan in CI before the image is pushed, not after.
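The extract-then-match flow can be illustrated with a toy sketch. It parses dpkg status text into a package inventory and matches it against a hand-written advisory table. The advisory IDs and the exact-version matching are assumptions made for illustration; real scanners compare version ranges using distro-specific ordering rules and pull advisories from live feeds.

```python
# Toy illustration of inventory extraction plus advisory matching.
# ADVISORIES is made-up sample data, not a real vulnerability feed.


def parse_dpkg_status(text: str) -> dict[str, str]:
    """Return {package: version} from dpkg status-file text."""
    inventory = {}
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with whitespace; skip them.
            if line[:1].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Package" in fields and "Version" in fields:
            inventory[fields["Package"]] = fields["Version"]
    return inventory


def match_advisories(inventory: dict, advisories: dict) -> list[str]:
    """Return sorted advisory IDs whose (package, version) is installed."""
    return sorted(
        adv_id
        for adv_id, (pkg, bad_version) in advisories.items()
        if inventory.get(pkg) == bad_version
    )


SAMPLE_STATUS = """\
Package: openssl
Status: install ok installed
Version: 3.0.2-0ubuntu1.10
Description: Secure Sockets Layer toolkit
 continuation line: ignored

Package: zlib1g
Status: install ok installed
Version: 1:1.2.11.dfsg-2ubuntu9.2
"""

# Hypothetical advisory IDs and affected versions, for illustration only.
ADVISORIES = {
    "EXAMPLE-0001": ("openssl", "3.0.2-0ubuntu1.10"),
    "EXAMPLE-0002": ("zlib1g", "1:1.2.13"),
}

if __name__ == "__main__":
    inv = parse_dpkg_status(SAMPLE_STATUS)
    print(match_advisories(inv, ADVISORIES))
```

Only the first advisory matches here, because the installed zlib1g version differs from the one listed as affected.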

Trivy is the de facto standard in modern pipelines. It is a single binary with no daemon; it handles container images, filesystems, git repos, and IaC files in one tool; and it emits SARIF output for GitHub code scanning. For GitLab security dashboards, Trivy can render GitLab's own report format through an output template instead.

# Scan a local image (built but not yet pushed) — fail CI on HIGH or CRITICAL
trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:${CI_COMMIT_SHA}

# Scan from a saved tarball (useful when the scan job has no access to the Docker daemon)
docker save -o myapp.tar myapp:${CI_COMMIT_SHA}
trivy image --input myapp.tar --exit-code 1 --severity HIGH,CRITICAL

# Emit SARIF for GitHub Advanced Security upload
trivy image --format sarif --output trivy-results.sarif myapp:${CI_COMMIT_SHA}

# .trivyignore: suppress accepted risks by CVE ID, with an expiry date
# HTTP/2 Rapid Reset, mitigated at LB layer
CVE-2023-44487 exp:2024-12-31
# runc container escape, base image rebuild pending (no expiry set yet)
CVE-2024-21626

Exit code discipline: --exit-code 1 is what makes the scanner a gate rather than a reporter. Without it, Trivy will list every CVE and exit 0 — your pipeline looks green while shipping a critical RCE. Always set an explicit exit code; set the severity threshold to match your organization's SLA (typically HIGH,CRITICAL for blocking, MEDIUM for warning-only).
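The gate-versus-reporter distinction can be sketched as a small function over a scanner report. This is an illustrative stand-in, not Trivy's implementation: it assumes the report follows the layout of Trivy's JSON output (Results[].Vulnerabilities[].Severity), and the sample report with its EXAMPLE IDs is fabricated for the demonstration.

```python
def gate_exit_code(report: dict, blocking: set[str]) -> int:
    """Return 1 if any finding has a blocking severity, else 0."""
    for result in report.get("Results") or []:
        # "Vulnerabilities" may be absent or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                return 1
    return 0


# Fabricated sample report; the IDs are placeholders, not real CVEs.
SAMPLE_REPORT = {
    "Results": [
        {
            "Target": "myapp (debian 12)",
            "Vulnerabilities": [
                {"VulnerabilityID": "EXAMPLE-0001", "Severity": "MEDIUM"},
                {"VulnerabilityID": "EXAMPLE-0002", "Severity": "HIGH"},
            ],
        },
        {"Target": "app/go.sum", "Vulnerabilities": None},
    ]
}

if __name__ == "__main__":
    # A reporter would always exit 0; a gate exits non-zero on blocking findings.
    raise SystemExit(gate_exit_code(SAMPLE_REPORT, {"HIGH", "CRITICAL"}))
```

With HIGH and CRITICAL as the blocking set, the sample fails the gate; narrowing the set to CRITICAL alone lets it pass, which is the same trade-off the severity threshold encodes.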

Base Image Strategy: The Biggest Lever

The fastest way to eliminate CVEs is to shrink the base image. A typical ubuntu:22.04-based image ships 200–400 OS packages. A gcr.io/distroless/base-debian12 image ships roughly 20. A scratch-based Go binary ships zero OS packages. Google maintains the distroless images and uses this approach in production, not because it is trendy but because the attack surface reduction is dramatic and measurable.

The standard multi-stage Dockerfile pattern achieves this without sacrificing developer ergonomics: build in a full SDK image, copy only the compiled artifact into a minimal runtime image.

# Multi-stage build: build in full SDK, run in distroless
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -trimpath -ldflags="-s -w" -o /app/server ./cmd/server

# Runtime stage — distroless has no shell, no package manager, no utilities
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/server /server
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/server"]

Pin digest, not tag. FROM golang:1.22-alpine is mutable — the tag can be silently rewritten to a different (potentially vulnerable) image. Use FROM golang:1.22-alpine@sha256:<digest> in production Dockerfiles. Dependabot and Renovate both handle digest-pinned base image updates automatically.
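A digest-pinning rule is easy to enforce mechanically. The following is a minimal sketch of such a check, assuming a plain Dockerfile without ARG-substituted image names (a FROM ${BASE} line would be reported as unpinned). The function name and regular expression are illustrative, not part of any existing tool.

```python
import re

# Matches: FROM [--platform=...] <image> [AS <stage>]
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?",
    re.IGNORECASE,
)


def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return FROM images that are not pinned with an @sha256 digest."""
    stages = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image, alias = match.group(1), match.group(2)
        # "FROM builder" refers to an earlier stage, not a registry image.
        is_stage_ref = image.lower() in stages
        if image != "scratch" and not is_stage_ref and "@sha256:" not in image:
            unpinned.append(image)
        if alias:
            stages.add(alias.lower())
    return unpinned


SAMPLE = """\
FROM golang:1.22-alpine AS builder
RUN go build ./...
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/server /server
"""

if __name__ == "__main__":
    for image in unpinned_base_images(SAMPLE):
        print(f"unpinned base image: {image}")
```

Run against the multi-stage Dockerfile pattern shown earlier, it flags both base images, since neither FROM line carries a digest.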

IaC Scanning: Catching Misconfigs Before Apply

Terraform, Kubernetes manifests, Helm charts, and CloudFormation templates encode your security posture as code. A privileged: true in a DaemonSet, an S3 bucket with block_public_acls = false, or a security group rule with cidr_blocks = ["0.0.0.0/0"] on port 22 is a misconfiguration that will reach production the moment someone runs terraform apply or kubectl apply — unless you scan it first.

The two leading IaC scanners are Checkov (by Bridgecrew/Prisma Cloud) and Trivy's built-in config scanner. Both support Terraform HCL, Kubernetes YAML, Helm, Dockerfile, CloudFormation, and ARM templates. For fine-grained, policy-as-code control, OPA Conftest lets you write custom Rego policies against any structured config file.

# Scan all Terraform files under ./terraform/
trivy config --exit-code 1 --severity HIGH,CRITICAL ./terraform/

# Checkov with SARIF output, for upload to GitHub code scanning
# (--output-file-path names a directory; the report is written as results_sarif.sarif inside it)
checkov -d ./terraform --output sarif --output-file-path ./checkov-results

# Conftest: validate K8s manifests against custom Rego policies
conftest test ./k8s/deployment.yaml --policy ./policies/

# Sample Rego policy — deny privileged containers
# policies/no_privileged.rego
package main

# Rego v1 syntax; recent Conftest releases evaluate policies as Rego v1 by default
import rego.v1

deny contains msg if {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  container.securityContext.privileged == true
  msg := sprintf("Container %s must not run as privileged", [container.name])
}

The Scanning Architecture in CI

At production scale, image and IaC scanning jobs run in parallel with other CI stages to avoid adding latency to the critical path. The standard layout for a GitHub Actions pipeline:

The scan jobs run in parallel, and all of them must exit 0 before the image is promoted to the registry.

# .github/workflows/security-scan.yml: parallel image + IaC gates
on: push

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image (local, not pushed yet)
        run: |
          docker build -t myapp:${{ github.sha }} .
          docker save -o image.tar myapp:${{ github.sha }}
      # Jobs run on separate runners, so pass the image along as an artifact
      - uses: actions/upload-artifact@v4
        with:
          name: image
          path: image.tar

  trivy-image:
    needs: build
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # needed by upload-sarif
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: image
      - name: Run Trivy CVE scan
        uses: aquasecurity/trivy-action@0.20.0
        with:
          input: image.tar
          exit-code: '1'
          severity: 'HIGH,CRITICAL'
          format: sarif
          output: trivy-image.sarif
      - uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: trivy-image.sarif

  trivy-config:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Trivy IaC scan
        uses: aquasecurity/trivy-action@0.20.0
        with:
          scan-type: config
          scan-ref: .
          exit-code: '1'
          severity: 'HIGH,CRITICAL'

  push:
    needs: [trivy-image, trivy-config]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: image
      # Assumes a prior registry login step and a tag that includes the registry path
      - name: Push to registry
        run: |
          docker load -i image.tar
          docker push myapp:${{ github.sha }}

Common Critical Findings and How to Fix Them

Understanding the most frequent high-severity findings helps you prioritize remediation. These are the patterns that consistently appear across large engineering organizations:

Running as root: the default in almost every Dockerfile. Fix: add USER 1001:1001, or use the nonroot user that distroless images provide. In Kubernetes, enforce it with Pod Security Admission at the restricted level, which requires runAsNonRoot: true.
Writable root filesystem: allows an attacker who achieves code execution to persist changes. Fix: set readOnlyRootFilesystem: true in the container's securityContext, with explicit emptyDir mounts for the writable paths the app needs.
Privileged container: equivalent to root on the host node. This should never appear outside very specific CNI/storage plugins. Zero tolerance: privileged: false must be policy, not a reminder.
Missing resource limits: not a CVE, but a standard Checkov finding. A container with no memory limit can exhaust the node and get neighboring pods OOM-killed. Always set resources.limits.cpu and resources.limits.memory.
Outdated OS packages in base image: the most common CVE source. Fix: rebuild from a current base weekly via automated Dependabot/Renovate PRs, or use a distroless image to shrink the OS layer to almost nothing.
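The Kubernetes-side findings in the list above reduce to a handful of field checks on each container spec. The sketch below shows that logic over a manifest already parsed into a dictionary. It is a simplification under stated assumptions: it inspects only container-level securityContext (pod-level settings such as a pod-wide runAsNonRoot are ignored), and the function name and messages are illustrative rather than taken from Checkov or Trivy.

```python
def container_findings(container: dict) -> list[str]:
    """Return human-readable findings for one Kubernetes container spec."""
    security = container.get("securityContext") or {}
    limits = (container.get("resources") or {}).get("limits") or {}
    findings = []
    if security.get("runAsNonRoot") is not True:
        findings.append("runAsNonRoot is not true")
    if security.get("readOnlyRootFilesystem") is not True:
        findings.append("root filesystem is writable")
    if security.get("privileged") is True:
        findings.append("container is privileged")
    for resource in ("cpu", "memory"):
        if resource not in limits:
            findings.append(f"missing resources.limits.{resource}")
    return findings


# Hypothetical container specs for illustration.
HARDENED = {
    "name": "server",
    "securityContext": {
        "runAsNonRoot": True,
        "readOnlyRootFilesystem": True,
        "privileged": False,
    },
    "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
}
RISKY = {
    "name": "debug",
    "securityContext": {"privileged": True},
    "resources": {"limits": {"cpu": "1"}},
}

if __name__ == "__main__":
    for spec in (HARDENED, RISKY):
        print(spec["name"], container_findings(spec))
```

The hardened spec produces no findings, while the risky one trips four of the five checks; production scanners apply the same idea with many more rules and with awareness of pod-level defaults.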

Production pitfall — ignoring CVEs indefinitely: Every .trivyignore entry must have an expiry date (exp:YYYY-MM-DD) and a linked ticket. Without expiry, suppressed CVEs accumulate silently. A common incident pattern: a team suppresses a CVE pending a base image upgrade, the upgrade is deprioritized, the CVE is exploited six months later — and the post-mortem discovers the scanner had been reporting it green the whole time.
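The expiry rule can itself be enforced in CI with a small lint step. The following is a minimal sketch, assuming entries take the form of a vulnerability ID optionally followed by an exp:YYYY-MM-DD field, with # starting a comment. The function name is hypothetical, and the linked-ticket requirement is not checked here.

```python
from datetime import date


def lint_trivyignore(text: str, today: date) -> list[str]:
    """Flag ignore entries that have no expiry or whose expiry has passed."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        vuln_id = fields[0]
        expiry = next(
            (field[4:] for field in fields[1:] if field.startswith("exp:")),
            None,
        )
        if expiry is None:
            problems.append(f"line {lineno}: {vuln_id} has no expiry")
        elif date.fromisoformat(expiry) < today:
            problems.append(f"line {lineno}: {vuln_id} expired on {expiry}")
    return problems


SAMPLE = """\
# HTTP/2 Rapid Reset, mitigated at LB layer
CVE-2023-44487 exp:2024-12-31
# runc container escape, base image rebuild pending
CVE-2024-21626
"""

if __name__ == "__main__":
    for problem in lint_trivyignore(SAMPLE, date.today()):
        print(problem)
```

Failing the pipeline when this list is non-empty turns a forgotten suppression into a visible build break instead of a silent green check.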

Scanning in Admission Control: The Last Wall

CI scanning gates are necessary but not sufficient. Engineers can push images via CLI, CI pipelines can be bypassed, and third-party images may enter your cluster from Helm charts. The defense-in-depth approach adds admission-time scanning via Kyverno or OPA Gatekeeper policies that check image provenance and scan status before pods are scheduled.

A Kyverno policy that rejects Pods lacking a scan annotation set by CI looks like this. The pattern checks only that the annotation is present and non-empty; enforcing the 24-hour window additionally requires a deny condition that compares the annotation's timestamp with the current time.

# kyverno-policy: require scan annotation on Pods
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-image-scan
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-scan-annotation
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Pod must carry a trivy-scanned annotation set by CI"
        pattern:
          metadata:
            annotations:
              # CI sets this annotation via: kubectl annotate ... trivy-scanned=<timestamp>
              trivy-scanned: "?*"
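The 24-hour freshness rule described above comes down to a timestamp comparison. Here is a sketch of that logic outside the cluster, assuming the trivy-scanned annotation holds an ISO 8601 timestamp with an explicit UTC offset; that value format is an assumption for illustration, since the lesson specifies only that the annotation carries a timestamp.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)
ANNOTATION = "trivy-scanned"  # annotation key used in this lesson


def scan_is_fresh(pod: dict, now: datetime) -> bool:
    """Return True if the pod's scan annotation is at most 24 hours old."""
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    stamp = annotations.get(ANNOTATION)
    if not stamp:
        return False
    try:
        scanned_at = datetime.fromisoformat(stamp)
    except ValueError:
        return False  # unparseable timestamp: treat as not scanned
    if scanned_at.tzinfo is None:
        return False  # refuse timestamps without a UTC offset
    age = now - scanned_at
    # Reject future timestamps as well as stale ones.
    return timedelta(0) <= age <= MAX_AGE


# Hypothetical pod manifest fragment for illustration.
SAMPLE_POD = {
    "metadata": {"annotations": {ANNOTATION: "2024-06-01T12:00:00+00:00"}}
}

if __name__ == "__main__":
    print(scan_is_fresh(SAMPLE_POD, datetime.now(timezone.utc)))
```

Failing closed on missing, malformed, or future-dated values matters here: an admission check that treats a bad annotation as acceptable is trivially bypassed.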

At scale, teams integrate with a container registry that supports continuous background scanning; Docker Hub, Amazon ECR, and Google Artifact Registry (the successor to GCR) all offer some form of this. The registry scans every image on push and re-evaluates stored images against fresh CVE feeds, so even images that passed CI yesterday are flagged if a new critical advisory drops overnight. The runtime posture stays current without requiring a new build.

The full scanning posture — CI gate blocking on build, IaC gate blocking on plan, admission control blocking on schedule, and registry scanning flagging on new CVEs — closes the window from image creation to deployment to runtime without any single point of failure. Each layer catches what the previous layer missed.