Docker & Containerization

CMD vs ENTRYPOINT & Configuration

Every container needs to know what process to run. Docker gives you two instructions to define that — CMD and ENTRYPOINT — and understanding the difference between them is one of the most practically important things you will learn about Dockerfiles. Get it wrong and containers silently ignore arguments, ignore signals, or behave completely differently between development and CI. Get it right and your images work intuitively on the command line, in Kubernetes, and in production.

Startup Semantics: How Docker Picks PID 1

When Docker starts a container it creates a namespaced Linux process. The first process that runs — PID 1 — is the container's init. When PID 1 exits, the container stops. This matters for two reasons:

Signal propagation: When you run docker stop, the Docker daemon sends SIGTERM (or the image's STOPSIGNAL) to PID 1. If PID 1 is a shell wrapper that does not forward signals, your real process never gets a graceful-shutdown signal, and Docker kills the container with SIGKILL once the stop timeout (10 seconds by default) expires, leaving open connections, uncommitted transactions, or half-written files behind.
Zombie reaping: PID 1 is responsible for reaping zombie processes (children that have exited but whose exit status has not been collected). Most application runtimes are not written to do this. Using exec form (see below) and a minimal init like tini addresses both problems.

Shell Form vs Exec Form

Both CMD and ENTRYPOINT accept two syntaxes. This distinction drives everything else.

# Shell form — Docker runs: /bin/sh -c "your command"
CMD python app.py
ENTRYPOINT python app.py

# Exec form — Docker runs the binary directly as PID 1 (no shell wrapper)
CMD ["python", "app.py"]
ENTRYPOINT ["python", "app.py"]

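Shell form wraps your command in /bin/sh -c, so the shell, not your application, is the process Docker starts as PID 1. A shell that is waiting on a child does not forward SIGTERM to it, and because the kernel gives PID 1 no default signal handlers, the shell typically ignores the signal altogether. (Some shells exec a lone simple command directly, but that is an optimization you should not rely on.) A shell-form ENTRYPOINT also discards CMD and any command-line arguments. Prefer exec form in production images.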

Production pitfall: shell form and signal loss. With shell form (CMD node server.js), SIGTERM is delivered to the shell, not to Node. The shell does not pass it on, so Node keeps running until Docker's stop timeout expires and the container is killed with SIGKILL. Under Kubernetes, every pod shutdown then runs out the full termination grace period (30 s by default) before the pod is force-killed, which slows rollouts and cuts off in-flight HTTP requests. Switch to exec form, CMD ["node", "server.js"], and make sure the server handles SIGTERM itself, because a process running as PID 1 gets no default termination behavior.

ENTRYPOINT vs CMD: The Interaction Model

The critical rule is simple: ENTRYPOINT defines the executable; CMD provides default arguments to it. When both are present, Docker concatenates them: ENTRYPOINT + CMD. Arguments passed on the command line replace CMD but never replace ENTRYPOINT (unless you use --entrypoint).

How CMD and ENTRYPOINT combine. CLI arguments replace CMD but not ENTRYPOINT, making ENTRYPOINT the fixed executable and CMD the overridable defaults.
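
The combination rule can be written down as a few lines of shell. This is only a model for reasoning about exec-form images, not something Docker runs: resolve_command is a hypothetical helper, and it takes ENTRYPOINT and CMD as space-separated strings for brevity.

```shell
#!/bin/sh
# resolve_command: hypothetical helper that models Docker's exec-form rule.
# Usage: resolve_command "<ENTRYPOINT words>" "<CMD words>" [cli args...]
resolve_command() {
  entrypoint=$1
  default_args=$2
  shift 2

  if [ "$#" -gt 0 ]; then
    args="$*"            # anything after the image name replaces CMD entirely
  else
    args=$default_args   # no CLI arguments: fall back to CMD
  fi

  # ENTRYPOINT (if any) always comes first; it is never replaced by CLI arguments
  if [ -n "$entrypoint" ] && [ -n "$args" ]; then
    printf '%s %s\n' "$entrypoint" "$args"
  else
    printf '%s\n' "$entrypoint$args"
  fi
}

resolve_command "python -m alembic" "--help"                # python -m alembic --help
resolve_command "python -m alembic" "--help" upgrade head   # python -m alembic upgrade head
resolve_command "" "python app.py" sh                       # sh
```

The third call shows the CMD-only case: with no ENTRYPOINT, whatever follows the image name becomes the whole command.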

This interaction is what makes CLI-style images possible — images for tools like curl, aws-cli, or custom migration runners where the entrypoint is the tool itself and users pass subcommands as arguments:

# A migration runner image — users pass subcommands as arguments
FROM python:3.13-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Fixed executable; CMD gives a safe default for local dev
ENTRYPOINT ["python", "-m", "alembic"]
CMD ["--help"]

# Usage:
#   docker run myapp:latest          -> python -m alembic --help
#   docker run myapp:latest upgrade head  -> python -m alembic upgrade head
#   docker run myapp:latest history       -> python -m alembic history

Key rule: Use ENTRYPOINT when the image has a single, clear purpose (a CLI tool, a server, a worker). Use CMD alone for base images where the caller is expected to provide a completely different command. Use both when you want a fixed executable with configurable default arguments.

Environment Variables: Runtime Configuration

The Twelve-Factor App methodology (Factor III) says configuration that varies between deployments must come from environment variables, never baked into the image. Docker gives you two mechanisms:

ENV KEY=value — sets a variable at build time that persists into the running container as a default. Visible in docker inspect.
docker run -e KEY=value or Kubernetes env: — overrides at runtime. This is the production pattern.

# Production-grade web service Dockerfile
FROM python:3.13-slim AS base

WORKDIR /app

# Sensible defaults — all are overridable at runtime
ENV APP_ENV=production \
    PORT=8000 \
    LOG_LEVEL=info \
    WORKERS=4

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

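# Exec form does not expand ${PORT} or ${WORKERS}, so hard-coded values here would
# ignore the ENV defaults and any -e override. A small sh -c wrapper expands them,
# and "exec" replaces that shell so gunicorn still runs as PID 1 and receives signals.
# Trade-off: arguments after the image name now replace this whole command.
CMD ["sh", "-c", "exec gunicorn --bind \"0.0.0.0:${PORT}\" --workers \"${WORKERS}\" \"app:create_app()\""]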

# --- runtime ---
# docker run -e PORT=9000 -e APP_ENV=staging myapp:latest
# In Kubernetes:
# env:
#   - name: DATABASE_URL
#     valueFrom:
#       secretKeyRef:
#         name: db-creds
#         key: url
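
The "image default, runtime override" precedence is the same one a shell applies with ${VAR:-default}. A minimal sketch, assuming a start script that builds gunicorn's flags from the environment; gunicorn_args is a hypothetical helper, and the fallback values simply mirror the ENV instruction in the Dockerfile:

```shell
#!/bin/sh
# gunicorn_args: hypothetical helper. Prints the flags a start script could pass
# to gunicorn, falling back to a default when a variable is unset or empty.
gunicorn_args() {
  printf '%s %s %s %s %s %s\n' \
    --bind "0.0.0.0:${PORT:-8000}" \
    --workers "${WORKERS:-4}" \
    --log-level "${LOG_LEVEL:-info}"
}

# Nothing set: the defaults win.
(unset PORT WORKERS LOG_LEVEL; gunicorn_args)

# Equivalent of "docker run -e PORT=9000 -e WORKERS=2 ...": the runtime value wins.
(PORT=9000; WORKERS=2; gunicorn_args)
```

Keeping the defaults in one place (either ENV or the script) avoids the two drifting apart.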

Pro practice: never put secrets in ENV instructions. ENV SECRET_KEY=abc123 writes the secret into the image's configuration, where it stays for the life of the image. It shows up in docker history and docker inspect for anyone who can pull the image from any registry you push to. Secrets must arrive at runtime via mounted files (Kubernetes Secrets, Docker secrets), environment variables injected by the orchestrator, or a secrets manager SDK. The image must never carry credential data.
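
For the mounted-file route, an entrypoint script needs a way to accept a secret either directly or as a file path. The sketch below assumes a NAME / NAME_FILE naming convention; read_secret is a hypothetical helper, and /run/secrets/db_password is only an example path.

```shell
#!/bin/sh
# read_secret NAME: hypothetical helper. Prints the value of $NAME, or the
# contents of the file named by $NAME_FILE (for example /run/secrets/db_password).
# NAME must be a plain variable name chosen by the script author, never user input.
read_secret() {
  name=$1
  eval "value=\${${name}:-}"
  eval "file=\${${name}_FILE:-}"

  # Refuse ambiguous configuration instead of silently picking one source
  if [ -n "$value" ] && [ -n "$file" ]; then
    echo "error: both ${name} and ${name}_FILE are set" >&2
    return 1
  fi

  if [ -n "$file" ]; then
    cat "$file"
  else
    printf '%s' "$value"
  fi
}

# Typical use near the top of an entrypoint script:
#   DB_PASSWORD=$(read_secret DB_PASSWORD) || exit 1
#   export DB_PASSWORD
```

The file variant keeps the secret out of docker inspect output for the container, since only the path appears in its environment.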

Build Arguments: Compile-Time Parameterization

ARG is ENV's build-time counterpart. It defines a variable that is available only during docker build, within the stage that declares it, and it is not set in the running container. Common uses: pinning dependency versions, toggling debug flags, passing a Git SHA for build provenance. ARG values can still show up in docker history, so they are no safer for secrets than ENV.

# Using ARG for build-time parameterization
FROM node:22-alpine AS builder

# ARG exists only during the build and only in the stage that declares it;
# it is not set as an environment variable in the running container
ARG NODE_ENV=production

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

COPY . .
RUN npm run build

# --- multi-stage: only the built artifact goes into the final image ---
FROM node:22-alpine AS runtime
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules

ENV NODE_ENV=production \
    PORT=3000

# Frequently changing ARGs are declared last and in the final stage: a new Git SHA
# then leaves the layers above cached, and the labels end up on the shipped image
# (labels set in the builder stage would be discarded with it)
ARG APP_VERSION=unknown
ARG GIT_SHA=unknown

# Bake non-secret metadata into image labels (not ENV) for traceability
LABEL org.opencontainers.image.version="${APP_VERSION}" \
      org.opencontainers.image.revision="${GIT_SHA}"

ENTRYPOINT ["node"]
CMD ["dist/server.js"]

# Build invocation from CI:
# docker build \
#   --build-arg APP_VERSION=$(git describe --tags) \
#   --build-arg GIT_SHA=$(git rev-parse --short HEAD) \
#   --build-arg NODE_ENV=production \
#   -t myapp:$(git rev-parse --short HEAD) .

Production pitfall: an early, frequently changing ARG invalidates the cache. When a build argument's value changes (a Git SHA changes on every commit), the build cache misses at the first instruction that uses it, and every RUN after the ARG declaration uses it implicitly as an environment variable. All later layers are then rebuilt as well. Declare frequently changing ARG values as late in the Dockerfile as possible, after dependency installation, so your RUN npm ci / RUN pip install layers stay cached. Putting ARG GIT_SHA on line 2 means a full reinstall on every commit.
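
The effect of ARG placement can be seen with a toy model. This is a deliberate simplification, not Docker's actual cache logic: cache_report is a hypothetical helper that marks the first changed step, and everything after it, as rebuilt.

```shell
#!/bin/sh
# cache_report CHANGED_INDEX STEP...: toy model of linear cache invalidation.
# Steps before CHANGED_INDEX are reported as cached; that step and every
# later one are reported as rebuilt.
cache_report() {
  changed=$1
  shift
  i=1
  for step in "$@"; do
    if [ "$i" -lt "$changed" ]; then
      printf 'CACHED   %s\n' "$step"
    else
      printf 'REBUILT  %s\n' "$step"
    fi
    i=$((i + 1))
  done
}

echo "ARG GIT_SHA declared early (first change at step 2):"
cache_report 2 "FROM node:22-alpine" "ARG GIT_SHA" "COPY package*.json ./" "RUN npm ci" "COPY . ."

echo "ARG GIT_SHA declared late (first change at step 4):"
cache_report 4 "FROM node:22-alpine" "COPY package*.json ./" "RUN npm ci" "ARG GIT_SHA" "COPY . ."
```

In the first report RUN npm ci lands in the rebuilt group; in the second it stays cached, which is the whole argument for declaring the SHA late.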

Shell Scripts as Entrypoints: The init Pattern

Complex services often need to do work before the main process starts: wait for a database to be ready, generate a config file from environment variables, run database migrations. The standard pattern is a shell script entrypoint that uses exec to hand off to the main process — preserving PID 1 ownership and signal forwarding.

#!/bin/sh
# docker-entrypoint.sh — used as ENTRYPOINT in production API images

set -e  # exit on any error

# 1. Wait for the database (example: Postgres)
echo "Waiting for database at ${DB_HOST}:${DB_PORT}..."
until nc -z "${DB_HOST}" "${DB_PORT}"; do
  sleep 1
done
echo "Database ready."

# 2. Run migrations. Every replica runs this on startup, so in real infrastructure
#    guard it with a distributed lock or move it to a separate one-off job
python manage.py migrate --noinput

# 3. Replace this shell with the main process (exec is critical — makes it PID 1)
exec "$@"
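
The wait loop in the script above retries forever, so a mistyped DB_HOST leaves the container hanging instead of failing. A bounded variant might look like the sketch below; wait_until and the WAIT_INTERVAL variable are hypothetical names, not part of any standard tooling.

```shell
#!/bin/sh
# wait_until MAX_TRIES COMMAND [ARGS...]: hypothetical helper.
# Re-runs COMMAND until it succeeds or MAX_TRIES attempts have failed.
# WAIT_INTERVAL is the pause between attempts in seconds (default 1).
wait_until() {
  max_tries=$1
  shift
  try=1
  until "$@"; do
    if [ "$try" -ge "$max_tries" ]; then
      echo "gave up after ${try} attempts: $*" >&2
      return 1
    fi
    try=$((try + 1))
    sleep "${WAIT_INTERVAL:-1}"
  done
}

# In the entrypoint this would replace the unbounded loop, for example:
#   wait_until 30 nc -z "$DB_HOST" "$DB_PORT"
```

With set -e active, a failed wait_until stops the script before migrations run, and the orchestrator's restart policy takes over.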

# Dockerfile referencing the script
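FROM python:3.13-slim

# The slim base image does not include nc, which the entrypoint's wait loop calls
RUN apt-get update \
 && apt-get install -y --no-install-recommends netcat-openbsd \
 && rm -rf /var/lib/apt/lists/*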

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
RUN chmod +x /usr/local/bin/docker-entrypoint.sh

# Script as ENTRYPOINT; CMD is passed as "$@" to exec
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:create_app()"]

The exec "$@" at the end of the shell script is the entire trick. Without it, the shell remains PID 1 and your gunicorn process is a child. With it, the shell replaces itself with gunicorn, which takes over PID 1 and receives signals directly. The entrypoint scripts of the official PostgreSQL, Redis, and Nginx Docker images end the same way.
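
The PID hand-off is easy to observe outside a container. In this small sketch both helpers (hypothetical names) print the outer shell's PID followed by the inner shell's PID:

```shell
#!/bin/sh
# Both helpers print two PIDs: the outer shell's, then the inner shell's.

# Without exec the outer shell starts the inner one as a child: two different PIDs.
# The trailing ":" keeps the outer shell alive so it cannot skip the fork.
without_exec() {
  sh -c 'echo "$$"; sh -c "echo \$\$"; :'
}

# With exec the outer shell is replaced in place: the same PID twice.
with_exec() {
  sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

echo "without exec:"
without_exec
echo "with exec:"
with_exec
```

Inside a container the outer shell's PID is 1, so the exec variant is what lets the main process inherit it.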

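Pro practice: use tini for multi-process containers. If your container starts more than one process (for example, a sidecar exporter alongside the main app), run a minimal init such as tini as PID 1. On Debian-based images, install it with RUN apt-get update && apt-get install -y --no-install-recommends tini and set ENTRYPOINT ["/usr/bin/tini", "--", "your-app"]. Tini reaps zombies and forwards signals to its child, which bare shells and most application runtimes do not. It runs a single child, so your-app (or a small launcher script) is still responsible for starting any additional processes. docker run --init injects the same kind of init without changing the image. In Kubernetes, setting shareProcessNamespace: true makes the pod's pause container PID 1, and that container reaps orphaned zombies; signals still go to each container's own main process.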

Summary: Decision Guide

Use exec form for both CMD and ENTRYPOINT — always.
Use ENTRYPOINT to define what the container is; use CMD for default arguments the user might override.
Avoid ENV for secrets — inject those at runtime from an orchestrator or secrets manager.
Put frequently changing ARG instructions as late in the Dockerfile as possible to preserve the layer cache.
Shell script entrypoints must exec "$@" as their final line so the main process inherits PID 1.