Helm & Kubernetes Packaging

Project: Chart a Production App

Everything in this tutorial (templating, named helpers, dependencies, hooks, and versioning) converges here. You will build a production-grade Helm chart for a realistic web service: an API backend backed by Redis, with per-environment values files, a migration hook, readiness and liveness probes, a PodDisruptionBudget, and a dedicated ServiceAccount. By the end you will have a chart you can drop into a GitHub Actions pipeline and deploy to dev, staging, and production from the same templates, changing only the values file.

The application is taskflow-api: a stateless Node.js REST service that reads from Redis. In staging and prod Redis is managed externally (AWS ElastiCache); in dev it runs in-cluster as a Helm subchart. The chart needs a Deployment, a Service, an Ingress, a ConfigMap, a ServiceAccount, a PodDisruptionBudget, and an HPA. Every field that differs across environments is exposed as a chart value.

Step 1 — Scaffold and Chart.yaml

Start from the official scaffold and immediately edit Chart.yaml to declare the real metadata, Kubernetes version constraint, and the Redis subchart dependency:

helm create taskflow-api
cd taskflow-api

# Remove the boilerplate files we will replace entirely
rm -rf templates/*  values.yaml  charts/
mkdir charts

Edit Chart.yaml:

apiVersion: v2
name: taskflow-api
description: TaskFlow REST API — stateless Node.js service
type: application
version: 0.1.0          # chart version — bump on every chart change
appVersion: "1.0.0"     # application image tag — updated by CI

kubeVersion: ">=1.28.0-0"   # trailing "-0" also admits managed-cluster versions with suffixes (-eks, -gke)

maintainers:
  - name: platform-team
    email: platform@example.com

dependencies:
  - name: redis
    version: "19.x.x"
    repository: "oci://registry-1.docker.io/bitnamicharts"
    condition: redis.enabled   # disabled in staging/prod (use ElastiCache)

Chart version vs. appVersion: version tracks the chart itself: templating changes, new values, added objects. appVersion tracks the version of the application image. In CI you update appVersion on every image push and bump version only when the chart itself changes. Keep them decoupled. One way to enforce this is to have the build pipeline rewrite only the appVersion line and bump version through a separate pull request to the chart repo.
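The appVersion-only update described above can be sketched with plain shell. This is a minimal illustration, not a prescribed pipeline step: the temporary directory, the trimmed Chart.yaml, and the NEW_APP_VERSION value are all made up for the example.

```shell
set -eu

# Work on a throwaway copy so the sketch is safe to run anywhere.
workdir="$(mktemp -d)"
cat > "$workdir/Chart.yaml" <<'EOF'
apiVersion: v2
name: taskflow-api
version: 0.1.0
appVersion: "1.0.0"
EOF

# Hypothetical image version produced by the build.
NEW_APP_VERSION="1.1.0"

# Rewrite only the appVersion line. Writing to a new file and moving it
# avoids the GNU/BSD differences of "sed -i".
sed "s/^appVersion:.*/appVersion: \"${NEW_APP_VERSION}\"/" \
  "$workdir/Chart.yaml" > "$workdir/Chart.yaml.new"
mv "$workdir/Chart.yaml.new" "$workdir/Chart.yaml"

# Show both fields: version is untouched, appVersion is updated.
grep -E '^(version|appVersion):' "$workdir/Chart.yaml"
```

The `^appVersion:` anchor is what keeps the chart's own `version:` line out of reach of the substitution.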

Step 2 — The Master values.yaml

Every environment-varying parameter lives here with safe, minimal defaults (single replica, small resources — fine for dev, overridden for staging/prod). Document every key with a comment; this file is the public interface of your chart.

# values.yaml — defaults safe for local/dev environments

replicaCount: 1

image:
  repository: ghcr.io/example/taskflow-api
  tag: ""           # overridden by CI via --set-string image.tag=$SHA; empty falls back to appVersion
  pullPolicy: IfNotPresent

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}   # prod: {"eks.amazonaws.com/role-arn": "arn:aws:iam::..."}
  name: ""

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port:   "3000"
  prometheus.io/path:   "/metrics"

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1001
  fsGroup: 1001

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: [ALL]

service:
  type: ClusterIP
  port: 80
  targetPort: 3000

ingress:
  enabled: false
  className: nginx
  annotations: {}
  hosts:
    - host: taskflow.local
      paths:
        - path: /
          pathType: Prefix
  tls: []

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi

autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70

pdb:
  enabled: false        # must be false with replicaCount: 1
  minAvailable: 1

config:
  logLevel: info
  nodeEnv: development

# External Redis endpoint (used in staging/prod)
externalRedis:
  host: ""
  port: 6379

# Bitnami Redis subchart — enabled in dev only
redis:
  enabled: true
  architecture: standalone
  auth:
    enabled: false
  master:
    persistence:
      enabled: false
    resources:
      requests:
        cpu: 50m
        memory: 64Mi

Step 3 — Templates

Create templates/_helpers.tpl first — the naming helpers every other template will call:

{{/*
Expand the name of the chart.
*/}}
{{- define "taskflow-api.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{- define "taskflow-api.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}

{{- define "taskflow-api.labels" -}}
helm.sh/chart: {{ printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
app.kubernetes.io/name: {{ include "taskflow-api.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{- define "taskflow-api.selectorLabels" -}}
app.kubernetes.io/name: {{ include "taskflow-api.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{/*
Redis URL — internal subchart or external host
*/}}
{{- define "taskflow-api.redisUrl" -}}
{{- if .Values.redis.enabled -}}
redis://{{ .Release.Name }}-redis-master:6379
{{- else -}}
redis://{{ required "externalRedis.host required when redis.enabled=false" .Values.externalRedis.host }}:{{ .Values.externalRedis.port }}
{{- end }}
{{- end }}

Now create the core manifests. The Deployment is the most complex template — note the security context, the readiness/liveness probes, and how the Redis URL is injected via an environment variable sourced from the ConfigMap:

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "taskflow-api.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      labels:
        {{- include "taskflow-api.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "taskflow-api.fullname" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: api
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.targetPort }}
              protocol: TCP
          envFrom:
            - configMapRef:
                name: {{ include "taskflow-api.fullname" . }}
          readinessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 15
            periodSeconds: 20
          resources:
            {{- toYaml .Values.resources | nindent 12 }}

ConfigMap checksum annotation: The checksum/config annotation forces a rolling restart whenever the rendered ConfigMap changes. Without it, a helm upgrade that only changes a config value leaves the pod template untouched, so the Deployment does not roll and the pods keep running with stale config. This one-line pattern, described in Helm's own chart development tips under automatically rolling deployments, prevents a class of "why didn't my config change take effect?" incidents.
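The mechanism can be illustrated outside Helm. The sketch below hashes two stand-in config strings the way the annotation hashes the rendered configmap.yaml; the strings and the hash_config helper are hypothetical and exist only for this example.

```shell
set -eu

# Hash helper: prefers sha256sum (GNU coreutils), falls back to shasum (macOS).
hash_config() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
  else
    printf '%s' "$1" | shasum -a 256 | cut -d' ' -f1
  fi
}

# Stand-ins for the rendered ConfigMap before and after an upgrade.
old_checksum="$(hash_config 'LOG_LEVEL: "info"')"
new_checksum="$(hash_config 'LOG_LEVEL: "debug"')"
same_checksum="$(hash_config 'LOG_LEVEL: "info"')"

if [ "$old_checksum" != "$new_checksum" ]; then
  echo "config changed   -> annotation changes -> Deployment rolls its pods"
fi
if [ "$old_checksum" = "$same_checksum" ]; then
  echo "config unchanged -> annotation unchanged -> no restart"
fi
```

Because the annotation lives in the pod template, any change to its value counts as a pod template change, which is what triggers the rollout.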

Create the remaining templates. The ConfigMap, PDB, and HPA are listed here; the Service, Ingress, and ServiceAccount can be adapted from the templates that helm create generates:

# templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
data:
  NODE_ENV:    {{ .Values.config.nodeEnv | quote }}
  LOG_LEVEL:   {{ .Values.config.logLevel | quote }}
  REDIS_URL:   {{ include "taskflow-api.redisUrl" . | quote }}
  PORT:        {{ .Values.service.targetPort | quote }}

---
# templates/pdb.yaml
{{- if .Values.pdb.enabled }}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
spec:
  minAvailable: {{ .Values.pdb.minAvailable }}
  selector:
    matchLabels:
      {{- include "taskflow-api.selectorLabels" . | nindent 6 }}
{{- end }}

---
# templates/hpa.yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "taskflow-api.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
{{- end }}
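The Service, ServiceAccount, and Ingress templates are not listed above. A minimal sketch of all three, modelled on the helm create scaffold and using only the values and helpers defined in this lesson, could be written from the chart root as follows. Treat it as one plausible shape rather than the chart's definitive templates. The ServiceAccount is named with the fullname helper because that is the name the Deployment references, so serviceAccount.name is not consumed here. The quoted 'EOF' delimiter stops the shell from touching the template braces.

```shell
set -eu
mkdir -p templates

cat > templates/service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - name: http
      port: {{ .Values.service.port }}
      targetPort: http        # the named container port in the Deployment
      protocol: TCP
  selector:
    {{- include "taskflow-api.selectorLabels" . | nindent 4 }}
EOF

cat > templates/serviceaccount.yaml <<'EOF'
{{- if .Values.serviceAccount.create }}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
  {{- with .Values.serviceAccount.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
{{- end }}
EOF

cat > templates/ingress.yaml <<'EOF'
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
  {{- with .Values.ingress.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  ingressClassName: {{ .Values.ingress.className }}
  {{- with .Values.ingress.tls }}
  tls:
    {{- toYaml . | nindent 4 }}
  {{- end }}
  rules:
    {{- range .Values.ingress.hosts }}
    - host: {{ .host | quote }}
      http:
        paths:
          {{- range .paths }}
          - path: {{ .path }}
            pathType: {{ .pathType }}
            backend:
              service:
                name: {{ include "taskflow-api.fullname" $ }}
                port:
                  number: {{ $.Values.service.port }}
          {{- end }}
    {{- end }}
{{- end }}
EOF

ls templates
```

Inside the range loops the dot is rebound to the current list item, which is why the Ingress backend reaches the chart root through `$`.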

Step 4 — Per-Environment Values Files

Never use --set for more than one or two scalar values in production. Instead, maintain a values file per environment in a separate deploy/ directory (or a dedicated GitOps repo). The base values.yaml holds safe dev defaults; environment files only override what differs:

# deploy/values-staging.yaml
replicaCount: 2

image:
  pullPolicy: Always

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging
  hosts:
    - host: api-staging.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: taskflow-api-staging-tls
      hosts: [api-staging.example.com]

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 512Mi

config:
  logLevel: debug
  nodeEnv: staging

# Use ElastiCache — disable the subchart
redis:
  enabled: false
externalRedis:
  host: staging-redis.abc123.0001.use1.cache.amazonaws.com
  port: 6379

# deploy/values-prod.yaml
replicaCount: 3

image:
  pullPolicy: IfNotPresent

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"   # requests per second per client IP
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: taskflow-api-prod-tls
      hosts: [api.example.com]

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 1Gi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 65

pdb:
  enabled: true
  minAvailable: 2

config:
  logLevel: warn
  nodeEnv: production

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/taskflow-api-prod"

redis:
  enabled: false
externalRedis:
  host: prod-redis.abc123.0001.use1.cache.amazonaws.com
  port: 6379

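The base values.yaml provides safe dev defaults and each environment file overrides only what differs. Helm merges the files map by map, so nested keys you omit keep their defaults; lists such as ingress.hosts and ingress.tls are replaced wholesale, which is why the environment files restate them in full.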

Step 5 — Pre-Upgrade Migration Hook

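Database migrations must run before the new pods come up. A Helm pre-upgrade Job hook is the canonical pattern. Hooks run before the release's own resources are created or updated, so the Job below renders its environment inline instead of reading the chart's ConfigMap, and it runs under the namespace's default ServiceAccount. On a first install with the Redis subchart enabled, Redis itself does not exist yet, so in that case the hook runs as post-install rather than pre-install:

# templates/hooks/migrate.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "taskflow-api.fullname" . }}-migrate-{{ .Release.Revision }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
  annotations:
    # Subchart Redis (dev) is only reachable after install; external Redis is reachable before it.
    "helm.sh/hook": {{ ternary "post-install,pre-upgrade" "pre-install,pre-upgrade" .Values.redis.enabled | quote }}
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 2
  activeDeadlineSeconds: 300
  template:
    spec:
      restartPolicy: Never
      # No serviceAccountName: the chart's ServiceAccount does not exist yet on pre-install.
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          command: ["node", "dist/migrate.js"]
          # Rendered inline: on pre-install the ConfigMap is missing, on pre-upgrade it still holds the old values.
          env:
            - name: NODE_ENV
              value: {{ .Values.config.nodeEnv | quote }}
            - name: LOG_LEVEL
              value: {{ .Values.config.logLevel | quote }}
            - name: REDIS_URL
              value: {{ include "taskflow-api.redisUrl" . | quote }}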

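Production pitfall: hook-delete-policy. Set hook-delete-policy: before-hook-creation,hook-succeeded explicitly. hook-succeeded removes each migration Job once it completes, so successful Jobs do not pile up in the namespace. before-hook-creation (Helm's default when no policy is set) deletes a leftover hook resource with the same name before creating the new one; without it, a leftover Job would block the hook because Kubernetes rejects a second Job with the same name. Because the name carries the .Release.Revision suffix, a failed Job from an earlier revision is not overwritten by the next attempt: it stays available for debugging until you delete it, for example with kubectl delete job or by setting ttlSecondsAfterFinished on the Job.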

Step 6 — Install and Verify All Environments

Use helm template locally first — zero cluster access needed — to verify that each environment file renders exactly what you expect. This is a mandatory step before any CI pipeline ships a chart:

# Resolve the Redis subchart dependency
helm dependency update .

# Dry-run rendering for each environment (no cluster needed)
helm template taskflow-dev . \
  --debug 2>&1 | head -100

helm template taskflow-staging . \
  --values deploy/values-staging.yaml \
  --debug 2>&1 | grep -E "replicas:|REDIS_URL"

helm template taskflow-prod . \
  --values deploy/values-prod.yaml \
  --debug 2>&1 | grep -E "minAvailable|maxReplicas|cpu:|memory:"

# Lint the chart (mandatory in CI)
helm lint . --values deploy/values-staging.yaml
helm lint . --values deploy/values-prod.yaml

# Install dev (uses embedded Redis subchart)
helm upgrade --install taskflow-dev . \
  --namespace taskflow-dev \
  --create-namespace \
  --wait --atomic --timeout 5m0s

# Deploy staging (external Redis, TLS, debug logging)
helm upgrade --install taskflow-staging . \
  --namespace taskflow-staging \
  --create-namespace \
  --values deploy/values-staging.yaml \
  --set-string image.tag="${IMAGE_TAG}" \
  --wait --atomic --timeout 5m0s

# Deploy prod (HPA, PDB, production Redis, IRSA annotation)
helm upgrade --install taskflow-prod . \
  --namespace taskflow-prod \
  --create-namespace \
  --values deploy/values-prod.yaml \
  --set-string image.tag="${IMAGE_TAG}" \
  --wait --atomic --timeout 10m0s

Diff before you deploy: Install the helm-diff plugin (helm plugin install https://github.com/databus23/helm-diff) and run helm diff upgrade taskflow-prod . --namespace taskflow-prod --values deploy/values-prod.yaml, with the same image.tag override you are about to deploy, before every production upgrade. It prints a colour-coded diff of what would change in the cluster, a cheap safety check before anything is applied.
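The lint and render checks from this step can be wrapped into a small CI gate. The sketch below is one possible shape, assuming the deploy/values-*.yaml layout used in this lesson; the function names, the ci-check release name, and the HELM override variable are hypothetical conveniences, not Helm features.

```shell
# Path to the helm binary; overridable so the gate can be exercised with a stub.
HELM="${HELM:-helm}"

# Lint and render the chart with one values file; non-zero on any failure.
validate_values_file() {
  values_file="$1"
  "$HELM" lint . --values "$values_file" &&
    "$HELM" template ci-check . --values "$values_file" > /dev/null
}

# Run the checks for every environment file and report each result.
validate_all() {
  status=0
  for values_file in deploy/values-*.yaml; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$values_file" ] || continue
    if validate_values_file "$values_file"; then
      echo "ok: $values_file"
    else
      echo "FAILED: $values_file" >&2
      status=1
    fi
  done
  return "$status"
}
```

In a pipeline you would source this file from the chart root and call `validate_all || exit 1` before any helm upgrade step. The loop checks every file instead of stopping at the first failure, so one CI run reports all broken environments.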

What You Have Built

The final chart structure:

taskflow-api/
├── Chart.yaml                     # metadata + Redis dependency
├── values.yaml                    # master defaults (dev-safe)
├── deploy/
│   ├── values-staging.yaml        # staging overrides
│   └── values-prod.yaml           # prod overrides (HPA, PDB, IRSA)
├── templates/
│   ├── _helpers.tpl               # naming + redisUrl helpers
│   ├── deployment.yaml            # readiness probes, checksum annotation
│   ├── service.yaml               # ClusterIP
│   ├── ingress.yaml               # conditionally enabled
│   ├── configmap.yaml             # NODE_ENV, LOG_LEVEL, REDIS_URL
│   ├── serviceaccount.yaml        # IRSA-annotatable
│   ├── pdb.yaml                   # conditional, minAvailable
│   ├── hpa.yaml                   # conditional autoscaling/v2
│   └── hooks/
│       └── migrate.yaml           # pre-install/pre-upgrade Job
└── charts/
    └── redis-19.x.x.tgz           # vendored subchart (dev only)

This chart packages the patterns from this tutorial into a single deployable unit: per-environment values files limit configuration drift, security contexts are enforced by default, the PDB and the Deployment's rolling updates keep replicas available during deploys and node drains, migrations run as a Helm hook ahead of each upgrade, and Helm's release history gives you an audit trail. The checksum annotation, the conditional PDB, the IRSA annotation path, and the per-environment values files are all widely used conventions for packaging applications with Helm.