Helm & Kubernetes Packaging

Project: Chart a Production App

Everything in this tutorial (templating, named helpers, dependencies, hooks, and versioning) converges here. You will build a production-grade Helm chart for a realistic web service: an API backend backed by Redis, with per-environment values files, a migration hook, readiness and liveness probes, a PodDisruptionBudget, and a dedicated ServiceAccount. By the end you will have a chart you can drop into a GitHub Actions pipeline and deploy to any cluster that satisfies the chart's kubeVersion constraint, changing only the values file.

The application is taskflow-api: a stateless Node.js REST service that reads from Redis, which is managed externally in staging/prod (AWS ElastiCache) and runs as a Helm subchart in dev. It needs a Deployment, a Service, an Ingress, a ConfigMap, a ServiceAccount, a PodDisruptionBudget, and an HPA. Every field that differs across environments is exposed as a chart value.

Step 1 — Scaffold and Chart.yaml

Start from the official scaffold and immediately edit Chart.yaml to declare the real metadata, Kubernetes version constraint, and the Redis subchart dependency:

helm create taskflow-api
cd taskflow-api

# Remove the boilerplate files we will replace entirely
rm -rf templates/* values.yaml charts/
mkdir charts

Edit Chart.yaml:

apiVersion: v2
name: taskflow-api
description: TaskFlow REST API, a stateless Node.js service
type: application
version: 0.1.0        # chart version: bump on every chart change
appVersion: "1.0.0"   # application image tag, updated by CI
# The "-0" suffix lets versions with a vendor suffix (as reported by
# managed clusters such as EKS or GKE) satisfy the constraint.
kubeVersion: ">=1.28.0-0"
maintainers:
  - name: platform-team
    email: platform@example.com
dependencies:
  - name: redis
    version: "19.x.x"
    repository: "oci://registry-1.docker.io/bitnamicharts"
    condition: redis.enabled   # disabled in staging/prod (use ElastiCache)

Chart version vs. appVersion: version tracks the chart itself: templating changes, new values, added objects. appVersion tracks the version of the application image. In CI you update appVersion on every image push and bump version only when the chart itself changes. Keep them decoupled. One common way to enforce this is to have the build pipeline rewrite only appVersion (for example with sed) and to bump version through a separate pull request to the chart repo.

Step 2 — The Master values.yaml

Every environment-varying parameter lives here with safe, minimal defaults (single replica, small resources — fine for dev, overridden for staging/prod). Document every key with a comment; this file is the public interface of your chart.

# values.yaml: defaults safe for local/dev environments
replicaCount: 1

image:
  repository: ghcr.io/example/taskflow-api
  tag: ""                 # overridden by CI via --set image.tag=$SHA
  pullPolicy: IfNotPresent

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}         # prod: {"eks.amazonaws.com/role-arn": "arn:aws:iam::..."}
  name: ""

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "3000"
  prometheus.io/path: "/metrics"

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1001
  fsGroup: 1001

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: [ALL]

service:
  type: ClusterIP
  port: 80
  targetPort: 3000

ingress:
  enabled: false
  className: nginx
  annotations: {}
  hosts:
    - host: taskflow.local
      paths:
        - path: /
          pathType: Prefix
  tls: []

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi

autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70

pdb:
  enabled: false          # must be false with replicaCount: 1
  minAvailable: 1

config:
  logLevel: info
  nodeEnv: development

# External Redis DSN (used in staging/prod)
externalRedis:
  host: ""
  port: 6379

# Bitnami Redis subchart: enabled in dev only
redis:
  enabled: true
  architecture: standalone
  auth:
    enabled: false
  master:
    persistence:
      enabled: false
    resources:
      requests:
        cpu: 50m
        memory: 64Mi

Step 3 — Templates

Create templates/_helpers.tpl first — the naming helpers every other template will call:

{{/*
Expand the name of the chart.
*/}}
{{- define "taskflow-api.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{- define "taskflow-api.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}

{{- define "taskflow-api.labels" -}}
helm.sh/chart: {{ include "taskflow-api.name" . }}-{{ .Chart.Version }}
app.kubernetes.io/name: {{ include "taskflow-api.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{- define "taskflow-api.selectorLabels" -}}
app.kubernetes.io/name: {{ include "taskflow-api.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{/*
Redis URL: internal subchart or external host
*/}}
{{- define "taskflow-api.redisUrl" -}}
{{- if .Values.redis.enabled -}}
redis://{{ .Release.Name }}-redis-master:6379
{{- else -}}
redis://{{ required "externalRedis.host required when redis.enabled=false" .Values.externalRedis.host }}:{{ .Values.externalRedis.port }}
{{- end }}
{{- end }}

Now create the core manifests. The Deployment is the most complex template — note the security context, the readiness/liveness probes, and how the Redis URL is injected via an environment variable sourced from the ConfigMap:

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "taskflow-api.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      labels:
        {{- include "taskflow-api.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "taskflow-api.fullname" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: api
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.targetPort }}
              protocol: TCP
          envFrom:
            - configMapRef:
                name: {{ include "taskflow-api.fullname" . }}
          readinessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 15
            periodSeconds: 20
          resources:
            {{- toYaml .Values.resources | nindent 12 }}

ConfigMap checksum annotation: The checksum/config annotation changes the pod template whenever the rendered ConfigMap changes, which triggers a rolling restart. Without it, a helm upgrade that only updates a config value leaves the pod template untouched, so the existing pods keep running with the environment variables they read at startup. This one line, described in the Helm documentation under "Automatically Roll Deployments" and widely used in community charts, prevents a class of "why didn't my config change take effect?" incidents.

Create the remaining templates — ConfigMap, Service, Ingress, ServiceAccount, PDB, and HPA:

# templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
data:
  NODE_ENV: {{ .Values.config.nodeEnv | quote }}
  LOG_LEVEL: {{ .Values.config.logLevel | quote }}
  REDIS_URL: {{ include "taskflow-api.redisUrl" . | quote }}
  PORT: {{ .Values.service.targetPort | quote }}

# templates/pdb.yaml
{{- if .Values.pdb.enabled }}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
spec:
  minAvailable: {{ .Values.pdb.minAvailable }}
  selector:
    matchLabels:
      {{- include "taskflow-api.selectorLabels" . | nindent 6 }}
{{- end }}

# templates/hpa.yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "taskflow-api.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
{{- end }}
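The Service, ServiceAccount, and Ingress templates are listed above but not shown. The following is a minimal sketch of what they could look like, not the author's originals. It assumes only the helpers and value keys already defined in this lesson; the `---` lines separate three files, each of which would live on its own under templates/.

```yaml
# templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - name: http
      port: {{ .Values.service.port }}
      targetPort: http        # the named container port in the Deployment
      protocol: TCP
  selector:
    {{- include "taskflow-api.selectorLabels" . | nindent 4 }}
---
# templates/serviceaccount.yaml
{{- if .Values.serviceAccount.create }}
apiVersion: v1
kind: ServiceAccount
metadata:
  # Named after fullname because the Deployment references that name;
  # serviceAccount.name is not consulted in this sketch.
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
  {{- with .Values.serviceAccount.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
{{- end }}
---
# templates/ingress.yaml
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "taskflow-api.fullname" . }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
  {{- with .Values.ingress.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  ingressClassName: {{ .Values.ingress.className }}
  {{- with .Values.ingress.tls }}
  tls:
    {{- toYaml . | nindent 4 }}
  {{- end }}
  rules:
    {{- range .Values.ingress.hosts }}
    - host: {{ .host | quote }}
      http:
        paths:
          {{- range .paths }}
          - path: {{ .path }}
            pathType: {{ .pathType }}
            backend:
              service:
                # "$" reaches the root context from inside range
                name: {{ include "taskflow-api.fullname" $ }}
                port:
                  number: {{ $.Values.service.port }}
          {{- end }}
    {{- end }}
{{- end }}
```

Because the Deployment hard-codes the fullname as its serviceAccountName, setting serviceAccount.create to false in this sketch requires a ServiceAccount with that name to already exist in the namespace.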

Step 4 — Per-Environment Values Files

Never use --set for more than one or two scalar values in production. Instead, maintain a values file per environment in a separate deploy/ directory (or a dedicated GitOps repo). The base values.yaml holds safe dev defaults; environment files only override what differs:

# deploy/values-staging.yaml
replicaCount: 2

image:
  pullPolicy: Always

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging
  hosts:
    - host: api-staging.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: taskflow-api-staging-tls
      hosts: [api-staging.example.com]

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 512Mi

config:
  logLevel: debug
  nodeEnv: staging

# Use ElastiCache: disable the subchart
redis:
  enabled: false

externalRedis:
  host: staging-redis.abc123.0001.use1.cache.amazonaws.com
  port: 6379

# deploy/values-prod.yaml
replicaCount: 3

image:
  pullPolicy: IfNotPresent

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # ingress-nginx rate limiting: requests per second per client IP
    nginx.ingress.kubernetes.io/limit-rps: "100"
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: taskflow-api-prod-tls
      hosts: [api.example.com]

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 1Gi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 65

pdb:
  enabled: true
  minAvailable: 2

config:
  logLevel: warn
  nodeEnv: production

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/taskflow-api-prod"

redis:
  enabled: false

externalRedis:
  host: prod-redis.abc123.0001.use1.cache.amazonaws.com
  port: 6379

[Figure: Per-environment values override layering. The base values.yaml (replicas: 1, redis.enabled: true, tiny resources, ingress off) is merged by Helm with values-staging.yaml (replicas: 2, redis.enabled: false, externalRedis.host set) or values-prod.yaml (replicas: 3, HPA enabled, PDB minAvailable 2). For the prod release the rendered manifests are a Deployment (3 replicas), an HPA (3 to 20), a PDB (minAvailable 2), and an Ingress (api.example.com).]
The base values.yaml provides safe dev defaults, and environment files override only what differs. Helm deep-merges maps key by key to produce the final values, but a list in an override file (such as ingress.hosts) replaces the base list entirely.
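To make the merge behaviour concrete, here is a small illustration. The override file name deploy/values-example.yaml and the host api.example.test are hypothetical and not part of the chart; the commented result assumes Helm's standard value coalescing, where maps are merged recursively and lists and scalars are replaced.

```yaml
# values.yaml (base, excerpt)
resources:
  requests:
    cpu: 100m
    memory: 128Mi
ingress:
  hosts:
    - host: taskflow.local
      paths:
        - path: /
          pathType: Prefix
---
# deploy/values-example.yaml (hypothetical override)
resources:
  requests:
    cpu: 200m                    # only cpu is overridden
ingress:
  hosts:
    - host: api.example.test     # note: no paths given here
---
# Effective values when rendering with --values deploy/values-example.yaml:
# resources:
#   requests:
#     cpu: 200m                  # taken from the override
#     memory: 128Mi              # kept from the base (maps merge key by key)
# ingress:
#   hosts:
#     - host: api.example.test   # base list replaced; the paths entry is gone
```

This is why the staging and prod files above repeat the full hosts entry, including paths, instead of overriding only the host name.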

Step 5 — Pre-Upgrade Migration Hook

Database migrations must run before the new pods come up. A Helm pre-upgrade Job hook is the canonical pattern:

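# templates/hooks/migrate.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "taskflow-api.fullname" . }}-migrate-{{ .Release.Revision }}
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 2
  activeDeadlineSeconds: 300
  template:
    spec:
      restartPolicy: Never
      # No serviceAccountName: on first install the chart's ServiceAccount does
      # not exist yet when pre-install hooks run, so the namespace default is used.
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          command: ["node", "dist/migrate.js"]
          # Values are inlined rather than read from the chart's ConfigMap, which
          # is not yet created (install) or not yet updated (upgrade) at hook time.
          env:
            - name: NODE_ENV
              value: {{ .Values.config.nodeEnv | quote }}
            - name: LOG_LEVEL
              value: {{ .Values.config.logLevel | quote }}
            - name: REDIS_URL
              value: {{ include "taskflow-api.redisUrl" . | quote }}

Production pitfall, hook-delete-policy: Set hook-delete-policy explicitly. hook-succeeded removes each migration Job once it completes, so successful Jobs do not accumulate in the namespace after each deploy. before-hook-creation (also Helm's default when no policy is set) deletes any existing hook resource with the same name before creating the new one; it matters when the Job name is fixed, because Kubernetes refuses to create a second Job under a name that is already taken. The .Release.Revision suffix gives each revision its own Job name, which avoids that collision but also means a failed Job from an earlier revision stays behind for inspection until you delete it. Keep in mind as well that pre-install hooks run before any of the chart's regular resources exist, and pre-upgrade hooks run before they are updated, so a hook Job can only depend on objects that are already in the cluster or are themselves hooks with a lower weight. In the dev configuration that also applies to the Redis subchart, which is not running yet during pre-install.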
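If a hook Job needs more configuration than is comfortable to pass directly, one option is to render a separate ConfigMap that is itself a hook with a lower weight, so Helm creates it before the Job. The sketch below assumes a hypothetical file templates/hooks/migrate-config.yaml and a hypothetical object name ending in -migrate-config; neither is part of the chart as described in this lesson.

```yaml
# templates/hooks/migrate-config.yaml (hypothetical file name)
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "taskflow-api.fullname" . }}-migrate-config
  labels:
    {{- include "taskflow-api.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    # Lower weight than the migration Job's "-5", so this is created first.
    "helm.sh/hook-weight": "-10"
    # Delete the previous copy before each run so the create does not collide.
    "helm.sh/hook-delete-policy": before-hook-creation
data:
  NODE_ENV: {{ .Values.config.nodeEnv | quote }}
  LOG_LEVEL: {{ .Values.config.logLevel | quote }}
  REDIS_URL: {{ include "taskflow-api.redisUrl" . | quote }}
```

The migration Job would then reference this object by name in an envFrom configMapRef. Because hook resources are not tracked as part of the release, this ConfigMap remains in the namespace after helm uninstall unless it is removed by hand.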

Step 6 — Install and Verify All Environments

Use helm template locally first — zero cluster access needed — to verify that each environment file renders exactly what you expect. This is a mandatory step before any CI pipeline ships a chart:

# Resolve the Redis subchart dependency
helm dependency update .

# Dry-run rendering for each environment (no cluster needed).
# --kube-version sets the version checked against kubeVersion in Chart.yaml.
helm template taskflow-dev . \
  --kube-version 1.28.0 \
  --debug 2>&1 | head -100

helm template taskflow-staging . \
  --kube-version 1.28.0 \
  --values deploy/values-staging.yaml \
  --debug 2>&1 | grep -E "replicas:|REDIS_URL"

helm template taskflow-prod . \
  --kube-version 1.28.0 \
  --values deploy/values-prod.yaml \
  --debug 2>&1 | grep -E "minAvailable|maxReplicas|cpu:|memory:"

# Lint the chart (mandatory in CI)
helm lint . --values deploy/values-staging.yaml
helm lint . --values deploy/values-prod.yaml

# Install dev (uses embedded Redis subchart)
helm upgrade --install taskflow-dev . \
  --namespace taskflow-dev \
  --create-namespace \
  --wait --atomic --timeout 5m0s

# Deploy staging (external Redis, TLS, debug logging).
# --set-string keeps an all-digit tag from being parsed as a number.
helm upgrade --install taskflow-staging . \
  --namespace taskflow-staging \
  --create-namespace \
  --values deploy/values-staging.yaml \
  --set-string image.tag="${IMAGE_TAG}" \
  --wait --atomic --timeout 5m0s

# Deploy prod (HPA, PDB, production Redis, IRSA annotation)
helm upgrade --install taskflow-prod . \
  --namespace taskflow-prod \
  --create-namespace \
  --values deploy/values-prod.yaml \
  --set-string image.tag="${IMAGE_TAG}" \
  --wait --atomic --timeout 10m0s

Diff before you deploy: Install the helm-diff plugin (helm plugin install https://github.com/databus23/helm-diff) and run helm diff upgrade taskflow-prod . --namespace taskflow-prod --values deploy/values-prod.yaml, with the same image tag setting you will deploy, before every production upgrade. It prints a colour-coded diff of what will change in the cluster, a cheap safety check that is worth making a required step before production upgrades.
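The lint and render checks above map naturally onto a pull-request job in GitHub Actions. The workflow below is a sketch under stated assumptions: the file name .github/workflows/chart-ci.yaml is hypothetical, the chart is assumed to live in a taskflow-api/ directory at the repository root, and Helm is installed with the azure/setup-helm action. Cluster authentication and the actual helm upgrade step are left out because they depend on your environment.

```yaml
# .github/workflows/chart-ci.yaml (hypothetical file name)
name: chart-ci

on:
  pull_request:
    paths:
      - "taskflow-api/**"       # assumed chart location in the repository

jobs:
  lint-and-render:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: taskflow-api
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Resolve the Redis subchart dependency
        run: helm dependency update .

      - name: Lint each environment
        run: |
          helm lint . --values deploy/values-staging.yaml
          helm lint . --values deploy/values-prod.yaml

      - name: Render each environment
        run: |
          # Rendering fails the job on any template or values error.
          helm template taskflow-staging . \
            --kube-version 1.28.0 \
            --values deploy/values-staging.yaml > /dev/null
          helm template taskflow-prod . \
            --kube-version 1.28.0 \
            --values deploy/values-prod.yaml > /dev/null
```

Keeping the deploy step in a separate workflow that runs only on the main branch is one way to ensure that nothing reaches a cluster without passing these checks first.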

What You Have Built

The final chart structure:

taskflow-api/
├── Chart.yaml                  # metadata + Redis dependency
├── values.yaml                 # master defaults (dev-safe)
├── deploy/
│   ├── values-staging.yaml     # staging overrides
│   └── values-prod.yaml        # prod overrides (HPA, PDB, IRSA)
├── templates/
│   ├── _helpers.tpl            # naming + redisUrl helpers
│   ├── deployment.yaml         # readiness probes, checksum annotation
│   ├── service.yaml            # ClusterIP
│   ├── ingress.yaml            # conditionally enabled
│   ├── configmap.yaml          # NODE_ENV, LOG_LEVEL, REDIS_URL
│   ├── serviceaccount.yaml     # IRSA-annotatable
│   ├── pdb.yaml                # conditional, minAvailable
│   ├── hpa.yaml                # conditional autoscaling/v2
│   └── hooks/
│       └── migrate.yaml        # pre-install/pre-upgrade Job
└── charts/
    └── redis-19.x.x.tgz        # vendored subchart (dev only)

This chart packages a set of production practices into a single deployable unit: it limits configuration drift across environments, enforces security contexts by default, keeps the service available during node drains and rollouts through the PDB and the Deployment's default rolling update strategy, runs migrations before the new pods start, and gives you an audit trail via Helm release history. The patterns used here (the checksum annotation, the conditional PDB, the IRSA annotation path, the per-environment values files) are common in production charts and carry over directly to other services you package.