Artifact Management & Release Engineering

Hotfixes & Backports

Production is broken. A critical vulnerability is being exploited. Customers are losing money with every passing minute. This is the moment when your release process either proves its maturity — or collapses under pressure. Hotfixes and backports are the disciplines that let you patch released versions safely, without disrupting the normal train of development and without creating the chaos of "just push it fast."

Mature release engineering distinguishes sharply between two operations: a hotfix (an emergency patch applied directly to a released version) and a backport (carrying a fix that already landed on main back to one or more older support branches). They share mechanics but differ in urgency, governance, and risk profile.

Support Branches: The Foundation

Patching a released version is practical only while that version still has a living branch. This is why every release train cuts a named branch (release/1.5, release/2024.10) and keeps it alive for as long as the version is supported. The branch is the patch surface. Delete it and the next hotfix begins with recreating the branch from the last release tag, and any commits that were on the branch but never tagged have to be recovered by hand.

Establish a clear support window policy before you ship anything. Google Chrome supports only the current stable version; Kubernetes maintains the three most recent minor releases (N, N-1, N-2); each Ubuntu LTS release receives five years of standard support, with a new LTS arriving every two years, so several LTS versions are in support at once. Whatever you choose, publish it, automate branch lifecycle, and enforce it. The worst outcome is engineers assuming a branch is supported when it was silently abandoned.
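A support window such as N, N-1, N-2 is easy to enforce mechanically. The sketch below is one possible check, assuming branches are named release/<major>.<minor> and that only the minor number advances between supported releases. The function name is_supported and its arguments are illustrative, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if a release branch is inside the support window.
# Assumes branch names like release/1.3 and a fixed major version (a simplification).
is_supported() {
    branch=$1          # e.g. release/1.3
    latest_minor=$2    # minor number of the newest release, e.g. 5
    window=${3:-3}     # number of supported minors (N, N-1, N-2 -> 3)

    minor=${branch##*.}                      # text after the last dot
    oldest=$((latest_minor - window + 1))    # oldest minor still supported
    [ "$minor" -ge "$oldest" ] && [ "$minor" -le "$latest_minor" ]
}
```

A CI job guarding patch PRs could call is_supported release/1.3 5 and reject the PR when the call fails, so an abandoned branch cannot quietly accept fixes.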

Hotfix vs. backport flows: a hotfix originates on the release branch and merges back to main; a backport cherry-picks a fix from main onto a support branch.

The Hotfix Workflow

A hotfix is written when the bug is first discovered in production and there is no corresponding fix on main yet — or when the fix on main cannot be safely ported because main has diverged too far. The canonical workflow:

1. Branch from the release tag, not from main. This is critical: main may have weeks of unreleased features that must not go to production.
2. Apply the minimal fix. Hotfixes are not the time for refactoring, dependency upgrades, or "while we are in here" changes. One commit, one purpose.
3. Run the release pipeline in accelerated mode. Every correctness test and security scan must pass. If your test suite takes 45 minutes and customers are down, that is a test-suite performance problem to fix separately, not a reason to skip tests now.
4. Tag with a patch version bump (v1.4.0 → v1.4.1) and release.
5. Merge the fix back to main immediately after release. This step is the most commonly skipped, and skipping it causes the same bug to reappear in the next release.

# Hotfix workflow — start from the released tag, NOT from main
git fetch --tags
git checkout v1.4.0
git checkout -b hotfix/CVE-2025-1234

# Apply the minimal fix, then commit with a clear scope
git add src/auth/token.go
git commit -m "fix(auth): reject tokens with empty sub claim (CVE-2025-1234)"

# Run full test suite — never skip
make test-all

# Tag the hotfix release (annotated tag with release notes)
git tag -a v1.4.1 -m "chore(release): v1.4.1 - security patch CVE-2025-1234"
git push origin hotfix/CVE-2025-1234
git push origin v1.4.1

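# CRITICAL: after CI releases the artifact, carry the fix back to main.
# The hotfix branch is rooted at v1.4.0, so cherry-pick the fix onto a fresh
# branch cut from main and open a PR rather than pushing to main directly.
# (The branch name merge-back/CVE-2025-1234 is only an example.)
git checkout main
git pull origin main
git checkout -b merge-back/CVE-2025-1234
git cherry-pick -x <hotfix-commit-sha>
git push origin merge-back/CVE-2025-1234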

The merge-back trap: Skipping the merge-back to main is the single most common hotfix failure mode. The fix ships, the incident closes, everyone goes home — and three months later the identical bug ships again in the next release because the fix lived only on the release branch. Automate a post-release CI step that opens a PR from the hotfix branch into main so the team cannot forget.
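One way to make a forgotten merge-back visible is to ask git whether each hotfix commit has a patch-equivalent commit on main. The sketch below uses git cherry for that. The function name and the branch and tag names in the comments are assumptions for illustration.

```shell
#!/bin/sh
# Print hotfix commits that have no patch-equivalent commit on the upstream
# branch. Exit status is 0 only when every hotfix commit has been carried over.
unmerged_hotfix_commits() {
    upstream=$1   # e.g. origin/main
    head=$2       # e.g. hotfix/CVE-2025-1234
    base=$3       # tag the hotfix branched from, e.g. v1.4.0

    # git cherry prefixes commits missing from upstream with "+" and commits
    # whose patch already exists upstream with "-". The third argument is a
    # limit: commits up to and including it are not reported.
    missing=$(git cherry "$upstream" "$head" "$base" | grep '^+' || true)
    if [ -n "$missing" ]; then
        printf '%s\n' "$missing"
        return 1
    fi
}
```

Because git cherry compares patch content, a fix that needed conflict resolution on main is still reported as missing, so a failure here is a prompt for a human to look rather than proof that the fix was dropped.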

Backports: Carrying Fixes to Older Branches

A backport differs from a hotfix in one important way: the fix already exists on main. Your job is to carry it — as cleanly as possible — to an older release branch. The tool is git cherry-pick.

Cherry-pick copies the diff of a specific commit and replays it. It creates a new commit with a new SHA on the target branch — the history is separate from main. This is intentional: the release branch should not inherit unrelated commits from main.

# Backport a fix from main to release/1.3
# Find the commit SHA on main
git log main --oneline --grep="fix(sql): escape user input"
# e.g. output: a7f3c21 fix(sql): escape user input in search handler

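# Cut a working branch from the up-to-date support branch
# (the name backport/1.3-sql-escape is only an example)
git checkout release/1.3
git pull origin release/1.3
git checkout -b backport/1.3-sql-escape

# Cherry-pick the fix; -x records the original SHA in the commit message
git cherry-pick -x a7f3c21

# If there are conflicts (common when branches have diverged), resolve manually
# git cherry-pick --continue  (after resolving)
# git cherry-pick --abort     (to bail out)

# Push the working branch and open a PR into release/1.3; backports need review too
git push origin backport/1.3-sql-escape

# After the PR is merged and CI is green, tag the patch release from release/1.3
git checkout release/1.3
git pull origin release/1.3
git tag -a v1.3.1 -m "chore(release): v1.3.1 - backport SQL escape fix"
git push origin v1.3.1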

Conflict-minimizing strategy: Always cherry-pick the smallest possible unit, and prefer squashed, atomic commits over picking a merge commit. If the original fix on main was a sprawling 40-file refactor, ask the author to extract the pure bug fix into a separate commit before you backport. A clean, isolated fix usually cherry-picks with few or no conflicts; cherry-picking a refactor onto a diverged branch can turn into a multi-day debugging exercise.
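The two warning signs named above, merge commits and sprawling changes, can be checked before anyone attempts the cherry-pick. This is a minimal sketch: the function name is hypothetical and the default limit of 10 files is an arbitrary illustration, not a standard.

```shell
#!/bin/sh
# Succeed only if the commit has a single parent (not a merge) and touches at
# most max_files files; otherwise explain the refusal on stderr and fail.
backport_candidate_ok() {
    sha=$1
    max_files=${2:-10}   # illustrative threshold, tune per repository

    # "git rev-list --parents" prints the commit followed by its parents,
    # so more than two words means more than one parent: a merge commit.
    words=$(( $(git rev-list --parents -n 1 "$sha" | wc -w) ))
    if [ "$words" -gt 2 ]; then
        echo "refusing: $sha is a merge commit; pick the underlying fix commit" >&2
        return 1
    fi

    # Count the files changed by the commit.
    files=$(( $(git diff-tree --no-commit-id --name-only -r "$sha" | wc -l) ))
    if [ "$files" -gt "$max_files" ]; then
        echo "refusing: $sha touches $files files; ask for an isolated fix commit" >&2
        return 1
    fi
}
```

Running backport_candidate_ok a7f3c21 before git cherry-pick gives the backporter an early, cheap signal to go back to the author instead of fighting conflicts on the support branch.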

Automating Backports with Labels

At scale (multiple support branches, dozens of engineers) manual backporting is error-prone and slow. A widely used approach is label-driven automation: a PR author adds one backport label per target branch to their fix PR, for example backport/release/1.3 and backport/release/1.4, and a bot automatically opens cherry-pick PRs to those branches after the original merges.

The zeebe-io/backport-action GitHub Action (now published as korthout/backport-action) and Prow's /cherrypick command (used by Kubernetes) are common implementations. The pattern is identical: a label or comment triggers automation, automation opens a PR, a human reviews and merges. The human review step is not optional: an automated cherry-pick can conflict, or apply cleanly yet behave differently on the older branch, and a bot merge of a broken backport can take down a production system.

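# .github/workflows/backport.yml
# Triggered when a PR is closed (merged) and carries a backport label

name: Backport

on:
  pull_request:
    types: [closed]

# The default GITHUB_TOKEN needs write access to push the backport branch
# and open the PR
permissions:
  contents: write
  pull-requests: write

jobs:
  backport:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Run backport
        # zeebe-io/backport-action is now published as korthout/backport-action
        uses: korthout/backport-action@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          # Label format: backport/release/1.4 (the captured group is the target branch)
          # Bot opens a PR titled "[Backport release/1.4] Original PR title"
          label_pattern: '^backport/(.+)$'
          # Input names follow the action's README; confirm them for the version you pin
          pull_description: |
            Automated backport of #${{ github.event.pull_request.number }}.
            Review carefully: a clean cherry-pick can still be wrong for the older branch.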

Governance: Who Can Approve a Hotfix?

Release branches must be protected branches with stricter merge controls than main. A hotfix that bypasses code review is not a hotfix — it is an unauthorized production change, and it is how security patches introduce new vulnerabilities. Production-grade governance looks like this:

Required reviewers: At minimum two engineers, including the release manager on call. The team lead or a senior engineer must be one approver.
All CI checks must pass: Unit tests, integration tests, vulnerability scans. No exceptions, even at 3 AM.
Commit signing required: Sign commits with git commit -S and release tags with git tag -s, using a GPG or SSH key. A signed tag on the hotfix release lets anyone verify who created the tag and exactly which reviewed commit it points to.
Release notes required: Every patch release needs a CHANGELOG entry written before the tag is pushed. The on-call team should know exactly what changed before they deploy.
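The release-notes rule is simple to turn into a gate that runs before the tag is pushed. The sketch below assumes a CHANGELOG.md with "Keep a Changelog"-style headings such as "## [1.4.1] - YYYY-MM-DD". That format, the file name, and the function name are assumptions, so adjust the pattern to your own changelog.

```shell
#!/bin/sh
# Fail unless the changelog contains a heading for the version being released.
changelog_has_entry() {
    version=$1                  # e.g. 1.4.1 (without the leading "v")
    file=${2:-CHANGELOG.md}

    # -F matches the text literally, so the dots and brackets are not
    # treated as regular-expression syntax.
    grep -q -F "## [$version]" "$file"
}
```

In a release job this could run as changelog_has_entry "${tag#v}" right before git tag, stopping the release when the entry is missing.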

The "break glass" process: Even with strict branch protection, every team needs a documented, audited override path for true zero-day emergencies — when tests are flaky and every second costs money. Define this path in advance: who can approve a break-glass deploy, what compensating controls apply (manual smoke test checklist, rollback plan written before deploy, incident ticket number attached), and how the full post-mortem will review the decision. A break-glass used without this process is just negligence with extra steps.

Hotfix Release Pipeline Design

Your standard release pipeline is optimized for throughput. Your hotfix pipeline must be optimized for speed without sacrificing safety. The practical differences:

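Parallelism: Run unit tests, integration tests, and security scans in parallel rather than sequentially. Wall-clock time then approaches the duration of the slowest stage: a 45-minute sequential suite made of three roughly 15-minute stages finishes in about 15 minutes, with exactly the same checks executed.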
Dedicated runners: Hotfix jobs should run on pre-warmed, reserved CI runners — not queued behind 200 regular PR builds. Reserve capacity at the infrastructure level.
Skip non-safety gates: It is acceptable to skip performance benchmarks, code coverage delta checks, and linting. It is never acceptable to skip correctness tests or security scans.
Rollback-first deploy: Before deploying the hotfix, ensure your rollback artifact (the previous version) is pre-staged in every region. Deploy the hotfix only when you can roll back in under two minutes if it makes things worse.
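The parallelism point can be sketched with nothing more than shell job control: start every stage in the background, wait for each one, and fail the run if any stage failed. The function name run_parallel is hypothetical, and a real pipeline would normally express this with its CI system's parallel jobs.

```shell
#!/bin/sh
# Run each argument as its own shell command in the background, wait for all
# of them, and return non-zero if any one of them failed.
run_parallel() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
    done

    status=0
    for pid in $pids; do
        # wait returns the exit status of that background command
        wait "$pid" || status=1
    done
    return "$status"
}
```

A call might look like run_parallel "make test-unit" "make test-integration" "make scan", where the three make targets are placeholders for whatever stages your pipeline defines. Every stage still runs to completion, so a failure in one does not hide a failure in another.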

Mastering hotfixes and backports is what separates teams that recover gracefully from incidents from teams that compound them. The mechanics — branch from a tag, minimal fix, merge back — are simple. The discipline — governance, automation, pre-staged rollbacks, clear support windows — is what takes practice to embed into organizational culture.