Hotfixes & Backports
Production is broken. A critical vulnerability is being exploited. Customers are losing money with every passing minute. This is the moment when your release process either proves its maturity — or collapses under pressure. Hotfixes and backports are the disciplines that let you patch released versions safely, without disrupting the normal train of development and without creating the chaos of "just push it fast."
Big-tech engineering distinguishes sharply between two operations: a hotfix (an emergency patch applied directly to a released version) and a backport (carrying a fix that already landed on main back to one or more older support branches). They share mechanics but differ in urgency, governance, and risk profile.
Support Branches: The Foundation
You can only patch a released version if that version still has a living branch. This is why every release train cuts a named branch — release/1.5, release/2024.10 — and keeps it alive for as long as the version is supported. The branch is the patch surface. Delete it and you lose the ability to issue a hotfix without rebuilding history.
Establish a clear support window policy before you ship anything. Google Chrome supports only the current stable version; Kubernetes supports the three most recent minor releases (N, N-1, N-2); Ubuntu gives each LTS release five years of standard support. Whatever you choose, publish it, automate branch lifecycle, and enforce it: the worst outcome is engineers assuming a branch is supported when it was silently abandoned.
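A support window is easier to enforce when it is computed rather than remembered. The following is a minimal sketch, assuming the release/MAJOR.MINOR branch naming used above and a Kubernetes-style window of three minor releases; the function names are hypothetical and not part of any real tool.

```python
from typing import List, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Parse 'MAJOR.MINOR' into a sortable tuple, e.g. '1.10' -> (1, 10)."""
    major, minor = version.split(".")
    return int(major), int(minor)


def supported_branches(released: List[str], window: int = 3) -> List[str]:
    """Return the release branches still inside the support window, newest first.

    `window` is the number of most recent minor releases that stay supported
    (3 corresponds to N, N-1, N-2).
    """
    newest_first = sorted(set(released), key=parse_minor, reverse=True)
    return [f"release/{v}" for v in newest_first[:window]]


if __name__ == "__main__":
    # Versions are compared numerically, so 1.10 is newer than 1.9.
    print(supported_branches(["1.3", "1.4", "1.5", "1.9", "1.10"]))
```

A scheduled CI job could compare this list against the branches that actually exist and flag any supported branch that has gone missing.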
The Hotfix Workflow
A hotfix is written when the bug is first discovered in production and there is no corresponding fix on main yet — or when the fix on main cannot be safely ported because main has diverged too far. The canonical workflow:
- Branch from the release tag, not from main. This is critical: main may have weeks of unreleased features that must not go to production.
- Apply the minimal fix. Hotfixes are not the time for refactoring, dependency upgrades, or "while we are in here" changes. One commit, one purpose.
- Run the release pipeline in accelerated mode. All tests must pass; skip none. If your test suite takes 45 minutes and customers are down, that is a test-suite performance problem to fix separately, not a reason to skip tests now.
- Tag with a patch version bump (v1.4.0 → v1.4.1) and release.
- Merge the fix back to main immediately after release. This step is the most commonly skipped, and skipping it causes the same bug to reappear in the next release.
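The branch and tag steps of this workflow can be sketched as a small command planner. This is an illustration only: the hotfix/NAME branch naming, the function names, and the example fix name are assumptions, and the fix itself, the pipeline run, and the merge-back PR happen outside this code.

```python
from typing import List


def bump_patch(tag: str) -> str:
    """Return the next patch tag, e.g. 'v1.4.0' -> 'v1.4.1'.

    Assumes a plain vMAJOR.MINOR.PATCH tag with no pre-release suffix.
    """
    version = tag[1:] if tag.startswith("v") else tag
    major, minor, patch = version.split(".")
    return f"v{major}.{minor}.{int(patch) + 1}"


def start_hotfix(release_tag: str, name: str) -> List[List[str]]:
    """Commands that create the hotfix branch from the release tag, not main."""
    return [
        ["git", "fetch", "--tags", "origin"],
        ["git", "checkout", "-b", f"hotfix/{name}", release_tag],
    ]


def release_hotfix(release_tag: str, name: str) -> List[List[str]]:
    """Commands that tag and publish the fix once the pipeline is green."""
    new_tag = bump_patch(release_tag)
    return [
        # -s creates a signed tag; it needs a configured signing key.
        ["git", "tag", "-s", new_tag, "-m", f"Hotfix release {new_tag}"],
        ["git", "push", "origin", f"hotfix/{name}", new_tag],
    ]


if __name__ == "__main__":
    for cmd in start_hotfix("v1.4.0", "login-crash"):
        print(" ".join(cmd))
    for cmd in release_hotfix("v1.4.0", "login-crash"):
        print(" ".join(cmd))
```

Returning the commands as lists keeps the plan reviewable before anything runs; each list can be passed to subprocess.run(cmd, check=True) once the plan looks right.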
Forgetting to merge back to main is the single most common hotfix failure mode. The fix ships, the incident closes, everyone goes home, and three months later the identical bug ships again in the next release because the fix lived only on the release branch. Automate a post-release CI step that opens a PR from the hotfix branch into main so the team cannot forget.
Backports: Carrying Fixes to Older Branches
A backport differs from a hotfix in one important way: the fix already exists on main. Your job is to carry it — as cleanly as possible — to an older release branch. The tool is git cherry-pick.
Cherry-pick copies the diff of a specific commit and replays it. It creates a new commit with a new SHA on the target branch — the history is separate from main. This is intentional: the release branch should not inherit unrelated commits from main.
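A single backport might be planned like this. It is a sketch under stated assumptions: the remote is named origin, the helper names are hypothetical, and the SHA in the usage line is a placeholder. The -x flag asks git to append a "(cherry picked from commit ...)" line to the new commit message, which keeps the link to the original commit on main even though the SHAs differ.

```python
import subprocess
from typing import List


def backport_commands(commit_sha: str, target_branch: str) -> List[List[str]]:
    """Commands that replay one commit from main onto a release branch."""
    return [
        ["git", "checkout", target_branch],
        # Refuse to proceed if the local branch has diverged from the remote.
        ["git", "pull", "--ff-only", "origin", target_branch],
        # -x records the source commit in the new commit's message.
        ["git", "cherry-pick", "-x", commit_sha],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError, e.g. on a cherry-pick conflict.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder SHA for illustration; print the plan instead of running it.
    for cmd in backport_commands("abc1234", "release/1.4"):
        print(" ".join(cmd))
```

If the cherry-pick stops on a conflict, the repository is left mid-operation; resolve and run git cherry-pick --continue, or back out with git cherry-pick --abort.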
If the original fix on main was a sprawling 40-file refactor, ask the author to extract the pure bug-fix into a separate commit before you backport. A clean, isolated fix cherry-picks with zero conflicts; a refactor cherry-pick is a multi-day debugging exercise.
Automating Backports with Labels
At scale — multiple support branches, dozens of engineers — manual backporting is error-prone and slow. The industry-standard approach is label-driven automation: a PR author labels their fix PR with backport/1.3 and backport/1.4; a bot automatically opens cherry-pick PRs to those branches after the original merges.
GitHub Actions with the zeebe-io/backport-action or Prow's /cherrypick command (used by Kubernetes) are common implementations. The pattern is identical: label triggers automation, automation opens a PR, a human reviews and merges. The human review step is not optional — automated cherry-picks can conflict, and a bot merge of a broken backport can take down a production system.
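The core of any label-driven bot is the mapping from labels to target branches. The sketch below assumes the backport/VERSION label scheme and release/VERSION branch names described above; it is not the logic of any particular action or Prow plugin, and the function name is hypothetical.

```python
from typing import Iterable, List

LABEL_PREFIX = "backport/"  # assumed label scheme, e.g. "backport/1.4"


def backport_targets(labels: Iterable[str]) -> List[str]:
    """Map PR labels to release branches, ignoring unrelated labels.

    Duplicates are dropped and the order of first appearance is kept.
    """
    targets = []
    for label in labels:
        if not label.startswith(LABEL_PREFIX):
            continue
        version = label[len(LABEL_PREFIX):]
        if version:  # skip a bare "backport/" label
            targets.append(f"release/{version}")
    return list(dict.fromkeys(targets))


if __name__ == "__main__":
    print(backport_targets(["bug", "backport/1.4", "backport/1.3", "backport/1.4"]))
```

A bot would then open one cherry-pick PR per returned branch and leave the merge decision to a human reviewer.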
Governance: Who Can Approve a Hotfix?
Release branches must be protected branches with stricter merge controls than main. A hotfix that bypasses code review is not a hotfix — it is an unauthorized production change, and it is how security patches introduce new vulnerabilities. Production-grade governance looks like this:
- Required reviewers: At minimum two engineers, including the release manager on call. The team lead or a senior engineer must be one approver.
- All CI checks must pass: Unit tests, integration tests, vulnerability scans. No exceptions, even at 3 AM.
- Commit signing required: Use git commit -S with a GPG or SSH key. A signed tag on the hotfix release proves the artifact was built from an audited commit.
- Release notes required: Every patch release needs a CHANGELOG entry written before the tag is pushed. The on-call team should know exactly what changed before they deploy.
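These rules are normally enforced by branch protection settings on the hosting platform, but it can help to see them as one explicit check. The sketch below is hypothetical throughout: HotfixPR and merge_blockers are illustrative names, and the fields stand in for whatever your platform reports.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class HotfixPR:
    approvers: Set[str]
    ci_passed: bool          # unit tests, integration tests, vulnerability scans
    commits_signed: bool
    changelog_updated: bool


def merge_blockers(pr: HotfixPR, release_manager: str,
                   senior_engineers: Set[str]) -> List[str]:
    """Return the reasons a hotfix PR may not merge; empty means it may."""
    blockers = []
    if len(pr.approvers) < 2:
        blockers.append("needs at least two approvers")
    if release_manager not in pr.approvers:
        blockers.append("release manager on call has not approved")
    if not (pr.approvers & senior_engineers):
        blockers.append("no team lead or senior engineer among approvers")
    if not pr.ci_passed:
        blockers.append("CI checks have not all passed")
    if not pr.commits_signed:
        blockers.append("unsigned commits present")
    if not pr.changelog_updated:
        blockers.append("CHANGELOG entry missing")
    return blockers


if __name__ == "__main__":
    pr = HotfixPR({"alice"}, ci_passed=True, commits_signed=True,
                  changelog_updated=False)
    for reason in merge_blockers(pr, "rita", {"alice"}):
        print(reason)
```

Returning every blocker at once, rather than failing on the first, tells the on-call engineer at 3 AM exactly what is still missing.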
Hotfix Release Pipeline Design
Your standard release pipeline is optimized for throughput. Your hotfix pipeline must be optimized for speed without sacrificing safety. The practical differences:
- Parallelism: Run unit tests, integration tests, and security scans in parallel rather than sequentially. A 45-minute sequential suite made of three similar-length stages can finish in roughly 15 minutes when the stages run side by side, with no loss of coverage.
- Dedicated runners: Hotfix jobs should run on pre-warmed, reserved CI runners — not queued behind 200 regular PR builds. Reserve capacity at the infrastructure level.
- Skip non-safety gates: It is acceptable to skip performance benchmarks, code coverage delta checks, and linting. It is never acceptable to skip correctness tests or security scans.
- Rollback-first deploy: Before deploying the hotfix, ensure your rollback artifact (the previous version) is pre-staged in every region. Deploy the hotfix only when you can roll back in under two minutes if it makes things worse.
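The gate selection and the parallelism arithmetic above can be made concrete with a small model. The gate names and durations below are invented for illustration (three 15-minute safety stages, matching the 45-minute example), and the classes and functions are hypothetical rather than part of any CI system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gate:
    name: str
    minutes: int
    safety: bool  # True for correctness tests and security scans


def hotfix_gates(gates: List[Gate]) -> List[Gate]:
    """Keep only the gates a hotfix may never skip."""
    return [g for g in gates if g.safety]


def sequential_minutes(gates: List[Gate]) -> int:
    """Wall-clock time when gates run one after another."""
    return sum(g.minutes for g in gates)


def parallel_minutes(gates: List[Gate]) -> int:
    """Wall-clock time when every gate has its own runner: the slowest gate."""
    return max((g.minutes for g in gates), default=0)


STANDARD_PIPELINE = [
    Gate("unit-tests", 15, True),
    Gate("integration-tests", 15, True),
    Gate("security-scan", 15, True),
    Gate("perf-benchmarks", 20, False),
    Gate("coverage-delta", 3, False),
    Gate("lint", 2, False),
]

if __name__ == "__main__":
    required = hotfix_gates(STANDARD_PIPELINE)
    print([g.name for g in required])
    print(sequential_minutes(required), parallel_minutes(required))
```

The parallel figure is bounded by the slowest stage, so the speedup depends on how evenly the suite splits; one dominant stage limits the gain no matter how many runners are reserved.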
Mastering hotfixes and backports is what separates teams that recover gracefully from incidents from teams that compound them. The mechanics — branch from a tag, minimal fix, merge back — are simple. The discipline — governance, automation, pre-staged rollbacks, clear support windows — is what takes practice to embed into organizational culture.