Git & Collaboration Workflows

Trunk-Based Development & Feature Flags

The previous lesson compared branching strategies side-by-side. This lesson dives deep into one of them — Trunk-Based Development (TBD) — and pairs it with the technique that makes it safe at scale: feature flags. These two practices together are how Google, Meta, and Netflix ship dozens of changes to production every day without feature branches that live for weeks.

What Trunk-Based Development Actually Means

In TBD, every engineer integrates code into a single shared branch, usually called main or trunk, at least once per day. There are no long-lived feature branches. A branch, if it exists at all, lives for hours or a day or two at most, never weeks. The moment it is ready (even partially), it merges to trunk.

This sounds reckless. It is the opposite. The discipline of integrating frequently forces three things:

Conflicts surface immediately. When you wait two weeks to merge, you fight a multi-thousand-line merge conflict. When you merge daily, conflicts are trivial.
CI runs on real integrated code. A test suite that only runs on your branch is a test suite that does not catch integration bugs.
The team shares one truth. There is no "we will merge after the release" drift. Everyone is always looking at the same codebase.

The DORA research finding: DORA's State of DevOps research has repeatedly identified trunk-based development as a practice associated with higher software delivery performance. Teams that keep branches short-lived and merge to trunk at least daily are more likely to deploy frequently with low change failure rates.

Short-Lived Branches: The Allowed Exception

Pure TBD allows zero branches — every commit goes directly to main. In practice, most teams use a mild variant: short-lived feature branches that exist for at most one or two days and go through a pull request before merging. This is sometimes called "scaled trunk-based development."

The rules that make short-lived branches safe:

Branch from the latest main commit — never from another branch.
Merge main into your branch (or rebase your branch onto main) at least daily if the branch lives longer than 24 hours.
Keep the PR small enough to review in under 30 minutes. If a feature is too large, break it into incremental, independently-mergeable slices — each behind a feature flag.
Delete the branch immediately after merge. No lingering branches on origin.

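# Branch created, worked on, and merged within a day
git fetch origin main
git switch --no-track -c feat/add-rate-limiter origin/main   # Branch from the latest trunk, not a stale local main
# ... implement, commit ...
git fetch origin main
git rebase origin/main             # Stay current with trunk
git push -u origin feat/add-rate-limiter
# If the branch was already pushed before a rebase, use: git push --force-with-lease
# PR opened → reviewed → approved → merged → branch deleted
git push origin --delete feat/add-rate-limiter   # Not needed if the platform auto-deletes merged branches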

Auto-delete branches on merge. In GitHub, enable "Automatically delete head branches" in repository settings. In GitLab, check "Delete source branch" when creating merge requests. This enforces the discipline at the platform level, so merged branches cannot linger on origin by accident.
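Auto-delete only covers merged branches. To spot unmerged branches that have outlived the rules above, a small script can list remote branches with no recent commits. The following is a minimal sketch, not a standard tool: the names `find_stale_branches` and `list_remote_refs`, the two-day threshold, and the `origin` remote are all assumptions. It measures time since the last commit, which is only a proxy for how long a branch has existed.

```python
import subprocess

MAX_AGE_SECONDS = 2 * 24 * 60 * 60  # assumed policy: no branch idle for more than two days


def find_stale_branches(ref_lines, now, max_age=MAX_AGE_SECONDS, trunk="origin/main"):
    """Return (branch, idle_days) pairs for branches idle longer than max_age.

    ref_lines: lines of the form "<short refname> <unix committer timestamp>".
    now: current time as a unix timestamp.
    """
    stale = []
    for line in ref_lines:
        if not line.strip():
            continue
        name, timestamp = line.rsplit(" ", 1)
        if name in (trunk, "origin/HEAD", "origin"):
            continue  # skip trunk and the symbolic HEAD ref
        age = now - int(timestamp)
        if age > max_age:
            stale.append((name, round(age / 86400, 1)))
    # Oldest branches first
    return sorted(stale, key=lambda item: item[1], reverse=True)


def list_remote_refs():
    """Ask git for every remote-tracking branch and its last commit time."""
    result = subprocess.run(
        ["git", "for-each-ref",
         "--format=%(refname:short) %(committerdate:unix)",
         "refs/remotes/origin"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Inside a clone, after `git fetch --prune`, calling `find_stale_branches(list_remote_refs(), time.time())` (with `import time`) would return the branches to chase up. Keeping the parsing separate from the git call makes the policy easy to unit test.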

The Core Problem TBD Exposes

If you cannot have long-lived branches, how do you ship a feature that takes two weeks to build? You cannot hide it in a branch. You cannot refuse to integrate until it is done. The answer is: you integrate constantly, but you control when the code runs. That is the job of feature flags.

Feature Flags: Decoupling Deploy from Release

A feature flag (also called a feature toggle or feature gate) is a conditional in code that decides at runtime whether a feature is active. The code is deployed to production continuously; the flag controls who sees it and when.

This is the key mental model: deploy is a technical event (code goes to servers); release is a business event (users see the feature). TBD + flags decouple them completely. You can deploy 50 times a day while releasing a feature to 1% of users on a Tuesday at 10 AM when your support team is ready.

Deploys happen continuously; the feature flag controls when and to whom the feature is released.

Flag Types You Need to Know

Not all flags are the same. Using the wrong type in the wrong context causes technical debt and operational risk:

Release flags — short-lived; hide in-progress features. Delete them the moment the rollout completes. These are the TBD workhorse.
Experiment flags (A/B) — route users to variant A or B; measured by analytics. Live as long as the experiment, then cleaned up.
Ops flags — circuit breakers and kill switches. "Disable the recommendations engine if the ML service is overloaded." Long-lived and treated as runbook items.
Permission flags — gate features by user tier, geography, or plan. Can be permanent business logic.

Flag debt kills codebases. A flag that is never cleaned up after rollout is a permanent branch in your logic. Large engineering organizations, reportedly including Google, use tooling that automatically files a bug when a flag is older than a set age (for example 90 days) and the code is fully rolled out. Treat flag cleanup as a first-class engineering task, not an afterthought.
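A cleanup check of this kind can be sketched in a few lines. The example below is illustrative only: the `FlagRecord` fields, the `overdue_flags` function, and the sample registry entries are hypothetical, and a real system would read this metadata from the flag service rather than a hard-coded list.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FlagRecord:
    name: str
    kind: str             # "release", "experiment", "ops", or "permission"
    owner: str
    created: date
    rollout_percent: int  # 0-100, share of traffic with the flag on


def overdue_flags(flags, today, max_age=timedelta(days=90)):
    """Return release flags that are fully rolled out and older than max_age."""
    return [
        f for f in flags
        if f.kind == "release"
        and f.rollout_percent == 100
        and today - f.created > max_age
    ]


# Hypothetical registry entries
registry = [
    FlagRecord("new-checkout-ui", "release", "payments-team", date(2024, 1, 10), 100),
    FlagRecord("search-ranking-v2", "release", "search-team", date(2024, 5, 1), 25),
    FlagRecord("recs-kill-switch", "ops", "ml-platform", date(2022, 3, 1), 100),
]

for flag in overdue_flags(registry, today=date(2024, 6, 1)):
    print(f"{flag.name}: owned by {flag.owner}, created {flag.created}, ready for removal")
```

Only the release flag at 100% is reported. The ops flag is older but is deliberately long-lived, which is why the flag type is recorded alongside the owner and creation date.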

Implementing Flags: From Simple to Production-Grade

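At the simplest level, a flag is just an environment variable check. This works for early-stage releases and simple ops flags, with one caveat: the value is fixed when the process starts, so flipping the flag means a restart or redeploy.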

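# .env or environment config
FEATURE_NEW_CHECKOUT=true

# Python example
import os

def new_checkout_enabled():
    # Anything other than "true" (case-insensitive), including an unset variable, means off
    return os.getenv("FEATURE_NEW_CHECKOUT", "false").strip().lower() == "true"

def render_checkout(user):
    # new_checkout_flow and legacy_checkout_flow are the application's own handlers
    if new_checkout_enabled():
        return new_checkout_flow(user)
    return legacy_checkout_flow(user)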

For per-user targeting, the kind needed for A/B tests or gradual rollouts, you need a flag evaluation service. A widely adopted vendor-neutral option is OpenFeature (a CNCF project), which defines a standard SDK API and is paired with a backend such as flagd or a hosted service such as LaunchDarkly. Here is a realistic flagd configuration:

# flagd/flags.json — evaluated server-side, hot-reloaded
{
  "flags": {
    "new-checkout-ui": {
      "state": "ENABLED",
      "variants": {
        "on": true,
        "off": false
      },
      "defaultVariant": "off",
      "targeting": {
        "if": [
          {
            "in": [
              { "var": "email" },
              ["alice@example.com", "beta@example.com"]
            ]
          },
          "on",
          {
            "fractional": [
              { "cat": ["new-checkout-ui", { "var": "userId" }] },
              ["on", 10],
              ["off", 90]
            ]
          }
        ]
      }
    }
  }
}

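This configuration serves the new checkout UI to two explicit beta users and to roughly 10% of everyone else. The split is pseudorandom but sticky: the bucketing key is hashed deterministically, so the same userId always lands in the same bucket and gets the same variant. Prefixing the key with the flag name keeps bucket assignments independent across flags, so the same users are not the first cohort of every rollout. The file is hot-reloaded, so changing the rollout percentage needs no restart.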
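The idea behind a sticky percentage split can be shown with a short sketch. This is not flagd's implementation (its hash function and bucket layout differ); it is a minimal illustration using SHA-256, and the function names `bucket` and `is_enabled` are hypothetical.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_key}{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_key: str, user_id: str, percent_on: int) -> bool:
    """True when the user's bucket falls inside the first percent_on buckets."""
    return bucket(flag_key, user_id) < percent_on


users = [f"user-{i}" for i in range(10_000)]
enabled = sum(is_enabled("new-checkout-ui", u, 10) for u in users)
print(f"{enabled / len(users):.1%} of users see the new checkout")
```

Because the bucket depends only on the flag key and the user ID, raising the percentage from 10 to 25 keeps every user who already had the feature and adds new ones. Nobody flips back and forth between variants as the rollout grows.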

The Strangler Fig Pattern with Flags

Large refactors — replacing a payment processor, rewriting an auth service — are done safely with the strangler fig pattern: run both old and new code paths in parallel, route a growing percentage of traffic to the new path via a flag, and decommission the old path only when the new one has handled 100% of traffic for a sustained period. The flag is your rollback lever at every step.

The flag gradually shifts traffic from the legacy service to the new one, enabling instant rollback at any percentage.
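In code, the routing side of the pattern can be as small as a wrapper that picks one of two handlers per call. The sketch below is a simplified illustration under stated assumptions: `StranglerRouter`, the flag key, and the two lambda handlers are hypothetical, and a production version would read the percentage from the flag service instead of holding it in memory.

```python
import hashlib
from typing import Callable


class StranglerRouter:
    """Send a configurable share of calls to a new implementation."""

    def __init__(self, flag_key: str,
                 legacy: Callable[[str], str],
                 new: Callable[[str], str]) -> None:
        self.flag_key = flag_key
        self.legacy = legacy
        self.new = new
        self.percent_new = 0  # start with all traffic on the legacy path

    def set_rollout(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.percent_new = percent

    def handle(self, user_id: str) -> str:
        # Deterministic bucket per (flag, user), so routing is sticky
        digest = hashlib.sha256(f"{self.flag_key}{user_id}".encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        handler = self.new if bucket < self.percent_new else self.legacy
        return handler(user_id)


router = StranglerRouter(
    "payments-processor-v2",
    legacy=lambda user_id: f"legacy charge for {user_id}",
    new=lambda user_id: f"new charge for {user_id}",
)
router.set_rollout(25)  # ramp up
router.set_rollout(0)   # rollback: every call returns to the legacy path
print(router.handle("user-42"))  # prints "legacy charge for user-42"
```

Both implementations stay deployed the whole time, which is what makes the rollback a configuration change rather than a redeploy.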

Common Production Failure Modes

TBD and flags fail in predictable ways. Knowing them lets you avoid them:

Committing directly to trunk without CI passing. TBD requires a fast, reliable CI pipeline as its immune system. If CI is flaky or slow (> 10 minutes), engineers bypass it. Invest in test speed first.
Flags in the wrong layer. A flag that is evaluated in three different services with inconsistent results is worse than no flag. Evaluate flags once (server-side, in the API gateway or a shared SDK) and pass the result downstream.
No observability on flag state. If an incident occurs, you need to know immediately which flags were active for affected users. Emit the flag evaluation result as a structured log field or trace attribute on every request.
Flag explosion. 500 active flags across a codebase with no ownership records. Assign an owner and an expiry date to every flag at creation time. Automate reminders.
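The second and third failure modes share a fix: evaluate flags once per request and record the result. The sketch below shows one possible shape, assuming a JSON log line per request. `evaluate_flags` is a hypothetical stand-in for a real SDK call, and its targeting rule (user IDs ending in "7") exists only to make the example deterministic.

```python
import json
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
logger = logging.getLogger("requests")


def evaluate_flags(user_id: str) -> dict:
    """Hypothetical single evaluation point; a real one would call the flag SDK."""
    return {"new-checkout-ui": user_id.endswith("7"), "recs-kill-switch": False}


def render_checkout(user_id: str, flags: dict) -> dict:
    """Downstream code reads the passed-in result instead of re-evaluating."""
    flow = "new" if flags["new-checkout-ui"] else "legacy"
    return {"user_id": user_id, "flow": flow}


def handle_request(request_id: str, user_id: str) -> dict:
    flags = evaluate_flags(user_id)          # evaluate once, at the edge
    logger.info(json.dumps({                 # record flag state with the request
        "event": "request_handled",
        "request_id": request_id,
        "user_id": user_id,
        "flags": flags,
    }, sort_keys=True))
    return render_checkout(user_id, flags)   # pass the result downstream


print(handle_request("req-1", "user-17"))
```

During an incident, filtering these log lines by flag value shows whether the affected requests share a flag state, which is usually the fastest way to confirm or rule out a rollout as the cause.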

Test both flag states in CI. Your test suite must pass with the flag both on and off. A common pattern is to parameterize tests with both values, or to run the full suite twice in the pipeline — once with the flag forced on, once forced off. This catches the case where the flag-off path silently breaks because nobody tested it.
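One way to parameterize over both states with only the standard library is `unittest`'s `subTest`. The sketch below assumes an environment-variable flag like the one shown earlier; `checkout_flow` is a hypothetical function under test and the expected strings are placeholders.

```python
import os
import unittest
from unittest import mock


def checkout_flow(user: str) -> str:
    """Hypothetical code under test: picks a flow based on the env flag."""
    if os.getenv("FEATURE_NEW_CHECKOUT") == "true":
        return f"new checkout for {user}"
    return f"legacy checkout for {user}"


class CheckoutFlagTest(unittest.TestCase):
    def test_checkout_works_in_both_flag_states(self):
        cases = {
            "true": "new checkout for alice",
            "false": "legacy checkout for alice",
        }
        for flag_value, expected in cases.items():
            # subTest reports each flag state separately if one of them fails
            with self.subTest(flag=flag_value):
                # patch.dict restores the original environment on exit
                with mock.patch.dict(os.environ, {"FEATURE_NEW_CHECKOUT": flag_value}):
                    self.assertEqual(checkout_flow("alice"), expected)


# Run the suite programmatically; in a real project the test runner does this
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CheckoutFlagTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Forcing the flag inside the test, rather than relying on the CI environment, keeps the two states explicit and stops the flag-off path from depending on whatever value the pipeline happens to set.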

Putting It Together: A Day in the Life

A senior engineer at a TBD shop working on a new payments feature might spend the week like this:

Day one: branches from main, writes a thin first slice (the database migration and the flag-guarded route), opens a PR the same day, and gets it merged. The flag is off.
Day two: branches again from the latest main and adds the UI layer behind the same flag, merged by end of day. The flag is still off.
Day three: completes the backend logic, adds end-to-end tests, and merges. The flag is turned on for the QA environment.
Day four: the flag is turned on for 5% of production traffic. Metrics look good.
Day five: the flag goes to 100%.
Day six: opens a PR to delete the flag and the legacy code path. Merged. Done.

The feature never lived in a branch for more than 24 hours, yet it took five days to build and roll out safely.

This is the discipline that enables teams to deploy fearlessly. The branch is short; the risk is managed by the flag. That is the entire idea.