DevOps Culture & Fundamentals

How Big Tech Ships Software

18 min Lesson 4 of 28

Amazon has reported deploying to production every 11.6 seconds on average (a figure from a 2011 conference talk). Google runs thousands of deployments per day across its services. Netflix ships hundreds of times a week to a platform serving 250 million subscribers. These numbers are not marketing: they are the direct result of deliberate engineering culture and architecture decisions that you can adopt at any scale.

This lesson pulls back the curtain on those practices: why elite teams deploy so frequently, what trunk-based development actually looks like in a real codebase, how small-batch thinking cuts risk rather than increasing it, and where the DORA metrics fit as the measurement layer on top of everything else.

Why Deploy Frequency Is a Leading Indicator of Quality

A counterintuitive truth underpins all of modern DevOps: deploying more often makes each deployment safer, not riskier. Here are the mechanics:

  • Smaller changesets: a deployment touching 50 lines is far easier to reason about, roll back, and post-mortem than one touching 5,000 lines.
  • Tighter feedback loops: bugs surfaced in production minutes after a commit are trivially localised. Bugs discovered three weeks later could sit in almost any line of a 400-commit diff.
  • Reduced blast radius: when something does go wrong, the scope of the outage is bounded by the scope of the change.
  • Psychological safety: teams that ship daily stop treating releases as high-stakes events. Deploys become routine, reducing the fear that slows organisations down.

The 2023 DORA State of DevOps Report found that elite performers deploy on demand (multiple times per day) with a change failure rate of about 5%, far lower than that of low-performing teams that deploy monthly. Frequency and stability are positively correlated, not at odds.

Trunk-Based Development

The deployment frequency of elite teams is only possible because of a specific branching strategy: trunk-based development (TBD). Every engineer integrates into a single shared branch, typically called main or trunk, at least once per day, either by committing directly or by merging a short-lived branch. There are no long-lived feature branches.

This sounds alarming at first. Won't everyone break each other's work? The answer is no — because TBD pairs with two complementary practices:

  1. Feature flags — code is merged to trunk behind a flag that is off in production. The feature is decoupled from its release. You can merge dark code continuously and flip the flag when the product is ready.
  2. Comprehensive automated tests — a fast, reliable test suite (unit + integration, running in under 10 minutes) is the non-negotiable prerequisite. It is the safety net that lets the trunk stay always-deployable.

Compare the two workflows below. The left side shows a long-lived feature branch (six weeks in this example); the right side shows trunk-based development:

[Figure: feature-branch vs trunk-based development comparison. Left panel, "Long-Lived Feature Branch": main with a feature/user-auth branch open for 6 weeks; "Merge conflict hell"; "Integration risk grows every day". Right panel, "Trunk-Based Development": main with three branches of 1 day each; "Feature flags decouple merge from release"; "Always-deployable trunk, low integration risk".]
Left: long-lived feature branches accumulate integration debt. Right: trunk-based development with short-lived branches merged daily keeps the trunk always deployable.

Small Batches: The Lean Manufacturing Connection

The principle of small batches comes directly from Toyota's manufacturing system. In a factory, if you stamp 1,000 parts before discovering the die is miscalibrated, you scrap 1,000 parts. If you stamp 1 and inspect, you scrap 1. Software has the same economics — but teams routinely batch weeks of work before getting production feedback.

Operationally, small batches in software mean:

  • Stories broken into tasks that can each be merged in a single working day.
  • Database migrations deployed independently of application code (the expand-and-contract pattern, also known as parallel change, for schema changes).
  • API changes versioned so the new consumer and old consumer coexist during rollout.
  • Infrastructure changes applied incrementally via IaC, not as one big terraform apply sweeping dozens of resources.

At Google, the guideline is that a changelist (CL, their term for a PR) should be small enough that the reviewer can hold the entire diff in their head at once. If a reviewer needs to page-swap mentally, the CL is too large. A good target is under 200 lines changed; a CL over 400 lines should be questioned.
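
The size guideline above can be turned into an automated pre-review check. The sketch below is one possible approach, not a tool any of the companies named here is known to use: it parses the text produced by `git diff --numstat` and applies the 200 and 400 line thresholds from the paragraph above. The names `assess_numstat` and `DiffSize` and the verdict labels are made up for illustration.

```python
from typing import NamedTuple

# Thresholds taken from the guideline above; adjust them for your own team.
TARGET_LINES = 200     # under this: comfortable to review
QUESTION_LINES = 400   # over this: ask whether the change can be split


class DiffSize(NamedTuple):
    files: int
    lines_changed: int
    verdict: str  # "ok", "large", or "question" (hypothetical labels)


def assess_numstat(numstat: str) -> DiffSize:
    """Classify a change from the output of `git diff --numstat`.

    Each row is tab-separated: added count, deleted count, path.
    Binary files report "-" for both counts and are skipped here.
    """
    files = 0
    lines = 0
    for row in numstat.splitlines():
        parts = row.split("\t")
        if len(parts) < 3:
            continue  # blank or malformed row
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary file, no line counts available
        files += 1
        lines += int(added) + int(deleted)

    if lines < TARGET_LINES:
        verdict = "ok"
    elif lines <= QUESTION_LINES:
        verdict = "large"
    else:
        verdict = "question"
    return DiffSize(files, lines, verdict)


if __name__ == "__main__":
    sample = "12\t3\tapp/checkout.py\n40\t0\ttests/test_checkout.py\n-\t-\tdocs/flow.png\n"
    print(assess_numstat(sample))
```

In a CI job, the input string would typically come from running `git diff --numstat` against the merge base; whether a "question" verdict blocks the merge or only posts a comment is a team decision.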

The DORA Metrics as a Compass

The DORA four key metrics (Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Mean Time to Restore) are covered in depth in Lesson 5. Here they serve as the measurement layer for the practices above, answering one question: are we actually improving?

A team moving toward trunk-based development and smaller batches should see these metric trajectories over 3–6 months:

  • Deployment Frequency ↑ — more deploys per day/week as batch size shrinks.
  • Lead Time for Changes ↓ — code reaches production in hours rather than weeks because it is not sitting in a branch queue.
  • Change Failure Rate ↓ — smaller changes are easier to test and review, so fewer defects escape.
  • MTTR ↓ — when something breaks, the small blast radius makes the fix obvious and fast.
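
To make the four metrics concrete, here is a minimal sketch of how they could be computed from deployment records. The `Deployment` record and its field names are assumptions made for this example; real data would come from your CI/CD system and incident tracker, and DORA's own definitions have more nuance than this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deploys: list[Deployment], window_days: int) -> dict:
    """Summarise the four key metrics over a window of `window_days` days."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    failures = [d for d in deploys if d.failed]
    # Failures that have not been restored yet are left out of the mean.
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]

    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times)
            if restore_times
            else None
        ),
    }
```

Running this on the same window every week gives the trend lines described above; the absolute values matter less than their direction.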

What a Production Pipeline Looks Like at Elite Scale

At companies like Shopify, GitHub, and Netflix, every commit to trunk triggers an automated pipeline whose stages run in parallel where they can, keeping the feedback loop under 10 minutes:

    # Simplified GitHub Actions pipeline, triggered on every push to main
    name: CI/CD Pipeline

    on:
      push:
        branches: [main]

    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Run unit tests
            run: make test-unit
          - name: Run integration tests
            run: make test-integration
          - name: Static analysis
            run: make lint

      build:
        needs: test
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Build & push container image
            run: |
              docker build -t registry.example.com/app:${{ github.sha }} .
              docker push registry.example.com/app:${{ github.sha }}

      deploy-canary:
        needs: build
        runs-on: ubuntu-latest
        steps:
          # Registry login and cluster credentials are omitted for brevity.
          # The traffic-weight annotation is illustrative: it assumes a
          # traffic controller in the cluster that reads it.
          - name: Deploy to 5% of traffic (canary)
            run: |
              kubectl set image deployment/app \
                app=registry.example.com/app:${{ github.sha }}
              kubectl annotate --overwrite deployment/app \
                traffic-weight=5

      promote-to-stable:
        needs: deploy-canary
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4   # needed for ./scripts/wait-for-canary.sh
          - name: Wait for canary health check
            run: ./scripts/wait-for-canary.sh --timeout 300
          - name: Promote to 100% traffic
            run: kubectl annotate --overwrite deployment/app traffic-weight=100

The canary step is the key production safety mechanism. Rather than flipping all traffic at once, the pipeline routes 5% of production requests to the new version. Automated health checks (error rate, p99 latency, saturation) run for up to five minutes. If they pass, the rollout completes. If they fail, the deployment should be rolled back automatically and the pipeline should post an alert (to Slack, for example) naming the failing metric; that logic would live in the wait-for-canary script, which is not shown here.
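
The decision at the heart of a canary health check can be sketched in a few lines. The example below is an illustration under stated assumptions, not the contents of the wait-for-canary script: the `Snapshot` and `Limits` types, the threshold values, and the use of CPU as the saturation signal are all hypothetical. It compares the canary against the stable baseline and reports which metrics failed, which is the information an alert would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Metrics for one version over the observation window (hypothetical shape)."""
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    latency_p99_ms: float  # 99th percentile latency in milliseconds
    cpu_saturation: float  # fraction of CPU capacity in use, 0.0 to 1.0


@dataclass(frozen=True)
class Limits:
    """Illustrative thresholds; real values depend on the service."""
    max_error_rate_delta: float = 0.005  # canary may exceed baseline by 0.5 points
    max_latency_ratio: float = 1.2       # canary p99 may be up to 20% slower
    max_cpu_saturation: float = 0.8      # absolute ceiling for the canary


def canary_verdict(
    baseline: Snapshot, canary: Snapshot, limits: Limits = Limits()
) -> tuple[bool, list[str]]:
    """Return (promote, failing_metrics) for a canary compared with the baseline."""
    failing: list[str] = []
    if canary.error_rate - baseline.error_rate > limits.max_error_rate_delta:
        failing.append("error_rate")
    if canary.latency_p99_ms > baseline.latency_p99_ms * limits.max_latency_ratio:
        failing.append("latency_p99")
    if canary.cpu_saturation > limits.max_cpu_saturation:
        failing.append("saturation")
    return (not failing, failing)


if __name__ == "__main__":
    stable = Snapshot(error_rate=0.001, latency_p99_ms=200.0, cpu_saturation=0.50)
    candidate = Snapshot(error_rate=0.020, latency_p99_ms=300.0, cpu_saturation=0.90)
    promote, failing = canary_verdict(stable, candidate)
    print("promote" if promote else f"roll back, failing: {failing}")
```

Comparing against the live baseline rather than fixed numbers keeps the check meaningful when overall traffic conditions change during the rollout.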

The hidden prerequisite: none of this works without observability — structured logs, metrics, and traces — already in place. A canary step that cannot query error rates is just waiting blindly. Build your telemetry stack before you build your deployment automation. Observability is covered in later tutorials; file it mentally as a blocker for elite-level deploy frequency.

Feature Flags in Practice

Feature flags are the glue between trunk-based development and product release management. The simplest implementation is an environment variable; production systems use a flag service (LaunchDarkly, Unleash, Flipt, or a homegrown Redis hash) so flags can be toggled without a redeploy.

    # Pseudocode: wrapping a new payment flow behind a flag.
    # The flag is evaluated at runtime, so enabling or disabling it needs no redeployment.
    if feature_enabled('new_checkout_flow', user_id=current_user.id):
        render_new_checkout()
    else:
        render_legacy_checkout()

    # Illustrative flag configuration (simplified; not the exact schema of
    # Unleash or LaunchDarkly):
    # {
    #   "flag": "new_checkout_flow",
    #   "rollout": "percentage",
    #   "percentage": 10,          # 10% of targeted users see the new flow
    #   "targeting": {
    #     "segment": "beta_users"
    #   }
    # }

This decoupling is why Google can merge hundreds of CLs into a single binary that goes to production — each experimental feature is dark until a product manager flips a flag, often with gradual percentage rollout starting at 1%.
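
A gradual percentage rollout needs each user to get a stable answer, so that someone who sees the new flow at 1% still sees it at 10%. One common way to achieve that is deterministic hashing, sketched below. This is a minimal illustration, not how LaunchDarkly, Unleash, or Google implement it; the `bucket` helper and the `percentage` parameter are assumptions made for this example.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99.

    Hashing the flag name together with the user ID means the same user
    can land in different buckets for different flags.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def feature_enabled(flag: str, user_id: str, percentage: int) -> bool:
    """Return True if this user falls inside the rollout percentage.

    In a real system `percentage` would be read from the flag service at
    runtime; it is passed in here to keep the sketch self-contained.
    """
    return bucket(flag, user_id) < percentage


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    enabled = sum(feature_enabled("new_checkout_flow", u, 10) for u in users)
    print(f"{enabled} of {len(users)} users see the new flow at 10%")
```

Because a user's bucket never changes, raising the percentage only ever adds users to the enabled group, and setting it to 0 acts as a kill switch.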

Dark launching is a related variant used at Google and other large web companies: new back-end code runs in shadow mode, receiving a copy of real production traffic, but its response is discarded. Engineers observe latency and error rates before any user ever sees the feature. Gmail and Google Maps were dark-launched at scale before public release.
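
Shadow mode can be sketched as a thin wrapper around two request handlers. The example below is a simplified illustration with hypothetical names (`ShadowRunner`, `Handler`): the primary handler's response is returned to the caller, while the shadow handler receives a copy of the request and only its latency and error count are kept. It runs the shadow call synchronously for clarity; a production system would normally run it asynchronously so users never wait on it, and would make sure the shadow path cannot write to shared state.

```python
import time
from typing import Callable

Handler = Callable[[dict], dict]  # hypothetical request and response shape


class ShadowRunner:
    """Serve requests from `primary` while exercising `shadow` in the dark."""

    def __init__(self, primary: Handler, shadow: Handler) -> None:
        self.primary = primary
        self.shadow = shadow
        self.shadow_latencies_ms: list[float] = []
        self.shadow_errors = 0

    def handle(self, request: dict) -> dict:
        response = self.primary(request)  # only this response reaches the user
        self._run_shadow(dict(request))   # shallow copy so the shadow sees its own dict
        return response

    def _run_shadow(self, request: dict) -> None:
        start = time.perf_counter()
        try:
            self.shadow(request)          # result is deliberately discarded
        except Exception:
            self.shadow_errors += 1       # a shadow failure never reaches the user
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.shadow_latencies_ms.append(elapsed_ms)


if __name__ == "__main__":
    def legacy(req: dict) -> dict:
        return {"status": "ok", "engine": "legacy"}

    def candidate(req: dict) -> dict:
        if req.get("id") == 2:
            raise RuntimeError("bug in new code path")
        return {"status": "ok", "engine": "new"}

    runner = ShadowRunner(legacy, candidate)
    for n in (1, 2, 3):
        print(runner.handle({"id": n}))
    print("shadow errors:", runner.shadow_errors)
```

The recorded latencies and error counts are what engineers would compare against the primary path before exposing the feature to anyone.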

Key Takeaways

  • Elite teams deploy frequently because small changes are safer, not despite the risk.
  • Trunk-based development is the branching strategy that makes high-frequency deployments possible — long-lived branches are an anti-pattern at scale.
  • Small batches reduce integration cost, shorten feedback loops, and limit blast radius.
  • Feature flags decouple code merge from feature release, enabling continuous delivery without continuous rollout.
  • DORA metrics are the quantitative lens on all of these practices — track them to confirm that process changes translate into real delivery improvements.