Git & Collaboration Workflows

Monorepos vs Polyrepos


How you organize code across repositories shapes your entire engineering culture — how fast teams ship, how dependencies drift, and how painful cross-cutting changes become. This lesson dissects the monorepo vs polyrepo decision at production scale, examines the real tooling that makes each viable, and explains what Google, Meta, Microsoft, and Airbnb actually do and why.

The Core Definitions

A monorepo stores many projects (services, libraries, infrastructure code, tooling) in one repository with a shared commit history. A polyrepo gives each project its own repository with its own branching, CI, and release cadence. A third, hybrid option is the meta-repo pattern: a thin umbrella repo that composes polyrepos via Git submodules or tools like meta. It tends to inherit the drawbacks of both approaches and is rarely the right answer.

Why Giants Chose Monorepos

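Google operates one of the world's largest monorepos. Its main internal source tree, commonly called "google3", held about 86 TB of data and 2 billion lines of code according to figures Google published in 2016, and it serves tens of thousands of engineers. Meta's fbsource monorepo is also very large. Both companies built custom version control (Piper at Google, a heavily extended Mercurial that became Sapling at Meta) and custom build systems (Blaze, open-sourced as Bazel, and Buck, respectively) because off-the-shelf tools could not scale to this. Their core rationale: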

Atomic cross-cutting changes — rename an API, update every caller in one commit. No versioned shims, no "breaking change" coordination across repos.
Single source of truth for dependency versions — no diamond dependency hell across teams.
Shared tooling investment amortized — one linter config, one CI template, one security scanner policy.
Easier code reuse and discovery — engineers can read and contribute to any library without context-switching repositories.

Key insight: Monorepos do not mean monolithic deployment. Google's monorepo contains hundreds of independently deployable services. The repo is unified; the runtime is not.
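
The "single source of truth for dependency versions" point above can be made concrete with a small sketch. It assumes each polyrepo exposes a simple mapping of dependency name to pinned version; the repo names, dependency names, and the find_version_drift helper are all hypothetical and only illustrate the kind of drift a monorepo's single lockfile rules out.

```python
from collections import defaultdict


def find_version_drift(manifests):
    """Return dependencies that are pinned at more than one version.

    manifests maps a repo name to {dependency: pinned_version}.
    The result maps each drifting dependency to {version: [repos]}.
    """
    pins = defaultdict(lambda: defaultdict(list))
    for repo, deps in manifests.items():
        for dep, version in deps.items():
            pins[dep][version].append(repo)
    return {
        dep: {version: sorted(repos) for version, repos in versions.items()}
        for dep, versions in pins.items()
        if len(versions) > 1
    }


# Hypothetical manifests for three separate service repos.
manifests = {
    "payments": {"protobuf": "3.20.0", "requests": "2.31.0"},
    "auth": {"protobuf": "4.25.1", "requests": "2.31.0"},
    "billing": {"protobuf": "3.20.0"},
}

drift = find_version_drift(manifests)
print(drift)
```

In a polyrepo setup a check like this has to run across repositories on a schedule; in a monorepo with one shared manifest the drifting state cannot be committed in the first place.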

Where Polyrepos Win

Not every organization is Google. Polyrepos are the right default when:

Strong team autonomy matters more than coordination — independent release trains, different tech stacks, separate on-call rotations.
External contributors or open-source work — giving a vendor access to one service repo without exposing everything else.
Compliance boundaries — PCI-scoped code, HIPAA data-handling services, or secret-containing infrastructure can be isolated with narrower access controls.
Early-stage scale — before you have the tooling investment to make monorepos fast, polyrepos have less operational friction.

Monorepo unifies code under one history; polyrepo isolates ownership but creates dependency drift and coordination overhead across repos.

The Real Monorepo Problem: Git Does Not Scale

Git was designed for Linux kernel development: a single project with many contributors but a working tree of tens of thousands of files, not millions. At monorepo scale, git status on a multi-million-file tree can become painfully slow, a full git clone is impractical, and git log --all becomes noise. Two Git mechanisms address this directly: sparse checkout and partial clone.

Sparse Checkout (Cone Mode)

Sparse checkout lets a developer check out only the subdirectories they care about. Cone mode, introduced in Git 2.25 and the default since Git 2.37, is the recommended approach: it restricts the selection to whole directories, which Git can match far faster than the older gitignore-style patterns.

# Partial clone: fetch commits and trees now, file contents (blobs) on demand.
# --no-checkout leaves the working tree empty until the sparse set is defined.
git clone --filter=blob:none --no-checkout https://github.com/org/monorepo.git
cd monorepo

# Enable sparse checkout in fast cone mode
git sparse-checkout init --cone

# Check out only the directories your team owns
git sparse-checkout set services/payments libs/shared-utils

# List the directories in the sparse-checkout definition
git sparse-checkout list
# services/payments
# libs/shared-utils

# Populate the working tree; only the paths above are materialized
git checkout main

# Later: add a new scope without re-cloning
git sparse-checkout add infra/terraform
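
To make the cone-mode rule concrete, here is a minimal sketch of the inclusion logic as documented for git sparse-checkout: everything under a selected directory is included recursively, and files sitting directly in any ancestor directory (including the repository root) are included as well. The in_cone function is a hypothetical illustration, not Git's implementation, and it ignores edge cases such as an empty selection.

```python
from pathlib import PurePosixPath


def in_cone(path, cone_dirs):
    """Approximate cone-mode inclusion for a repo-relative file path."""
    parent = PurePosixPath(path).parent
    for directory in cone_dirs:
        cone = PurePosixPath(directory)
        # Rule 1: files anywhere below a selected directory are included.
        if parent == cone or cone in parent.parents:
            return True
        # Rule 2: files directly inside an ancestor of a selected directory
        # (for example the repo root) are included, but sibling directories
        # of the selection are not.
        if parent in cone.parents:
            return True
    return False


selected = ["services/payments", "libs/shared-utils"]

print(in_cone("services/payments/api/handler.go", selected))
print(in_cone("README.md", selected))
print(in_cone("services/Makefile", selected))
print(in_cone("services/auth/main.go", selected))
```

Because the rule only compares directory prefixes, Git can decide inclusion with hash lookups rather than evaluating every pattern against every path, which is where the speedup over non-cone patterns comes from.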

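Production tip: --filter=blob:none (partial clone, which omits file blobs until they are needed) gives the biggest win for developer machines. Adding --depth=1 (shallow clone) makes the initial checkout faster still, but it truncates history, so reserve it for throwaway CI clones where git log and git blame are not needed. Engineers at Shopify and Twitter have reportedly cut clone times from 45 minutes to under 2 minutes on large monorepos with these flags; the actual saving depends on repository size and history depth.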

Build-System-Level Isolation: Bazel and Nx

The deeper monorepo problem is not Git — it is CI. If every commit triggers a full rebuild, a monorepo with 500 packages becomes unusable. The answer is a build system that understands the dependency graph and rebuilds and retests only affected packages.
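
The idea of rebuilding only affected packages can be sketched as a graph traversal: invert the dependency graph, then walk from the changed packages to everything that transitively depends on them. The package names and the affected_packages helper are hypothetical; real build systems derive the graph from BUILD files or package manifests.

```python
from collections import defaultdict, deque


def affected_packages(deps, changed):
    """Return the changed packages plus all of their transitive dependents.

    deps maps each package to the list of packages it depends on.
    """
    # Invert the graph: for each package, who depends on it?
    dependents = defaultdict(set)
    for package, requirements in deps.items():
        for requirement in requirements:
            dependents[requirement].add(package)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical monorepo dependency graph.
deps = {
    "shared-utils": [],
    "payments": ["shared-utils"],
    "auth": ["shared-utils"],
    "web": ["payments"],
    "docs": [],
}

print(sorted(affected_packages(deps, ["payments"])))
print(sorted(affected_packages(deps, ["shared-utils"])))
```

A change to a leaf service touches only that service and its consumers, while a change to a widely shared library fans out to most of the repo, which is why shared libraries deserve the strictest review and the best cache coverage.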

Bazel (the open-source version of Google's internal Blaze) models every build target explicitly. A BUILD file declares inputs, outputs, and dependencies. Bazel hashes inputs and caches outputs remotely, so if nothing upstream changed, the test result is served from the cache instead of being recomputed.

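# Example: Bazel BUILD file for a Go service (services/payments/BUILD.bazel)
load("@io_bazel_rules_go//go:def.bzl", "go_binary", "go_library", "go_test")

# Library target holding the service code; test files are excluded from srcs
go_library(
    name = "payments_lib",
    srcs = glob(["*.go"], exclude = ["*_test.go"]),
    # importpath is an assumed module path; adjust to your go.mod
    importpath = "github.com/org/monorepo/services/payments",
    deps = [
        "//libs/shared-utils:utils",
        "@com_github_gin_gonic_gin//:gin",
    ],
)

# The binary and the test both embed the library rather than each other
go_binary(
    name = "payments",
    embed = [":payments_lib"],
    visibility = ["//visibility:public"],
)

go_test(
    name = "payments_test",
    srcs = ["payments_test.go"],
    embed = [":payments_lib"],
)

# Build one target; actions whose inputs are unchanged are served from the cache
# bazel build //services/payments:payments
# Test everything; unaffected targets report cached results instead of rerunning
# bazel test //... --build_event_json_file=bep.json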
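
The input-hashing idea behind this caching can be sketched in a few lines. This is a deliberate simplification under stated assumptions: the cache_key function and its inputs are hypothetical, and Bazel's real action digests also cover toolchains, environment, and more. The point is only that a target's key depends on its own sources, its command, and the keys of its dependencies, so an upstream edit invalidates everything downstream and nothing else.

```python
import hashlib


def cache_key(source_files, dep_keys, command):
    """Compute a content-addressed key for one build target.

    source_files maps path -> file contents (bytes), dep_keys is the list of
    cache keys of direct dependencies, and command is the build command line.
    """
    digest = hashlib.sha256()
    digest.update(command.encode() + b"\0")
    for path in sorted(source_files):
        digest.update(path.encode() + b"\0")
        digest.update(hashlib.sha256(source_files[path]).digest())
    for key in sorted(dep_keys):
        digest.update(key.encode() + b"\0")
    return digest.hexdigest()


# Hypothetical targets: payments depends on utils.
utils_v1 = cache_key({"utils.go": b"package utils // v1"}, [], "go build")
utils_v2 = cache_key({"utils.go": b"package utils // v2"}, [], "go build")

payments_sources = {"main.go": b"package main"}
payments_a = cache_key(payments_sources, [utils_v1], "go build")
payments_b = cache_key(payments_sources, [utils_v1], "go build")
payments_c = cache_key(payments_sources, [utils_v2], "go build")

print(payments_a == payments_b)  # identical inputs: cache hit
print(payments_a == payments_c)  # upstream library changed: cache miss
```

A remote cache is then just a shared key-value store from such keys to build outputs, which is why one engineer's build can be a cache hit for everyone else on the same commit.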

For JavaScript/TypeScript monorepos, Nx and Turborepo fill the same role. Nx builds a project graph from package.json dependencies and tsconfig paths; Turborepo uses a pipeline defined in turbo.json. Both support remote caching via hosted backends (Nx Cloud, Vercel).

# turbo.json: Turborepo task configuration for a JS monorepo
# (Turborepo 2.x names this key "tasks"; 1.x called it "pipeline")
{
  "$schema": "https://turbo.build/schema.json",
  "tasks": {
    "build": {
      "dependsOn": ["^build"],
      "outputs": ["dist/**", ".next/**"]
    },
    "test": {
      "dependsOn": ["build"],
      "inputs": ["src/**", "tests/**"]
    },
    "lint": {
      "outputs": []
    }
  },
  "remoteCache": {
    "enabled": true
  }
}

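# Run build and test only for packages changed since origin/main, plus their dependents
# (the filter is quoted so the shell does not try to expand the brackets)
npx turbo run build test --filter='...[origin/main]'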
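
The "dependsOn": ["^build"] entry means a package's build runs only after the build of every workspace package it depends on. A rough sketch of the resulting ordering, using Python's standard-library topological sorter, is shown here. The package names are hypothetical, and a real task runner would also execute independent packages in parallel rather than in a flat list.

```python
from graphlib import TopologicalSorter


def build_order(deps):
    """Return an order in which every package is built after its dependencies.

    deps maps each package to the list of packages it depends on.
    """
    return list(TopologicalSorter(deps).static_order())


# Hypothetical workspace: web depends on ui and utils, ui and api depend on utils.
workspace = {
    "utils": [],
    "ui": ["utils"],
    "api": ["utils"],
    "web": ["ui", "utils"],
}

order = build_order(workspace)
print(order)
```

The relative order of ui and api is not significant, since neither depends on the other; only the constraint that utils comes first and web comes after ui matters.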

What Giants Actually Do

Google: monorepo (google3), Blaze build system (open-sourced as Bazel), Piper VCS (proprietary), Critique code review. Virtually all code is in one repo, with notable exceptions such as Chromium and Android, which are developed in Git.
Meta: monorepo (fbsource), Sapling VCS (open-sourced in 2022), Buck2 build system. Meta moved from Git to Mercurial, which later evolved into Sapling, because Git was hard to scale to a repository of that size at the time and Mercurial was easier to extend.
Microsoft: the Windows repo uses GVFS (Git Virtual File System, later renamed VFS for Git) to virtualize the file system so Git only downloads files on access. Microsoft open-sourced it, and later built Scalar, which relies on partial clone and sparse checkout instead of virtualization and now ships with Git.
Airbnb / Lyft: polyrepos organized by domain team; they invest heavily in platform tooling to keep dependency versions synchronized across repos (internal "dep-bot" automation).
Shopify: modularized its Rails monolith into a component-based monorepo, using packwerk to enforce boundaries without splitting repos.

Production pitfall: Migrating from polyrepo to monorepo mid-growth is extremely painful. Preserving history means importing each repository with tools such as git filter-repo or git subtree, and CI pipelines usually have to be rebuilt for the new layout. Make the architectural decision early, before you have 40 repos and 200 engineers.

Decision Framework

Use this mental model when choosing:

How often do your services share code? High shared code + frequent cross-cutting changes = monorepo advantage grows.
Do you have the platform team to build monorepo tooling? Without Bazel/Nx/Turborepo and a remote cache, a monorepo becomes a CI bottleneck within months.
Are access control or compliance boundaries hard requirements? If yes, polyrepo is often the easier path.
What is your current scale? Under ~10 services and ~20 engineers, either approach works; optimize for developer experience, not architecture purity.

Practical middle ground: Many mid-size companies use a domain-bounded monorepo — one monorepo per business domain (e.g., platform-monorepo, data-monorepo). This captures most of the shared-tooling benefit while keeping access control manageable and Git performance reasonable without specialized VCS tooling.

Practical Migration Patterns

If you need to merge an existing polyrepo into a monorepo, git subtree is the standard approach for preserving history:

# Merge repo-B into monorepo under path services/auth (preserving history)
cd monorepo

# Add the source repo as a remote
git remote add auth-origin https://github.com/org/auth-service.git
git fetch auth-origin

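# Import auth-service under services/auth as a merge commit that keeps its full history.
# Requires a clean working tree. (git read-tree --prefix on its own would copy the
# files but drop the history, so git subtree add is used here instead.)
git subtree add --prefix=services/auth \
  -m "chore: import auth-service into monorepo (history preserved)" \
  auth-origin main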

# Going the other way: extract services/payments into its own repo
git subtree split --prefix=services/payments -b split/payments
git push https://github.com/org/payments-service.git split/payments:main

Monorepos and polyrepos are not moral positions — they are engineering trade-offs. The right answer depends on your team size, tooling maturity, compliance requirements, and how often your services actually share code. What matters most is that the choice is intentional and that you invest in the tooling that makes your chosen approach fast and maintainable at scale.