Monorepos vs Polyrepos
How you organize code across repositories shapes your entire engineering culture — how fast teams ship, how dependencies drift, and how painful cross-cutting changes become. This lesson dissects the monorepo vs polyrepo decision at production scale, examines the real tooling that makes each viable, and explains what Google, Meta, Microsoft, and Airbnb actually do and why.
The Core Definitions
A monorepo stores many projects (services, libraries, infrastructure code, tooling) in one repository with a single shared commit history. A polyrepo gives each project its own repository with its own branching, CI, and release cadence. A third, hybrid option is the meta-repo pattern: a thin umbrella repo that composes polyrepos via Git submodules or tools like meta. It tends to inherit the drawbacks of both models and is rarely the right answer.
Why Giants Chose Monorepos
Google operates one of the largest known monorepos. Its main codebase, commonly referred to as "google3", was reported in 2016 to hold roughly 86 TB of data and 2 billion lines of code, serving tens of thousands of engineers. Meta's fbsource monorepo is also very large. Neither company could rely on off-the-shelf tooling at this scale: Google built the Piper version control system and the Blaze build system (open-sourced as Bazel), while Meta first extended Mercurial, later developed Sapling, and built the Buck build system. Their core rationale:
- Atomic cross-cutting changes — rename an API, update every caller in one commit. No versioned shims, no "breaking change" coordination across repos.
- Single source of truth for dependency versions — no diamond dependency hell across teams.
- Shared tooling investment amortized — one linter config, one CI template, one security scanner policy.
- Easier code reuse and discovery — engineers can read and contribute to any library without context-switching repositories.
Where Polyrepos Win
Not every organization is Google. Polyrepos are the right default when:
- Strong team autonomy matters more than coordination — independent release trains, different tech stacks, separate on-call rotations.
- External contributors or open-source work — giving a vendor access to one service repo without exposing everything else.
- Compliance boundaries — PCI-scoped code, HIPAA data-handling services, or secret-containing infrastructure can be isolated with narrower access controls.
- Early-stage scale — before you have the tooling investment to make monorepos fast, polyrepos have less operational friction.
The Real Monorepo Problem: Git Does Not Scale
Git was designed for Linux kernel development: a single project whose tree fits comfortably on a developer's disk. At monorepo scale, git status on a 10-million-file tree can take minutes, a full git clone is impractical, and git log --all becomes noise. Two Git mechanisms address this directly: sparse checkout and partial clone.
Sparse Checkout (Cone Mode)
Sparse checkout lets a developer check out only the subdirectories they care about. Cone mode, introduced in Git 2.25 and the default since Git 2.37, is the recommended approach: it restricts the checkout to a set of whole directories, which lets Git match paths far faster than the older free-form pattern matching.
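Combine --filter=blob:none (partial clone, which omits file blobs until they are needed) with --depth=1 (shallow clone, which truncates history) for the fastest possible initial checkout. The trade-off is that a shallow clone limits history operations such as git log and git blame until the history is deepened. Engineers at Shopify and Twitter have reported reducing clone time from 45 minutes to under 2 minutes on large monorepos with these two flags together.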
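The two mechanisms above can be combined in a small helper. The following is a minimal sketch, not a prescribed workflow: the function names, the example URL, and the directory names are hypothetical, and the helper only builds the git command lines so they can be reviewed before anything is run. It assumes Git 2.25 or newer.

```python
import subprocess
from typing import List


def sparse_clone_commands(repo_url: str, dest: str, dirs: List[str]) -> List[List[str]]:
    """Build git invocations for a blobless, shallow, cone-mode sparse clone."""
    if not dirs:
        raise ValueError("at least one directory is required")
    return [
        # Partial + shallow clone; --sparse starts with only top-level files present.
        ["git", "clone", "--filter=blob:none", "--depth=1", "--sparse", repo_url, dest],
        # Switch the sparse checkout to cone mode (directory-based matching).
        ["git", "-C", dest, "sparse-checkout", "init", "--cone"],
        # Materialize only the requested directories in the working tree.
        ["git", "-C", dest, "sparse-checkout", "set", *dirs],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Execute each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical repository and paths, printed rather than executed.
    for argv in sparse_clone_commands(
        "https://example.com/acme/monorepo.git",
        "monorepo",
        ["services/billing", "libs/auth"],
    ):
        print(" ".join(argv))
```

Passing the result to run_all performs the clone. Keeping command construction separate from execution makes it easy to print the commands in a dry run or to log them in CI.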
Build-System-Level Isolation: Bazel and Nx
The deeper monorepo problem is not Git — it is CI. If every commit triggers a full rebuild, a monorepo with 500 packages becomes unusable. The answer is a build system that understands the dependency graph and rebuilds and retests only affected packages.
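The idea of rebuilding only affected packages can be illustrated with a reverse walk over the dependency graph. This is a minimal sketch under stated assumptions: the package names and the DEPS mapping are hypothetical, and real build systems derive the graph from build files or package manifests rather than a hand-written dictionary.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def affected_packages(deps: Dict[str, List[str]], changed: Iterable[str]) -> Set[str]:
    """Return the changed packages plus everything that transitively depends on them."""
    # Invert the edges: for each package, record who depends on it.
    dependents: Dict[str, Set[str]] = {pkg: set() for pkg in deps}
    for pkg, requires in deps.items():
        for dep in requires:
            dependents.setdefault(dep, set()).add(pkg)

    affected: Set[str] = set()
    queue = deque(changed)
    while queue:
        pkg = queue.popleft()
        if pkg in affected:
            continue  # already visited via another path
        affected.add(pkg)
        queue.extend(dependents.get(pkg, ()))
    return affected


# Hypothetical graph: package -> packages it depends on.
DEPS = {
    "libs/auth": [],
    "libs/ui": [],
    "services/billing": ["libs/auth"],
    "services/web": ["libs/auth", "libs/ui"],
    "tools/lint": [],
}

print(sorted(affected_packages(DEPS, ["libs/ui"])))  # ['libs/ui', 'services/web']
```

A change to libs/ui triggers work only for libs/ui and services/web; services/billing and tools/lint are skipped entirely, which is where the CI savings come from.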
Bazel (Google's open-source version of internal Blaze) models every build target explicitly. A BUILD file declares inputs, outputs, and dependencies. Bazel hashes inputs and caches outputs remotely — if nothing upstream changed, the test result is served from cache in milliseconds.
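The input-hashing idea can be sketched in a few lines. This is an illustration of content-addressed caching in general, not Bazel's actual implementation: action_key and ResultCache are hypothetical names, the cache is an in-memory dictionary rather than a remote service, and the file contents are made up.

```python
import hashlib
from typing import Callable, Dict, List


def action_key(command: str, inputs: Dict[str, bytes], dep_keys: List[str]) -> str:
    """Derive a cache key from the command, input file contents, and upstream keys."""
    h = hashlib.sha256()
    h.update(command.encode())
    for path in sorted(inputs):  # sorted so the key does not depend on dict order
        h.update(b"\0" + path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    for key in sorted(dep_keys):
        h.update(b"\0dep\0" + key.encode())
    return h.hexdigest()


class ResultCache:
    """In-memory stand-in for a remote cache, counting hits and misses."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_run(self, key: str, run: Callable[[], str]) -> str:
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        self._store[key] = run()
        return self._store[key]


cache = ResultCache()
test_src = {"auth_test.py": b"assert True"}

lib_key = action_key("compile", {"auth.py": b"def login(): ..."}, [])
test_key = action_key("test", test_src, [lib_key])
cache.get_or_run(test_key, lambda: "PASSED")  # miss: first run
cache.get_or_run(test_key, lambda: "PASSED")  # hit: nothing changed

# Editing the library changes its key, which changes the key of the dependent test.
changed_lib_key = action_key("compile", {"auth.py": b"def login(): return 1"}, [])
changed_test_key = action_key("test", test_src, [changed_lib_key])
cache.get_or_run(changed_test_key, lambda: "PASSED")  # miss: upstream changed

print(cache.hits, cache.misses)  # 1 2
```

Because each key folds in the keys of its dependencies, an upstream edit invalidates exactly the downstream results that could be affected, and everything else is served from the cache.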
For JavaScript/TypeScript monorepos, Nx and Turborepo fill the same role. Nx builds a project graph from package.json dependencies and tsconfig paths; Turborepo uses a pipeline defined in turbo.json. Both support remote caching via hosted backends (Nx Cloud, Vercel).
What Giants Actually Do
- Google — monorepo (google3), Bazel, Piper VCS (proprietary), Critique code review. Virtually all code is in one repo.
- Meta — monorepo (fbsource), Sapling VCS (open-sourced 2022), Buck2 build system. Meta moved from Git to Mercurial, and later to Sapling, because Git did not perform acceptably at its scale at the time.
- Microsoft — the Windows repo uses GVFS (Git Virtual File System, later renamed VFS for Git) to virtualize the file system so that Git downloads file contents only on access. Microsoft open-sourced it as a practical alternative to full sparse checkout.
- Airbnb / Lyft — polyrepos organized by domain team; they invest heavily in platform tooling to keep dependency versions synchronized across repos (internal "dep-bot" automation).
- Shopify — migrated its Rails monolith to a component-based monorepo, using packwerk to enforce boundaries without splitting repos.
Switching models later is expensive: history-rewriting tools (git filter-repo or git subtree) are required to merge histories, and CI pipelines must be rebuilt from scratch. Make the architectural decision early, before you have 40 repos and 200 engineers.
Decision Framework
Use this mental model when choosing:
- How often do your services share code? High shared code + frequent cross-cutting changes = monorepo advantage grows.
- Do you have the platform team to build monorepo tooling? Without Bazel/Nx/Turborepo and a remote cache, a monorepo becomes a CI bottleneck within months.
- Are access control or compliance boundaries hard requirements? If yes, polyrepo is often the easier path.
- What is your current scale? Under ~10 services and ~20 engineers, either approach works; optimize for developer experience, not architecture purity.
A pragmatic middle ground is a small number of domain-scoped monorepos (for example, platform-monorepo, data-monorepo). This captures most of the shared-tooling benefit while keeping access control manageable and Git performance reasonable without specialized VCS tooling.
Practical Migration Patterns
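If you need to merge existing polyrepos into a monorepo, git subtree is a common approach for preserving history: git subtree add --prefix=<dir> <repo-url> <branch> imports a repository, with its full commit history, under the given subdirectory.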
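When several repositories are being folded in, the subtree imports can be generated from a simple mapping. This is a minimal sketch: the function name, the repository URLs, and the target directories are hypothetical, and the helper only builds the commands. They are meant to be run from the top level of the monorepo with a clean working tree. Note that git subtree ships in Git's contrib area and may need a separate install on some systems.

```python
from typing import Dict, List, Tuple


def subtree_import_commands(repos: Dict[str, Tuple[str, str]]) -> List[List[str]]:
    """Build one `git subtree add` command per repository.

    `repos` maps the target directory in the monorepo to (repo_url, branch).
    Omitting --squash keeps the imported repository's full commit history.
    """
    commands = []
    for prefix in sorted(repos):  # sorted for a stable, reviewable order
        repo_url, branch = repos[prefix]
        commands.append(["git", "subtree", "add", f"--prefix={prefix}", repo_url, branch])
    return commands


# Hypothetical repositories to merge into the monorepo.
REPOS = {
    "services/billing": ("https://example.com/acme/billing.git", "main"),
    "libs/auth": ("https://example.com/acme/auth.git", "main"),
}

for argv in subtree_import_commands(REPOS):
    print(" ".join(argv))
```

Printing the commands first gives the team a reviewable migration plan; each one can then be executed in order, for example with subprocess.run(argv, check=True).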
Monorepos and polyrepos are not moral positions — they are engineering trade-offs. The right answer depends on your team size, tooling maturity, compliance requirements, and how often your services actually share code. What matters most is that the choice is intentional and that you invest in the tooling that makes your chosen approach fast and maintainable at scale.