Choose monorepo

Nov 27, 2024

I often think about monorepos and monolithic applications, and the first thing I want to highlight is the distinction between the two. Many engineers conflate these concepts because they sound similar, but they’re quite different: a monorepo describes how source code is stored, while a monolith describes how an application is built and deployed.

What Is a Monorepo?

Let’s start with the basics. The “repo” in monorepo refers to a Git repository, a single place with a history of commits where people branch off main, open pull requests, and merge their changes back. A monorepo is the default form of a project: you create a single Git repo and store your code in it.

This concept parallels older versions of source control. Back in the day, we had systems like Perforce and Subversion servers where you’d have a single history of source code changes, branching off and then committing back to the trunk.

Git made repositories cheap and easy to create. Suddenly, every little project or library could have its own repository. GitHub made it easy to host these online, leading to an explosion of repositories. But that still doesn’t fully answer the question: What is a monorepo?

The best way to define it is by contrasting it with its alternative—the polyrepo.

Monorepo vs. Polyrepo

A monorepo is a single repository that holds all the code for multiple projects or services. In contrast, a polyrepo consists of multiple repositories, usually one per project or service.

Many companies started with a single repository culture. As they grew, they began paying for GitHub at the enterprise level and gained the ability to create organizations with numerous repositories. Companies started breaking things out, following open-source patterns, giving each library or special project its own repository.

They were emulating the open-source ecosystem in a closed-source fashion. Suddenly, companies had hundreds or thousands of repositories—a parallel universe to the open-source world. Teams liked this because it allowed them to be rulers of their own domains. Each team could have its own repo with its own rules, tests, and culture. They didn’t have to follow the big company rules.

However, one downside engineers felt with the monorepo was that everyone had to do things the same way. That’s usually good for the company but can be annoying for individual engineers. Polyrepos offered more autonomy and agency from a bottom-up perspective.

The Rise of Microservices and Its Impact

This shift toward polyrepos is closely tied to the rise of microservice architecture, which exploded with the advent of Kubernetes. Companies found it easy to run many different services, and people often assumed one repository per service, since that’s how it was typically implemented in the mid-2010s.

But in practice, you can have thousands of services in a single repository. Google is a great example: it runs many services from a single large repository, and Facebook (now Meta) takes a monorepo approach as well. However, people often conflated microservices with polyrepos, especially once GitOps coupled deployments to commits so that a single commit would trigger a deployment. The intuitive idea became: one repository, one service, one deployment pipeline, one team.

Benefits of Monorepos

Atomic Changes Across the Codebase

One of the most significant benefits is the ability to make atomic changes across your entire codebase. Suppose you have a front-end and a back-end application, and you want to update the API contract between them. In a monorepo, you can update both in a single commit. If you need to revert, you can revert both atomically as well.

If you have a library used by every service, you can update that library and all the services in one commit. In a polyrepo setup, this becomes a nightmare, requiring updates across multiple repositories, dealing with versioning, and coordinating changes across teams.

Consistency and Simplicity

Having a single repository means consistent settings across the board. This is huge for keeping things tidy, and it simplifies compliance work such as SOC 2 audits, since controls like branch protection and required reviews are configured and evidenced in one place rather than per repository. Consistency also makes it easier to manage permissions, sensitive data, and overall security.

Leverage on Investment

With a universal build system, any improvements benefit everyone. If someone adds a caching layer to speed up builds, everyone in the monorepo benefits. It’s much harder to achieve this kind of leverage when teams use different build systems across multiple repositories.
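To make the caching point concrete, here is a minimal Python sketch of a content-addressed build cache, assuming a single shared in-memory store (a real system would use remote storage). `BuildCache` and `fake_compile` are hypothetical names, not the API of Bazel, Buck, or any other tool. The cache key is a hash of the build command plus every input file’s path and contents, so identical inputs reuse the same output no matter who built them first.

```python
import hashlib
from typing import Callable, Dict

Inputs = Dict[str, bytes]  # path -> file contents


class BuildCache:
    """Hypothetical shared build cache keyed by a hash of each step's inputs."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}  # cache key -> build output
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(command: str, inputs: Inputs) -> str:
        h = hashlib.sha256()
        h.update(command.encode() + b"\0")
        for path in sorted(inputs):  # sorted so the key ignores dict order
            h.update(path.encode() + b"\0")
            h.update(hashlib.sha256(inputs[path]).digest())
        return h.hexdigest()

    def build(self, command: str, inputs: Inputs,
              run: Callable[[Inputs], bytes]) -> bytes:
        k = self.key(command, inputs)
        if k in self._store:
            self.hits += 1  # someone already built these exact inputs
            return self._store[k]
        self.misses += 1
        output = run(inputs)  # only do the expensive work on a miss
        self._store[k] = output
        return output


def fake_compile(files: Inputs) -> bytes:
    # Stand-in for a compiler: concatenate sources in path order.
    return b"".join(files[p] for p in sorted(files))


cache = BuildCache()
src = {"lib/util.py": b"def f():\n    return 1\n"}
cache.build("compile", src, fake_compile)  # miss: runs fake_compile
cache.build("compile", src, fake_compile)  # hit: reuses the stored output
print(cache.hits, cache.misses)  # 1 1
```

Because every team shares one cache, the first build of a given set of inputs warms it for everyone else, which is the kind of leverage a single build system makes easy.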

Developer Productivity

A monorepo allows for better tooling and infrastructure investments that can significantly boost developer productivity. Internal tools for code analysis, refactoring, and testing can be applied consistently across the entire codebase.

Drawbacks of Monorepos

Merge Conflicts and Merge Skew

With many engineers working in the same repository, merge conflicts can become more frequent. Merge skew refers to the amount of change that lands in the codebase between when you create a pull request and when you merge it. In a large team, a lot can change in a short time, leading to conflicts and the need to rebase or merge in upstream changes frequently. Even changes that don’t conflict textually can break each other once combined, because each was only tested against an older version of main.
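One rough way to quantify merge skew, assuming a linear main history ordered oldest to newest, is to count the commits that landed after the pull request’s base commit and flag those that touched the same files. The `Commit` type and both functions below are hypothetical, a Python sketch rather than any real tool’s API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Commit:
    sha: str
    files: FrozenSet[str]  # paths this commit touched


def merge_skew(main: List[Commit], pr_base: str) -> List[Commit]:
    """Commits that landed on main after the PR branched from pr_base.

    Assumes `main` is a linear history ordered oldest to newest.
    """
    shas = [c.sha for c in main]
    base_index = shas.index(pr_base)  # ValueError if the base is not on main
    return main[base_index + 1:]


def textual_conflict_risk(skew: List[Commit], pr_files: Set[str]) -> List[str]:
    """SHAs of skewed commits that touched any file the PR also touches."""
    return [c.sha for c in skew if c.files & pr_files]


main = [
    Commit("a1", frozenset({"api/schema.py"})),
    Commit("b2", frozenset({"web/app.ts"})),
    Commit("c3", frozenset({"api/schema.py", "api/handlers.py"})),
    Commit("d4", frozenset({"docs/README.md"})),
]
skew = merge_skew(main, pr_base="a1")
print(len(skew))                                          # 3
print(textual_conflict_risk(skew, {"api/handlers.py"}))   # ['c3']
```

Only `c3` is flagged as a likely textual conflict, but the other skewed commits can still interact with the pull request semantically, which a file-overlap check like this cannot detect.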

Incidents Affect Everyone

If there’s a critical issue that requires halting deployments, it affects everyone working in the monorepo. This can slow down development across the entire company, whereas in a polyrepo setup, only the affected service’s team might be impacted.

Lack of Autonomy

Engineers might feel constrained by the need to adhere to company-wide standards and practices. They may prefer different languages, frameworks, or tools that aren’t compatible with the monorepo’s established patterns.

How Tech Giants Handle Monorepos

Google’s Approach

Google is renowned for its massive monorepo, which stores the vast majority of its codebase. To handle the scale, Google developed custom tools like Piper, their version control system, and CitC (Client in the Cloud), which provides a virtual file system. This setup allows engineers to work with the codebase efficiently without having to clone the entire repository.

Google builds its code with Blaze, which it open-sourced as Bazel, a build and test tool that can handle builds at scale. Bazel allows for fast incremental builds and supports multiple languages, making it suitable for a large, diverse codebase.
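The incremental idea behind tools like Bazel can be illustrated with a dependency graph: when a target changes, only that target and the targets that transitively depend on it need rebuilding or retesting. The following is a minimal Python sketch under that assumption; the `//lib/...`-style labels and the `affected_targets` function are hypothetical, and real Bazel works on a much richer action graph with caching.

```python
from collections import deque
from typing import Dict, List, Set


def affected_targets(deps: Dict[str, List[str]], changed: Set[str]) -> Set[str]:
    """Changed targets plus every target that transitively depends on them."""
    # Invert the graph so we can ask "who depends on this target?"
    dependents: Dict[str, List[str]] = {}
    for target, target_deps in deps.items():
        for dep in target_deps:
            dependents.setdefault(dep, []).append(target)

    # Breadth-first walk upward from the changed targets.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


DEPS = {
    "//lib/auth": [],
    "//lib/db": [],
    "//svc/billing": ["//lib/db"],
    "//svc/login": ["//lib/auth", "//lib/db"],
    "//web/app": ["//svc/login"],
}
print(sorted(affected_targets(DEPS, {"//lib/auth"})))
# ['//lib/auth', '//svc/login', '//web/app']
```

Changing `//lib/auth` leaves `//svc/billing` alone, which is why a shared-library change in a monorepo does not have to rebuild everything.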

Meta’s Approach

Meta (formerly Facebook) also uses a monorepo for much of its code. They developed custom extensions to the version control system Mercurial to handle their needs. To address performance issues with large repositories, Meta created EdenFS, a virtual file system that only fetches the files needed for a particular task.
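The lazy-fetching idea behind virtual file systems like EdenFS (and Google’s CitC) can be modeled as a checkout that knows every path and content hash up front but downloads a file’s contents only the first time it is read. This is a toy Python model under that assumption; `LazyCheckout`, the manifest format, and the in-memory `SERVER` store are all hypothetical and say nothing about how EdenFS is actually implemented.

```python
from typing import Callable, Dict, List


class LazyCheckout:
    """Toy working copy that downloads file contents only on first read."""

    def __init__(self, manifest: Dict[str, str],
                 fetch_blob: Callable[[str], bytes]) -> None:
        self._manifest = manifest      # path -> content hash (cheap metadata)
        self._fetch_blob = fetch_blob  # downloads a blob by hash (expensive)
        self._local: Dict[str, bytes] = {}
        self.fetches = 0

    def list_files(self) -> List[str]:
        # Listing uses only metadata; no file contents are downloaded.
        return sorted(self._manifest)

    def read(self, path: str) -> bytes:
        if path not in self._local:
            blob_hash = self._manifest[path]  # KeyError for unknown paths
            self._local[path] = self._fetch_blob(blob_hash)
            self.fetches += 1
        return self._local[path]


# Hypothetical server-side blob store.
SERVER = {"h1": b"billing code", "h2": b"login code", "h3": b"search code"}

checkout = LazyCheckout(
    {"svc/billing/main.py": "h1", "svc/login/main.py": "h2", "svc/search/main.py": "h3"},
    fetch_blob=SERVER.__getitem__,
)
print(len(checkout.list_files()))   # 3
checkout.read("svc/login/main.py")
checkout.read("svc/login/main.py")  # served from the local copy
print(checkout.fetches)             # 1
```

An engineer who only touches `svc/login` pays for one download rather than for the whole repository.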

They also built Buck, a build system designed for fast builds and tests, similar to Bazel. Buck allows engineers to work efficiently within the monorepo by managing dependencies and optimizing build processes.

These companies have nearly achieved the platonic ideal of what a monorepo can be. They’ve invested heavily in tooling to address the challenges that come with large-scale monorepos, setting a high bar for what’s possible.

Overcoming the Drawbacks

Many of the drawbacks of monorepos can be mitigated with thoughtful tooling and practices:

Advanced CI/CD Pipelines: Robust continuous integration and deployment pipelines can help manage merge conflicts and reduce the risk of merge skew, for example by testing each change against the latest main before it lands. Automated testing and integration can also speed up the merging process.
Partitioning and Permissions: Advanced permission systems can give teams more autonomy within a monorepo. They can have control over their own directories or services without affecting others.
Modular Architecture: Using modules or libraries within the monorepo can help isolate complexity. Engineers can work within their own modules, potentially even using different languages if the build system supports it.
Virtual File Systems: Tools like Google’s CitC and Meta’s EdenFS help manage large codebases by only fetching the necessary parts of the repository, reducing the overhead of working with massive amounts of code.
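The partitioning idea above can be sketched as a directory-to-owner table resolved by the deepest matching path. This is a minimal Python illustration with a hypothetical `OWNERS` table and helper names; real tools differ (GitHub’s CODEOWNERS file, for example, uses gitignore-style patterns where the last matching line wins).

```python
from typing import Dict, Iterable, Optional, Set


def owner_for(path: str, owners: Dict[str, str]) -> Optional[str]:
    """Owner of the deepest listed prefix of `path` ("" is the repo root)."""
    parts = path.strip("/").split("/")
    # Try the full path first, then each parent directory up to the root.
    for i in range(len(parts), -1, -1):
        prefix = "/".join(parts[:i])
        if prefix in owners:
            return owners[prefix]
    return None


def required_reviewers(changed: Iterable[str], owners: Dict[str, str]) -> Set[str]:
    """Every owner whose area a change touches."""
    reviewers = set()
    for path in changed:
        owner = owner_for(path, owners)
        if owner is not None:
            reviewers.add(owner)
    return reviewers


OWNERS = {
    "": "platform-team",
    "svc/billing": "billing-team",
    "web": "frontend-team",
}
print(owner_for("svc/billing/invoice.py", OWNERS))  # billing-team
print(owner_for("svc/login/main.py", OWNERS))       # platform-team
print(sorted(required_reviewers(["web/app/index.ts", "svc/billing/tax.py"], OWNERS)))
# ['billing-team', 'frontend-team']
```

Matching on whole path components rather than raw string prefixes keeps a directory like `svc/billing-legacy` from being claimed by the owner of `svc/billing`.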

Choose monorepos

Monorepos provide simplicity, consistency, and powerful advantages in managing code at scale. Many of the drawbacks can be mitigated with the right tools and practices. Companies like Google and Meta have refined their monorepo systems over the years, developing custom tools to handle the unique challenges they present.

If you’re starting a new project or architecting the codebase for a company of any scale, I strongly recommend choosing a monorepo structure. The long-term benefits significantly outweigh the scaling costs. Choose a polyrepo only if you absolutely must.

Small Diffs - by Greg Foster
