# Git Monorepos Explained: From Basics to Advanced Features
## Strategies for Managing a Monorepo
When a monorepo grows large, checking out the entire codebase becomes slow and consumes a lot of disk space. Modern Git provides two key features to handle this.
### 1. Sparse Checkout
Sparse checkout lets you check out only a specific subset of files and directories from the repository, even though you’ve cloned the entire repository history. This makes your local working directory smaller and more manageable.
### 2. Partial Clone
A partial clone goes a step further by letting you clone the repository without downloading all of its objects, typically by deferring blobs (file contents). Missing files are downloaded from the server only when you actually need them, such as during a checkout. This significantly reduces the initial clone time and disk usage.
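To make the effect of a partial clone visible, here is a minimal sketch that builds a throwaway repository, partial-clones it over a local `file://` URL, and counts the objects that were not downloaded. The directory names, file contents, and the `demo` identity are hypothetical, and the two `uploadpack` settings are only there so the local "server" accepts filtered clones.

```shell
#!/usr/bin/env bash
# Sketch only: all paths and names below are made up for illustration.
set -euo pipefail

work=$(mktemp -d)

# 1. Create a small "server" repository with one committed file.
git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/main
mkdir -p "$work/server/apps/web-app"
echo "console.log('hi')" > "$work/server/apps/web-app/index.js"
git -C "$work/server" add .
git -C "$work/server" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# 2. Allow filtered (partial) clones from this local "server".
git -C "$work/server" config uploadpack.allowFilter true
git -C "$work/server" config uploadpack.allowAnySHA1InWant true

# 3. Partial clone: commits and trees are fetched, blobs are deferred.
git clone -q --filter=blob:none --no-checkout "file://$work/server" "$work/clone"

# 4. Objects reachable from HEAD but absent locally are printed with a "?" prefix.
missing_blobs=$(git -C "$work/clone" rev-list --objects --missing=print HEAD \
    | grep -c '^?' || true)
echo "objects not yet downloaded: $missing_blobs"
```

In this toy repository the only deferred object is the single file blob; it would be fetched on demand by a later `git checkout`.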
## Example: Using Sparse Checkout
Let’s imagine a monorepo with the following structure:

    my-monorepo/
    ├── apps/
    │   ├── web-app/
    │   └── mobile-app/
    ├── libs/
    │   ├── ui-components/
    │   └── data-utils/
    └── services/
        ├── auth-service/
        └── payment-service/

Now, let’s say you’re a web developer who only needs to work on `web-app` and the shared `ui-components` library. Here’s how you would use sparse checkout to get only those two directories.
### Step-by-Step Commands
#### Clone the Repository without Checking Out Files
First, clone the repository but tell Git not to populate the working directory yet (`--no-checkout`). This skips writing files to disk, although a regular clone still downloads the full history and every file version. For better performance on large repos, combine it with a partial clone (`--filter=blob:none`), which defers downloading file contents until they are needed.

    # Clone with partial clone for best performance
    git clone --filter=blob:none --no-checkout git@github.com:your-org/my-monorepo.git
    cd my-monorepo

#### Initialize Sparse Checkout
Inside the new directory, enable the sparse-checkout feature. The `--cone` argument is recommended because it is a simpler and more performant mode that works with entire directories.

    git sparse-checkout init --cone

#### Define Which Directories to Check Out
Now, tell Git which directories you want in your working copy. In our case, they are `apps/web-app` and `libs/ui-components`.

    git sparse-checkout set apps/web-app libs/ui-components

#### Check Out the Files
Finally, check out the branch you want to work on. Git will now populate only the directories you specified, plus any files in the repository root, which cone mode always includes.

    git checkout main # Or: git switch main
Your `my-monorepo` folder will now look like this, containing only the projects you need:

    my-monorepo/
    ├── apps/
    │   └── web-app/
    ├── libs/
    │   └── ui-components/
    └── .git/

You can now work as you normally would. When you run `git pull`, it fetches new history for the entire repository but only updates the files in your sparse working directory.
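The whole walk-through can be tried without a remote. The sketch below builds a toy monorepo with the layout from this example (every file is a placeholder), clones it locally with `--no-checkout`, and applies the same sparse-checkout commands. It uses a plain local clone rather than a partial clone to keep the setup short; the temporary paths and the `demo` identity are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: a throwaway repository standing in for my-monorepo.
set -euo pipefail

work=$(mktemp -d)

# Build a toy monorepo with one placeholder file per project.
git init -q "$work/origin"
git -C "$work/origin" symbolic-ref HEAD refs/heads/main
for dir in apps/web-app apps/mobile-app libs/ui-components libs/data-utils \
           services/auth-service services/payment-service; do
  mkdir -p "$work/origin/$dir"
  echo "placeholder" > "$work/origin/$dir/README.md"
done
git -C "$work/origin" add .
git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial layout"

# Clone without populating the working tree, then restrict it to two directories.
git clone -q --no-checkout "$work/origin" "$work/my-monorepo"
cd "$work/my-monorepo"
git sparse-checkout init --cone
git sparse-checkout set apps/web-app libs/ui-components
git checkout -q main

# List the project directories that actually exist on disk.
checked_out=$(find . -mindepth 2 -maxdepth 2 -type d -not -path './.git/*' | sort)
echo "$checked_out"
```

Only `apps/web-app` and `libs/ui-components` should appear in the listing; the other four projects remain in history but are not written to disk.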
## Monorepo Workflow
The day-to-day workflow is nearly identical to a regular Git workflow. The main difference is that a single commit can touch multiple independent projects.
Atomic Commits: If a change in `libs/ui-components` requires a corresponding change in `apps/web-app`, you can commit both changes together. This is a major advantage of monorepos, as it keeps related changes grouped in a single, atomic unit.

    # Make changes in both directories
    # ...edit files in apps/web-app and libs/ui-components...

    # Stage all changes
    git add .

    # Commit the changes with a descriptive message
    git commit -m "feat(ui, web): Update button component and implement in header"

    # Push the commit
    git push origin main
## Pros and Cons of a Monorepo
### Advantages ✅
- Simplified Dependency Management: You can directly import code from other projects in the monorepo without needing a package manager or complex versioning schemes for internal libraries.
- Atomic Commits: Changes across multiple projects (e.g., an API service and its web client) can be made in a single commit, making the history easier to understand and roll back.
- Large-Scale Refactoring: It’s much easier to refactor code that affects multiple parts of your codebase simultaneously.
- Unified Tooling: You can enforce a single set of build, lint, and test tools across all projects.
### Disadvantages ❌
- Performance: Without sparse checkout and partial clone, Git commands like `git status` can become very slow as the repository grows.
- CI/CD Complexity: Your CI/CD system needs to be smart enough to build and test only the projects that were actually affected by a commit, rather than rebuilding everything every time. This often requires custom scripting or specialized tools.
- Access Control: It’s harder to restrict access to specific parts of the codebase. A `CODEOWNERS` file can help assign responsibility but doesn’t prevent viewing.
- Learning Curve: Developers need to learn the specific monorepo workflows and tooling your organization adopts.
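As an illustration of the "custom scripting" that CI/CD point refers to, here is a minimal sketch of a helper that maps changed file paths to the projects they belong to. The function name `affected_projects` is hypothetical, and it assumes the two-level `<group>/<project>/...` layout from the example tree; real setups often need dependency-aware tooling instead.

```shell
#!/usr/bin/env bash
# affected_projects: read changed file paths on stdin (one per line) and print
# the unique two-level project directories they belong to, e.g. apps/web-app.
# Files outside the <group>/<project>/... layout (such as a root README.md)
# are ignored. This is an illustrative helper, not part of Git.
affected_projects() {
  awk -F/ 'NF >= 3 { print $1 "/" $2 }' | sort -u
}

# Example input: three changed files in two projects, plus a root file.
changed=$(printf '%s\n' \
  apps/web-app/src/header.js \
  libs/ui-components/button.js \
  apps/web-app/package.json \
  README.md | affected_projects)
echo "$changed"
```

In a CI job the input would typically come from Git itself, for example `git diff --name-only origin/main...HEAD | affected_projects`, where `origin/main` is an assumed base branch.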
# Why Use `--cone` in Git Sparse-Checkout? Efficiency at Scale
## Cone Mode vs. Non-Cone Mode
Here’s a breakdown of the key differences:
| Feature | ✅ Cone Mode (`--cone`) | ❌ Non-Cone Mode |
|---|---|---|
| How it Works | You specify a list of directories. Git checks out everything within those directories, recursively. | You provide complex `.gitignore`-style patterns (like `src/**/*.js` or `!**/test.*`). |
| Performance | Extremely fast. Git can determine what to include using simple prefix matching, which scales very well in large repositories. | Can be very slow. It has to match every file against every pattern, which can lead to poor performance on large codebases. |
| Simplicity | Simple and intuitive. You just list the folders you want to work on. | Complex. The pattern syntax can be tricky, especially with inclusion and exclusion rules. |
| Flexibility | Less flexible. You can’t exclude a specific sub-directory or file within a larger directory you’ve included. | Very flexible. Allows for granular control over exactly which files and folders are checked out. |
## How Cone Mode Works in Practice
When you use cone mode, Git thinks in terms of a “cone” expanding from the root of your repository. When you specify a directory, you get:
- All files in the root directory.
- The full recursive contents of the directory you specified.
- All files in the parent directories leading to your specified directory.
### Example
Imagine this repository structure:

    my-monorepo/
    ├── docs/
    │   └── guide.md
    ├── services/
    │   ├── auth-service/
    │   │   ├── src/
    │   │   └── package.json
    │   └── payment-service/
    ├── web/
    │   ├── src/
    │   │   └── index.js
    │   └── package.json
    └── README.md

If you only want to work on the `web` project, you would run:

    # Initialize sparse checkout in the recommended cone mode
    git sparse-checkout init --cone

    # Set the directory you want to work in
    git sparse-checkout set web

Your local file system would then look like this:

    my-monorepo/
    ├── web/
    │   ├── src/
    │   │   └── index.js
    │   └── package.json
    └── README.md
Notice that you automatically get the `README.md` file from the root, but not the `docs` or `services` folders. If you had specified `services/auth-service`, your checkout would include the root `README.md`, the top-level `services` directory (but only the files directly inside it, if any), and the full contents of `auth-service`.
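The three inclusion rules described above can be written down as a small predicate. This is a simplified model of the behavior for illustration only, not how Git implements cone mode, and the function name `in_cone` is hypothetical.

```shell
#!/usr/bin/env bash
# in_cone FILE DIR...: succeed if FILE would be present in a cone-mode checkout
# that selected the given directories. Simplified model, not Git's own code.
in_cone() {
  local file=$1; shift
  local parent dir

  # Rule 1: files directly in the repository root are always included.
  case $file in
    */*) ;;
    *) return 0 ;;
  esac

  parent=${file%/*}
  for dir in "$@"; do
    # Rule 2: everything under a selected directory is included, recursively.
    case $file in "$dir"/*) return 0 ;; esac
    # Rule 3: files sitting directly in an ancestor of a selected directory.
    case $dir in "$parent"/*) return 0 ;; esac
  done
  return 1
}

# Check a few paths against a checkout that selected services/auth-service.
for path in README.md services/notes.txt services/auth-service/src/main.js \
            services/payment-service/main.js docs/guide.md; do
  if in_cone "$path" services/auth-service; then
    echo "included: $path"
  else
    echo "excluded: $path"
  fi
done
```

Here `services/notes.txt` and the `.js` files are made-up paths; the first three paths fall under rules 1, 3, and 2 respectively, while the last two match none of the rules.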
## Why `--cone` is Recommended
For most monorepo use cases, the cone model is superior because:
- 🚀 Performance: It’s designed to handle massive repositories with millions of files, where the old pattern-matching system would become unusably slow.
- 🧠 Simplicity: The mental model is much easier. You just think “I need this folder,” and Git handles the rest efficiently.
- 🤝 Future-Proofing: It’s the modern standard and works better with other advanced Git features such as the sparse index (`--sparse-index`).
You should only use the older non-cone mode if you have a critical need to exclude specific files or subdirectories from a larger checked-out directory, and you’re willing to accept the potential performance cost.
# Finding Directories After `git sparse-checkout init --cone`
## How to See Available Directories
To see the list of all top-level directories available in the repository, you can use the `git ls-tree` command. This command lists the contents of a specific Git tree object (in this case, the tree of the commit that `HEAD` points to) without needing to check them out.
Here is the most useful command for this situation:

    git ls-tree -d --name-only HEAD

### What This Command Does
- `git ls-tree`: The base command to list the contents of a tree object.
- `-d`: This flag filters the list to show only directories.
- `--name-only`: This ensures the output shows just the names of the directories, making it clean and easy to read.
- `HEAD`: This tells Git to look at the tree of the currently checked-out commit.
The output will be a list of the top-level folders in your repository, sorted by name. For example:

    apps
    docs
    libs
    services
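The same command can look below the top level when given a path. As a minimal sketch on a throwaway repository (the layout, placeholder files, and `demo` identity are assumptions), the following lists top-level directories, the children of `apps/`, and every directory in the tree.

```shell
#!/usr/bin/env bash
# Sketch only: a temporary repository with a made-up layout.
set -euo pipefail

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
for dir in apps/web-app apps/mobile-app libs/ui-components; do
  mkdir -p "$dir"
  echo "placeholder" > "$dir/README.md"
done
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"

# Top-level directories only.
top_level=$(git ls-tree -d --name-only HEAD)

# One level down: a trailing slash lists the children of apps/.
inside_apps=$(git ls-tree -d --name-only HEAD apps/)

# Every directory in the tree, at any depth.
all_dirs=$(git ls-tree -r -d --name-only HEAD)

printf 'top level:\n%s\n\ninside apps/:\n%s\n\nall directories:\n%s\n' \
  "$top_level" "$inside_apps" "$all_dirs"
```

Paths are printed relative to the repository root, so the children of `apps/` show up as `apps/mobile-app` and `apps/web-app`, ready to pass to `git sparse-checkout set`.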
### What to Do Next
Once you see the directory you want to work on in that list, you can select it with `git sparse-checkout set`, which replaces the current list of directories.
For example, if you want to work on the `apps` and `libs` directories, you would run:

    git sparse-checkout set apps libs

If a branch is already checked out, Git immediately populates your working directory with the `apps` and `libs` folders and all of their contents. If you cloned with `--no-checkout`, the folders appear once you check out a branch.
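Because `set` replaces the whole selection, it can be handy to extend it incrementally instead. The sketch below, run on a throwaway repository with a made-up layout and `demo` identity, uses `git sparse-checkout add` to append a directory and `git sparse-checkout list` to show the current selection.

```shell
#!/usr/bin/env bash
# Sketch only: a temporary repository with a made-up layout.
set -euo pipefail

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
for dir in apps/web-app libs/ui-components services/auth-service; do
  mkdir -p "$dir"
  echo "placeholder" > "$dir/README.md"
done
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"

# Start with only apps/, then extend the selection without retyping it.
git sparse-checkout init --cone
git sparse-checkout set apps
git sparse-checkout add libs

# Show the directories currently selected.
selected=$(git sparse-checkout list)
echo "$selected"
```

After these commands `apps/` and `libs/` are on disk and `services/` is not. Running `git sparse-checkout disable` would restore the full working tree.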