Git Monorepos Explained: From Basics to Advanced Features

From Qiki

## Strategies for Managing a Monorepo

When a monorepo grows large, checking out the entire codebase becomes slow and consumes a lot of disk space. Modern Git provides two key features to handle this.

1. Sparse Checkout

Sparse checkout allows you to check out only a specific subset of files and directories from the repository, even though you’ve cloned the entire repository history. This makes your local working directory smaller and more manageable.

2. Partial Clone

A partial clone goes a step further by letting you clone the repository without downloading every object up front. With --filter=blob:none, for example, Git skips file contents (blobs) during the clone and fetches them from the server only when you actually need them, such as during a checkout. This significantly reduces initial clone time and disk usage. The server must support partial clone filtering for this to work.
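To watch the on-demand download happen, here is a minimal sketch that uses a throwaway local repository in place of a hosted remote (the server and my-monorepo directories and the placeholder files are hypothetical). It assumes Git 2.28 or newer for git init -b, and it turns on uploadpack.allowFilter and uploadpack.allowAnySHA1InWant on the local "server", settings a hosting service would normally manage for you. It counts the objects the clone is missing before and after reading one file.

```shell
#!/usr/bin/env bash
# Sketch: a blob-less partial clone downloads file contents only on demand.
# "server" is a hypothetical local stand-in for a hosted remote.
set -eu

work=$(mktemp -d)
git init -q -b main "$work/server"
cd "$work/server"
git config user.name "Demo User"
git config user.email "demo@example.com"
for d in apps/web-app apps/mobile-app libs/ui-components; do
  mkdir -p "$d"
  echo "// $d" > "$d/index.js"
done
git add .
git commit -q -m "Initial layout"
# Let this local "server" honor --filter and serve single blobs on request.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Partial clone: commits and trees arrive now, blobs (file contents) do not.
cd "$work"
git clone -q --filter=blob:none --no-checkout "file://$work/server" my-monorepo
cd my-monorepo

# Objects referenced by history but absent locally are printed with a "?"
# prefix; --missing=print also stops rev-list from fetching them.
count_missing() {
  git rev-list --objects --all --missing=print | grep -c '^?' || true
}

before=$(count_missing)
# Reading one file makes Git fetch just that blob from the promisor remote.
git cat-file -p HEAD:apps/web-app/index.js
after=$(count_missing)

echo "missing blobs before: $before, after: $after"
# The clone records where missing objects can be fetched from later.
echo "promisor: $(git config remote.origin.promisor)"
echo "filter:   $(git config remote.origin.partialclonefilter)"
```

The remote.origin.promisor and remote.origin.partialclonefilter settings are what let later commands (checkout, diff, blame) fetch missing blobs transparently.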



## Example: Using Sparse Checkout

Let’s imagine a monorepo with the following structure:

my-monorepo/
├── apps/
│   ├── web-app/
│   └── mobile-app/
├── libs/
│   ├── ui-components/
│   └── data-utils/
└── services/
    ├── auth-service/
    └── payment-service/

Now, let’s say you’re a web developer who only needs to work on the web-app and the shared ui-components library. Here’s how you would use sparse checkout to get only those two directories.

Step-by-Step Commands

  1. Clone the Repository without Checking Out Files

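    First, clone the repository but tell Git not to populate the working directory yet (--no-checkout). On its own, --no-checkout still downloads the full history, including every file's contents; it only skips writing files to disk. Combining it with a partial clone (--filter=blob:none) is what makes the clone fast on large repos: Git downloads commits and directory trees now and fetches file contents later, only for the files you actually check out.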

    # Clone with partial clone for best performance
    git clone --filter=blob:none --no-checkout git@github.com:your-org/my-monorepo.git
    
    cd my-monorepo
    
  2. Initialize Sparse Checkout

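    Inside the new directory, enable the sparse-checkout feature. The --cone argument is recommended because cone mode is simpler and faster: it works with entire directories rather than arbitrary file patterns. (In recent Git versions, cone mode is the default and git sparse-checkout set turns sparse checkout on by itself, so this step mainly matters on older Git.)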

    git sparse-checkout init --cone
    
  3. Define Which Directories to Check Out

    Now, tell Git which directories you want in your working copy. In our case, it’s apps/web-app and libs/ui-components.

    git sparse-checkout set apps/web-app libs/ui-components
    
  4. Check Out the Files

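    Finally, check out the branch you want to work on. Git now writes only the directories you specified to your working directory and, because this is a partial clone, downloads only the file contents those directories need.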

    git checkout main
    # Or: git switch main
    

Your my-monorepo folder will now look like this, containing only the projects you need:

my-monorepo/
├── apps/
│   └── web-app/
├── libs/
│   └── ui-components/
└── .git/
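The four steps can be rehearsed end to end without access to a hosted repository. The following sketch builds a throwaway local repository with the layout above (the server directory and the one-line index.js placeholders are hypothetical, and Git 2.28 or newer is assumed for git init -b), enables filtered fetches on it, then runs the same clone, init, set, and checkout commands and lists the files that end up on disk.

```shell
#!/usr/bin/env bash
# Sketch: replay the sparse-checkout steps against a throwaway local repository.
# "server" is a hypothetical stand-in for git@github.com:your-org/my-monorepo.git.
set -eu

work=$(mktemp -d)
git init -q -b main "$work/server"
cd "$work/server"
git config user.name "Demo User"
git config user.email "demo@example.com"
# Hypothetical placeholder files mirroring the example layout.
for d in apps/web-app apps/mobile-app libs/ui-components libs/data-utils \
         services/auth-service services/payment-service; do
  mkdir -p "$d"
  echo "// $d" > "$d/index.js"
done
git add .
git commit -q -m "Initial layout"
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Step 1: partial clone without populating the working directory.
cd "$work"
git clone -q --filter=blob:none --no-checkout "file://$work/server" my-monorepo
cd my-monorepo

# Steps 2 and 3: enable cone mode and choose the directories.
git sparse-checkout init --cone
git sparse-checkout set apps/web-app libs/ui-components

# Step 4: check out the branch; only the selected directories are written.
git checkout -q main

git sparse-checkout list
# Every file on disk, excluding the .git directory.
find . -path ./.git -prune -o -type f -print | sort
```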

You can now work as you normally would. When you run git pull, it will fetch updates for the entire repository but will only update the files in your sparse working directory.
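One way to check the git pull behaviour is to add a commit upstream that touches both a project inside your cone and one outside it. In this sketch (a hypothetical local server repository again stands in for the remote, with filtered fetches enabled on it), the pulled commit changes services/payment-service, yet only apps/web-app is updated on disk.

```shell
#!/usr/bin/env bash
# Sketch: git pull in a sparse checkout updates only paths inside the cone.
# "server" is a hypothetical local stand-in for the hosted remote.
set -eu

work=$(mktemp -d)
git init -q -b main "$work/server"
cd "$work/server"
git config user.name "Demo User"
git config user.email "demo@example.com"
mkdir -p apps/web-app services/payment-service
echo "web v1" > apps/web-app/index.js
echo "payment v1" > services/payment-service/index.js
git add .
git commit -q -m "Initial layout"
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Sparse, partial clone containing only apps/web-app.
cd "$work"
git clone -q --filter=blob:none --no-checkout "file://$work/server" my-monorepo
cd my-monorepo
git sparse-checkout init --cone
git sparse-checkout set apps/web-app
git checkout -q main

# Upstream commit touching a project inside the cone and one outside it.
cd "$work/server"
echo "web v2" > apps/web-app/index.js
echo "payment v2" > services/payment-service/index.js
git commit -q -am "Update web-app and payment-service"

# Pull: history gains the whole commit, the working tree only the cone.
cd "$work/my-monorepo"
git pull -q --ff-only origin main
cat apps/web-app/index.js
git diff-tree --no-commit-id --name-only -r HEAD
if [ -e services ]; then echo "services/ is on disk"; else echo "services/ is not on disk"; fi
```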



## Monorepo Workflow

The day-to-day workflow is nearly identical to a regular Git workflow. The main difference is that a single commit can touch multiple independent projects.

  • Atomic Commits: If a change in libs/ui-components requires a corresponding change in apps/web-app, you can commit both changes together. This is a major advantage of monorepos, as it keeps related changes grouped in a single, atomic unit.

    # Make changes in both directories
    # ...edit files in apps/web-app and libs/ui-components...
    
    # Stage the changes in both projects
    git add apps/web-app libs/ui-components
    
    # Commit the changes with a descriptive message
    git commit -m "feat(ui, web): Update button component and implement in header"
    
    # Push the commit
    git push origin main
    
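To confirm that a single commit really spans both projects, you can list the paths it recorded. This sketch uses a throwaway repository with hypothetical Button.js and Header.js files, makes the paired change, and prints the files touched by that one commit with git diff-tree.

```shell
#!/usr/bin/env bash
# Sketch: one commit that changes a shared library and the app that uses it.
# File names and contents are hypothetical placeholders.
set -eu

repo=$(mktemp -d)
git init -q -b main "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Starting point: a library component and an app file that imports it.
mkdir -p libs/ui-components apps/web-app
echo "export const Button = () => 'old';" > libs/ui-components/Button.js
echo "import { Button } from '../../libs/ui-components/Button.js';" > apps/web-app/Header.js
git add .
git commit -q -m "Initial layout"

# Change the library and its consumer together...
echo "export const Button = ({ label }) => label;" > libs/ui-components/Button.js
echo "Button({ label: 'Home' });" >> apps/web-app/Header.js

# ...and record both edits in a single atomic commit.
git add apps/web-app libs/ui-components
git commit -q -m "feat(ui, web): Update button component and implement in header"

# Paths changed by that one commit.
git diff-tree --no-commit-id --name-only -r HEAD
```

Because both edits live in one commit, a single git revert of that commit would roll back the library change and its consumer together.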



## Pros and Cons of a Monorepo

Advantages

  • Simplified Dependency Management: You can import code from other projects in the monorepo directly, without publishing internal libraries to a package registry or maintaining separate version numbers for them.
  • Atomic Commits: Changes across multiple projects (e.g., an API service and its web client) can be made in a single commit, making the history easier to understand and roll back.
  • Large-Scale Refactoring: It’s much easier to refactor code that affects multiple parts of your codebase simultaneously.
  • Unified Tooling: You can enforce a single set of build, lint, and test tools across all projects.

Disadvantages

  • Performance: As the repository grows, clones get larger and everyday commands like git status slow down, unless you use sparse checkout and partial clone.
  • CI/CD Complexity: Your CI/CD system needs to be smart enough to only build and test the projects that were actually affected by a commit, rather than rebuilding everything every time. This often requires custom scripting or specialized tools.
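  • Access Control: It’s harder to restrict access to specific parts of the codebase, because anyone who can clone the repository can read all of it. A CODEOWNERS file (supported by hosts such as GitHub and GitLab) can route reviews to designated owners, but it doesn’t restrict who can view the code.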
  • Learning Curve: Developers need to learn the specific monorepo workflows and tooling your organization adopts.
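For the CI/CD point, a common starting point before adopting dedicated build tools is to diff the commit range and map the changed paths to project directories. The sketch below assumes every project lives exactly two levels deep (apps/*, libs/*, services/*), as in the example layout; affected_projects is a hypothetical helper name, and the base revision is something your CI system would supply.

```shell
#!/usr/bin/env bash
# Sketch: list the projects changed between two revisions. Assumes projects
# live exactly two levels deep (<group>/<project>/), as in the example layout.
set -eu

affected_projects() {
  local base=$1
  local tip=${2:-HEAD}
  # Three-dot range: changes made on $tip since it diverged from $base.
  # Paths with fewer than three components (root files, group-level files)
  # are ignored.
  git diff --name-only "$base...$tip" \
    | awk -F/ 'NF >= 3 { print $1 "/" $2 }' \
    | sort -u
}

# Demo on a hypothetical throwaway repository.
repo=$(mktemp -d)
git init -q -b main "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
for d in apps/web-app libs/ui-components services/auth-service; do
  mkdir -p "$d"
  echo "v1" > "$d/index.js"
done
echo "# my-monorepo" > README.md
git add .
git commit -q -m "Initial layout"
base=$(git rev-parse HEAD)

# Change a shared library, its consumer, and a root-level file.
echo "v2" > libs/ui-components/index.js
echo "v2" > apps/web-app/index.js
echo "# updated" >> README.md
git commit -q -am "Update button component and header"

affected_projects "$base"
```

This only maps paths to projects. A real pipeline would also rebuild projects that depend on a changed library, which requires knowing the dependency graph; that is the part specialized monorepo build tools take on.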

Why Use --cone in Git Sparse-Checkout? Efficiency at Scale

## Cone Mode vs. Non-Cone Mode

Here’s a breakdown of the key differences:

| Feature | Cone Mode (--cone) | Non-Cone Mode |
|---|---|---|
| How it works | You specify a list of directories. Git checks out everything within those directories, recursively. | You provide .gitignore-style patterns, which can be complex (like `src/**/*.js` or `!**/test.*`). |
| Performance | Extremely fast. Git can determine what to include using simple prefix matching, which scales very well in large repositories. | Can be very slow. Git has to match every file against every pattern, which can lead to poor performance on large codebases. |
| Simplicity | Simple and intuitive. You just list the folders you want to work on. | Complex. The pattern syntax can be tricky, especially with inclusion and exclusion rules. |
| Flexibility | Less flexible. You can’t exclude a specific subdirectory or file within a larger directory you’ve included. | Very flexible. Allows granular control over exactly which files and folders are checked out. |
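The flexibility row is easiest to see with a selection cone mode cannot express: everything except one nested directory. This sketch, on a hypothetical throwaway repository, uses git sparse-checkout set --no-cone (assumed available, as it is in recent Git releases) with the gitignore-style patterns /* (every top-level entry) and !/services/payment-service/ (except this directory).

```shell
#!/usr/bin/env bash
# Sketch: non-cone mode can exclude a subdirectory inside an included directory.
set -eu

repo=$(mktemp -d)
git init -q -b main "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
# Hypothetical placeholder files.
for d in services/auth-service services/payment-service web/src; do
  mkdir -p "$d"
  echo "// $d" > "$d/index.js"
done
echo "# my-monorepo" > README.md
git add .
git commit -q -m "Initial layout"

# Include everything at the root, then exclude one nested directory.
# Single quotes keep the shell from interpreting * and !.
git sparse-checkout set --no-cone '/*' '!/services/payment-service/'

# Every file on disk, excluding the .git directory.
find . -path ./.git -prune -o -type f -print | sort
```

The same result in cone mode would require listing every sibling you want to keep, and it still could not exclude a single file inside an included directory.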



## How Cone Mode Works in Practice

When you use cone mode, Git thinks in terms of a “cone” expanding from the root of your repository. When you specify a directory, you get:

  1. All files in the root directory.
  2. The full recursive contents of the directory you specified.
  3. The files directly inside each parent directory leading to your specified directory (but not those parents’ other subdirectories).

Example

Imagine this repository structure:

my-monorepo/
├── docs/
│   └── guide.md
├── services/
│   ├── auth-service/
│   │   ├── src/
│   │   └── package.json
│   └── payment-service/
├── web/
│   ├── src/
│   │   └── index.js
│   └── package.json
└── README.md

If you only want to work on the web project, you would run:

# Initialize sparse checkout in the recommended cone mode
git sparse-checkout init --cone

# Set the directory you want to work in
git sparse-checkout set web

Your local file system would then look like this:

my-monorepo/
├── web/
│   ├── src/
│   │   └── index.js
│   └── package.json
└── README.md

Notice that you automatically get the README.md file from the root, but not the docs or services folders. If you had specified services/auth-service instead, your checkout would include the root README.md, any files directly inside services/ (but not payment-service), and the full contents of auth-service.
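The auth-service case can be checked directly. This sketch builds a hypothetical throwaway repository mirroring the tree above, plus an assumed services/README.md so the parent directory has a file of its own, and then sets the cone to services/auth-service.

```shell
#!/usr/bin/env bash
# Sketch: what cone mode checks out for a nested directory.
set -eu

repo=$(mktemp -d)
git init -q -b main "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical files mirroring the example tree.
mkdir -p docs services/auth-service/src services/payment-service web/src
echo "# Guide" > docs/guide.md
echo "{}" > services/auth-service/package.json
echo "// auth" > services/auth-service/src/index.js
echo "// payments" > services/payment-service/index.js
echo "{}" > web/package.json
echo "// web" > web/src/index.js
echo "# my-monorepo" > README.md
# Assumed extra file directly under services/, to show parent-directory files.
echo "# Services" > services/README.md
git add .
git commit -q -m "Initial layout"

git sparse-checkout init --cone
git sparse-checkout set services/auth-service

# Every file on disk, excluding the .git directory.
find . -path ./.git -prune -o -type f -print | sort
```

The listing should contain the root README.md, services/README.md, and everything under services/auth-service, and nothing from docs, web, or payment-service.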

For most monorepo use cases, the cone model is superior because:

  • 🚀 Performance: It’s designed to handle massive repositories with millions of files, where the old pattern-matching system would become unusably slow.
  • 🧠 Simplicity: The mental model is much easier. You just think “I need this folder,” and Git handles the rest efficiently.
  • 🤝 Future-Proofing: It’s the default in recent Git versions, and it is required for the sparse index (the --sparse-index option), which shrinks Git’s index to match your sparse checkout and speeds up commands further.

You should only use the older non-cone mode if you have a critical need to exclude specific files or subdirectories from a larger checked-out directory, and you’re willing to accept the potential performance cost.


Finding Directories After Git Sparse-Checkout Init --cone

## How to See Available Directories

To see the list of all top-level directories available in the repository, you can use the git ls-tree command. This command lists the contents of a Git tree object (in this case, the root tree of the commit HEAD points to), so it works even for directories that are not checked out.

Here is the most useful command for this situation:

git ls-tree -d --name-only HEAD

What This Command Does

  • git ls-tree: The base command to list the contents of a tree object.
  • -d: This flag filters the list to show only directories.
  • --name-only: This ensures the output shows just the names of the directories, making it clean and easy to read.
  • HEAD: This tells Git to look at the tree of the commit your current branch points to. This works even in a clone made with --no-checkout, where nothing is on disk yet.

The output will be a list of the top-level folders in your repository. For example:

apps
docs
libs
services
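Because ls-tree reads Git’s trees rather than the working directory, it also works in a clone where nothing has been checked out, and it can look one level deeper. That helps in cone mode, where you usually pick project directories such as apps/web-app. This sketch clones a hypothetical local repository with --no-checkout and passes the tree-ish HEAD:apps (the apps directory of the HEAD commit) to list the projects inside it.

```shell
#!/usr/bin/env bash
# Sketch: list directories from Git's trees without checking anything out.
set -eu

work=$(mktemp -d)
git init -q -b main "$work/server"
cd "$work/server"
git config user.name "Demo User"
git config user.email "demo@example.com"
# Hypothetical placeholder files.
for d in apps/web-app apps/mobile-app libs/ui-components docs; do
  mkdir -p "$d"
  echo "# $d" > "$d/README.md"
done
git add .
git commit -q -m "Initial layout"

# Clone without populating the working directory.
cd "$work"
git clone -q --no-checkout "$work/server" my-monorepo
cd my-monorepo

echo "Top-level directories:"
git ls-tree -d --name-only HEAD
# HEAD:apps names the apps/ tree inside the HEAD commit.
echo "Directories inside apps/:"
git ls-tree -d --name-only HEAD:apps
```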

What to Do Next

Once you see the directory you want to work on from that list, you can add it to your sparse checkout set.

For example, if you want to work on the apps and libs directories, you would run:

git sparse-checkout set apps libs

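If a branch is already checked out, Git immediately updates your working directory so that it contains the apps and libs folders and all of their contents. If you cloned with --no-checkout and have not checked out a branch yet, run git checkout main afterward to populate the working directory.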
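When you later need one more directory, keep in mind that git sparse-checkout set replaces the whole list, while git sparse-checkout add appends to it. This sketch on a hypothetical throwaway repository captures the output of git sparse-checkout list after each command so the difference is visible.

```shell
#!/usr/bin/env bash
# Sketch: "set" replaces the sparse-checkout list, "add" extends it.
set -eu

repo=$(mktemp -d)
git init -q -b main "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
# Hypothetical placeholder files.
for d in apps/web-app libs/ui-components services/auth-service; do
  mkdir -p "$d"
  echo "// $d" > "$d/index.js"
done
git add .
git commit -q -m "Initial layout"

git sparse-checkout init --cone
git sparse-checkout set apps libs
after_set=$(git sparse-checkout list)

# add keeps apps and libs and appends services/auth-service.
git sparse-checkout add services/auth-service
after_add=$(git sparse-checkout list)

# set discards the previous list entirely.
git sparse-checkout set services/auth-service
after_reset=$(git sparse-checkout list)

printf '%s\n' "--- after set apps libs" "$after_set" \
  "--- after add services/auth-service" "$after_add" \
  "--- after set services/auth-service" "$after_reset"
```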