Bazel in a Monorepo — A Practical, Copy‑Ready Guide

From Qiki

You already know build tools like Gradle, npm, or Make.
This guide shows you how to use Bazel effectively in a real monorepo.
We keep the language simple, give short examples, and end each section with a quick Takeaway.


1) Introduction to Bazel

What it is: Bazel is a fast, scalable, open‑source build system made by Google. It builds code, runs tests, and manages dependencies across many languages (C/C++, Java, Go, Python, etc.). It focuses on speed, reproducibility, and scalability.

Why people use it

  • Speed: Rebuilds only what a change affects (fine‑grained dependency graph + strong caching).
  • Reproducible: Every action declares its inputs and outputs, and actions run in sandboxes where the platform supports it.
  • Scalable: Designed for large codebases and monorepos.
  • Polyglot: One tool for many languages and platforms.
  • Remote cache/execution: Share build results and offload heavy work.

When NOT to use Bazel

  • Very small project with one language and simple builds.
  • You want quick “script-like” automation only (e.g., a few npm scripts).
  • You rely on tools that are hard to make hermetic (heavy system deps, ad‑hoc shell scripts).
  • Your team cannot invest time in learning Bazel and restructuring the repo.

Takeaway: Bazel shines in multi‑language, multi‑service monorepos where speed and consistency matter.



2) Core Concepts

  • Workspace: The whole repo Bazel sees. The top level has a WORKSPACE file (classic) or a MODULE.bazel file (modern Bazel "Bzlmod"). Pick one style for the project; running both side by side only makes sense during a migration.
  • Package: A directory with a BUILD (or BUILD.bazel) file. Each package defines targets.
  • Target: A buildable thing (library, binary, test, file group). Targets live in packages.
  • Label: The unique name of a target. Formats:
    • //path/to/pkg:target (full)
    • //path/to/pkg is shorthand for //path/to/pkg:pkg
    • //:target means “in the root package”
  • Rules & Macros:
    • Rule: Defines how inputs become outputs by registering build actions (e.g., cc_binary, py_test). Custom rules are written in Starlark.
    • Macro: A Starlark function that expands into one or more rule calls (for reuse and consistency).
  • Dependency Graph:
    • Bazel builds a DAG (directed acyclic graph) from targets.
    • It hashes inputs to know what changed.
    • It rebuilds only affected nodes and uses cache for the rest.
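To make the label shorthands concrete, here is a tiny Python model of how they expand. It is an illustration only: normalize_label is a hypothetical helper, and it ignores external repos (@repo//...) and target patterns such as //... or :all.

```python
def normalize_label(label: str, current_package: str = "") -> str:
    """Expand a label to canonical //package:target form (toy model)."""
    if label.startswith(":"):
        # ":target" refers to a target in the package of the BUILD file using it.
        return f"//{current_package}:{label[1:]}"
    if not label.startswith("//"):
        raise ValueError(f"not an absolute label: {label!r}")
    body = label[2:]
    if ":" in body:
        package, target = body.split(":", 1)
    else:
        # "//path/to/pkg" is shorthand for "//path/to/pkg:pkg".
        package, target = body, body.rsplit("/", 1)[-1]
    if not target:
        raise ValueError(f"empty target name in {label!r}")
    # An empty package means the root package, so "//:target" stays "//:target".
    return f"//{package}:{target}"


print(normalize_label("//libs/py/util"))          # //libs/py/util:util
print(normalize_label("//:target"))               # //:target
print(normalize_label(":math_lib", "apps/math"))  # //apps/math:math_lib
```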

Takeaway: Think in packages, targets, and labels. Bazel builds a graph and reuses cached work aggressively.



3) Setting Up Bazel

Install (recommended: Bazelisk)

Bazel evolves quickly. Bazelisk picks the right Bazel version from your repo’s .bazelversion.

  • macOS: brew install bazelisk
  • Linux: Download a Bazelisk release binary from its GitHub releases page and put it on your PATH as bazel, or use a package manager.
  • Windows: Use choco install bazelisk or download release binary.

Create a .bazelversion file at repo root:

6.5.0

Start a tiny project

monorepo/
├─ WORKSPACE           # or MODULE.bazel
├─ apps/
│  └─ hello_cpp/
│     ├─ BUILD
│     └─ main.cc
└─ .bazelrc

apps/hello_cpp/BUILD

cc_binary(
    name = "hello",
    srcs = ["main.cc"],
)

apps/hello_cpp/main.cc

#include <iostream>
int main() { std::cout << "Hello, Bazel!\n"; }

Build & run

bazel build //apps/hello_cpp:hello
bazel run //apps/hello_cpp:hello

Tip: Use .bazelrc for shared flags (see caching section).

Takeaway: Use Bazelisk + a clear directory layout. A package = a directory with a BUILD file.



4) Build Rules and Targets

Common rules (built-in examples)

  • C/C++: cc_library, cc_binary, cc_test
  • Java: java_library, java_binary, java_test
  • Python: py_library, py_binary, py_test
  • Go (via rules): go_library, go_binary, go_test

Source globbing

py_library(
    name = "utils",
    srcs = glob(["*.py"], exclude = ["*_test.py"]),
    visibility = ["//visibility:public"],
)

Note: Use glob carefully. Don’t use super-wide patterns like **/* in big packages.
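Why is **/* so wide? This Python sketch models glob include/exclude matching, assuming * and ? match within a single path segment (via fnmatch) while ** spans whole directory levels. glob_model is a hypothetical helper; real Bazel globs also stop at subpackage boundaries, which this model ignores.

```python
from fnmatch import fnmatchcase


def _match(pattern_parts, path_parts):
    """True if the path segments match the pattern segments."""
    if not pattern_parts:
        return not path_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # "**" matches zero or more whole directory levels.
        return any(_match(rest, path_parts[i:]) for i in range(len(path_parts) + 1))
    # Any other segment matches exactly one path segment, so "*" never crosses "/".
    return (bool(path_parts)
            and fnmatchcase(path_parts[0], head)
            and _match(rest, path_parts[1:]))


def glob_model(files, include, exclude=()):
    """Return files matching any include pattern and no exclude pattern, sorted."""
    def hit(path, patterns):
        return any(_match(p.split("/"), path.split("/")) for p in patterns)
    return sorted(f for f in files if hit(f, include) and not hit(f, exclude))


files = ["a.py", "a_test.py", "sub/b.py", "README.md"]
print(glob_model(files, ["*.py"], exclude=["*_test.py"]))  # ['a.py']
print(glob_model(files, ["**/*"]))  # every file, including sub/b.py
```

The second call shows the problem: every file in every directory becomes an input, so editing any of them invalidates the target.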

Small multi-language examples

C++

# apps/math/BUILD
cc_library(
    name = "math_lib",
    hdrs = ["add.h"],
    srcs = ["add.cc"],
    visibility = ["//visibility:public"],
)

cc_test(
    name = "math_test",
    srcs = ["add_test.cc"],
    deps = [":math_lib"],
)

Java

# libs/jvm/core/BUILD
java_library(
    name = "core",
    srcs = ["Core.java"],
    visibility = ["//visibility:public"],
)

java_test(
    name = "core_test",
    srcs = ["CoreTest.java"],
    deps = [":core"],
)

Python

# libs/py/util/BUILD
py_library(
    name = "util",
    srcs = ["path.py"],
    visibility = ["//visibility:public"],
)

py_test(
    name = "util_test",
    srcs = ["path_test.py"],
    deps = [":util"],
)

Go (with rules_go; see dependencies section)

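# libs/go/hello/BUILD.bazel
# With WORKSPACE the repo is @io_bazel_rules_go; with Bzlmod it is @rules_go
# (unless the bazel_dep sets repo_name = "io_bazel_rules_go").
load("@io_bazel_rules_go//go:def.bzl", "go_library", "go_test")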

go_library(
    name = "hello",
    srcs = ["hello.go"],
    importpath = "example.com/hello",
    visibility = ["//visibility:public"],
)

go_test(
    name = "hello_test",
    srcs = ["hello_test.go"],
    embed = [":hello"],
)

Custom macros (simple)

# tools/build_defs/py.bzl
def py_pkg(name, srcs, deps = [], visibility = ["//visibility:public"], **kwargs):
    """py_library with repo defaults; callers can still narrow visibility."""
    native.py_library(
        name = name,
        srcs = srcs,
        deps = deps,
        visibility = visibility,
        **kwargs
    )

# libs/py/strings/BUILD
load("//tools/build_defs:py.bzl", "py_pkg")

py_pkg(
    name = "strings",
    srcs = ["split.py", "join.py"],
)

Takeaway: Use the built‑in rules first. Create macros to keep BUILD files short and consistent.



5) Dependencies and External Repositories

You can manage external dependencies in two ways:

A) Classic: WORKSPACE + http_archive / git_repository

WORKSPACE (examples)

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# rules_go
http_archive(
    name = "io_bazel_rules_go",
    urls = ["https://github.com/bazelbuild/rules_go/releases/download/v0.X.Y/rules_go-v0.X.Y.zip"],
    sha256 = "<PIN_SHA256>",
)
http_archive(
    name = "bazel_gazelle",
    urls = ["https://github.com/bazelbuild/bazel-gazelle/releases/download/v0.A.B/bazel-gazelle-v0.A.B.zip"],
    sha256 = "<PIN_SHA256>",
)

load("@io_bazel_rules_go//go:deps.bzl", "go_rules_dependencies", "go_register_toolchains")
go_rules_dependencies()
go_register_toolchains()

load("@bazel_gazelle//:deps.bzl", "gazelle_dependencies")
gazelle_dependencies()

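# Java: rules_jvm_external (Maven)
# Copy the exact url, strip_prefix (if any), and sha256 from the release notes for <VER>.
http_archive(
    name = "rules_jvm_external",
    urls = ["https://github.com/bazelbuild/rules_jvm_external/releases/download/<VER>/rules_jvm_external-<VER>.zip"],
    sha256 = "<PIN_SHA256>",
)
# Recent releases set up their own dependencies before maven_install can be used:
load("@rules_jvm_external//:repositories.bzl", "rules_jvm_external_deps")
rules_jvm_external_deps()
load("@rules_jvm_external//:setup.bzl", "rules_jvm_external_setup")
rules_jvm_external_setup()

load("@rules_jvm_external//:defs.bzl", "maven_install")
maven_install(
    name = "maven",
    artifacts = ["junit:junit:4.13.2"],
    repositories = ["https://repo1.maven.org/maven2"],
)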

Pin versions: Always set exact versions and sha256 for reproducibility.
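To fill in a <PIN_SHA256> placeholder, hash the exact archive you downloaded. A minimal Python sketch using only the standard library; it returns both the hex string for sha256 = ... and the Subresource Integrity form that http_archive also accepts in its integrity attribute. archive_digests is a hypothetical helper name.

```python
import base64
import hashlib


def archive_digests(path: str, chunk_size: int = 1 << 20) -> dict:
    """Hash a downloaded archive for pinning in http_archive."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return {
        # Hex string for the sha256 = "..." attribute.
        "sha256": h.hexdigest(),
        # Subresource Integrity string for the integrity = "..." attribute.
        "integrity": "sha256-" + base64.b64encode(h.digest()).decode("ascii"),
    }
```

For example, archive_digests("rules_go-v0.X.Y.zip")["sha256"] gives the value to paste into the WORKSPACE entry; it matches what sha256sum prints for the same file.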

B) Modern: MODULE.bazel (Bzlmod)

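This is the newer dependency system. It shipped in Bazel 6 behind --enable_bzlmod, is on by default since Bazel 7, and Bazel 8 turns WORKSPACE off by default. Prefer it for fresh repos (with the 6.5.0 pin above, add common --enable_bzlmod to .bazelrc).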

MODULE.bazel (Go + Java example)

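bazel_dep(name = "rules_go", version = "0.X.Y")
bazel_dep(name = "gazelle", version = "0.A.B")  # module is "gazelle"; add repo_name = "bazel_gazelle" to keep old labels
bazel_dep(name = "rules_jvm_external", version = "<VER>")

# Rule sets expose module extensions that you "use". Maven example:
maven = use_extension("@rules_jvm_external//:extensions.bzl", "maven")
maven.install(
    artifacts = ["junit:junit:4.13.2"],
    repositories = ["https://repo1.maven.org/maven2"],
)
use_repo(maven, "maven")
# Go: see the rules_go and gazelle READMEs for their go_sdk and go_deps extensions.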

Note: Each ruleset has specific setup steps (extensions, toolchains). Follow the README for the ruleset you use.

Other sources

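  • git_repository is allowed, but http_archive with a release archive and a pinned sha256 is faster, does not depend on a local git install, and can be served from Bazel's repository cache.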
  • Vendor content only when necessary.

Takeaway: Put third‑party code in external repos. Pin everything (versions + SHAs). Prefer Bzlmod for new work.



6) Testing with Bazel

Basics

bazel test //...                      # run all tests
bazel test //libs/py/...             # a subtree
bazel test //apps/service_a:unit_tests

Useful flags

  • --test_output=errors (show failing logs)
  • --test_timeout=60 (seconds)
  • --jobs=auto (parallelism)
  • --test_filter=<filter> (forwarded to the test framework; the syntax depends on the framework)
  • --test_tag_filters=smoke,-integration (run by tags)
  • --cache_test_results=no (force re-run)
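Tag filters mix include and exclude tags. The Python model below follows the documented rule: a test runs if it has at least one included tag (when any are given) and none of the excluded tags. select_by_tags and the sample labels are hypothetical.

```python
def select_by_tags(tests: dict, tag_filters: str) -> list:
    """tests maps label -> set of tags; tag_filters looks like "smoke,-integration"."""
    included, excluded = set(), set()
    for raw in tag_filters.split(","):
        tag = raw.strip()
        if not tag:
            continue
        if tag.startswith("-"):
            excluded.add(tag[1:])
        else:
            included.add(tag)
    selected = []
    for label, tags in sorted(tests.items()):
        if tags & excluded:
            continue  # any excluded tag drops the test
        if included and not tags & included:
            continue  # with include tags given, at least one must match
        selected.append(label)
    return selected


tests = {
    "//apps/math:math_test": {"unit", "smoke"},
    "//apps/service_a:handlers_integration_test": {"integration", "smoke"},
    "//libs/py/util:util_test": set(),
}
print(select_by_tags(tests, "smoke,-integration"))  # ['//apps/math:math_test']
```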

Organizing tests

  • Keep unit tests close to code (*_test.* in same package).
  • Tag integration or e2e tests: tags = ["integration"]
  • Mark long tests with a size (e.g., size = "large") or an explicit timeout, flaky ones with flaky = True, and network-dependent ones with tags = ["requires-network"].
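Each size implies a default timeout. This sketch encodes Bazel's default mapping (small → short, 60 s; medium → moderate, 300 s; large → long, 900 s; enormous → eternal, 3600 s) with an assumed override order: an explicit timeout attribute beats size, and a single --test_timeout value replaces every category. effective_timeout is a hypothetical helper.

```python
# Default limit (seconds) for each timeout category.
TIMEOUT_SECONDS = {"short": 60, "moderate": 300, "long": 900, "eternal": 3600}
# Default category implied by each test size.
SIZE_TO_TIMEOUT = {"small": "short", "medium": "moderate", "large": "long", "enormous": "eternal"}


def effective_timeout(size="medium", timeout=None, test_timeout_flag=None):
    """Seconds a test may run before Bazel kills it (simplified model)."""
    if test_timeout_flag is not None:
        # A single --test_timeout=N value replaces every category's limit.
        return test_timeout_flag
    # An explicit timeout attribute wins over the size-derived default.
    category = timeout or SIZE_TO_TIMEOUT[size]
    return TIMEOUT_SECONDS[category]


print(effective_timeout("large"))                   # 900
print(effective_timeout("large", timeout="short"))  # 60
```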

Examples

# C++
cc_test(
    name = "math_test",
    srcs = ["add_test.cc"],
    deps = [":math_lib"],
    size = "small",
    tags = ["unit"],
)
# Python
py_test(
    name = "handlers_integration_test",
    srcs = ["handlers_integration_test.py"],
    deps = [
        "//libs/py/util",
        "//apps/service_a/api",
    ],
    size = "large",
    tags = ["integration"],
)

Coverage (language-dependent)

bazel coverage //libs/py/... --combined_report=lcov

Takeaway: Model tests as targets. Use tags, sizes, and flags to keep CI fast and focused.



7) Incremental Builds and Caching

How Bazel detects changes

  • It fingerprints all inputs (sources, toolchains, flags).
  • If a hash changes, dependent actions rebuild; otherwise, results come from cache.
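The fingerprint-and-reuse loop fits in a few lines of Python. This is a toy model, not Bazel's implementation: each action key hashes the command plus the digests of its inputs, and a plain dict stands in for the local, disk, or remote cache. build, digest, and the sample targets are hypothetical. Real Bazel also hashes outputs by content, so an edit that yields byte-identical output stops propagating; this model does not capture that.

```python
import hashlib


def digest(*parts: str) -> str:
    """Stable fingerprint of a sequence of strings."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")  # separator, so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()


def build(graph, sources, commands, cache, ran):
    """graph: target -> deps (targets or source file names).
    sources: file name -> contents. commands: target -> command string.
    cache: action key -> output digest, kept between builds.
    ran: list that records which targets actually executed."""
    outputs = {}

    def visit(node):
        if node in sources:
            return digest("src", sources[node])
        if node not in outputs:
            # The action key covers the command and every input's digest.
            key = digest(commands[node], *[visit(dep) for dep in graph[node]])
            if key not in cache:
                ran.append(node)  # cache miss: "execute" the action
                cache[key] = digest("out", key)
            outputs[node] = cache[key]
        return outputs[node]

    for target in graph:
        visit(target)
    return outputs


graph = {"math_lib": ["add.cc"], "math_test": ["math_lib", "add_test.cc"]}
commands = {"math_lib": "cc -c add.cc", "math_test": "cc -o math_test"}
sources = {"add.cc": "int add(int a, int b) { return a + b; }", "add_test.cc": "// test"}
cache = {}
for step in ("first build", "no change", "edit add_test.cc"):
    if step == "edit add_test.cc":
        sources["add_test.cc"] = "// test v2"
    ran = []
    build(graph, sources, commands, cache, ran)
    print(step, ran)
# first build ['math_lib', 'math_test']
# no change []
# edit add_test.cc ['math_test']
```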

Local cache

  • Lives under Bazel’s output base. Inspect with:

    bazel info output_base
    bazel info repository_cache
    
  • You can also enable a disk cache to share cache across workspaces:

    # .bazelrc
    build --disk_cache=~/.bazel-disk-cache

Remote cache (CI & teams)

# .bazelrc
build --remote_cache=https://bazel-cache.example.com
build --remote_timeout=60
build --remote_upload_local_results=true
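# Authentication: .bazelrc does not expand environment variables, so pass a
# token on the command line instead, e.g.:
#   bazel build //... --remote_header="Authorization=Bearer $TOKEN"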

Tips to go fast

  • Keep targets small and focused (improves reuse).
  • Avoid wild globs that change often.
  • Keep environment hermetic (fewer non-declared inputs).
  • Reuse toolchains and consistent flags in .bazelrc.

Takeaway: Caching is Bazel’s superpower. Use disk/remote caches and keep targets small to maximize hits.



8) Bazel Query and Affected Targets

Find dependencies

bazel query 'deps(//apps/service_a:server)'

Find reverse dependencies (what depends on X)

bazel query 'rdeps(//..., //libs/py/strings)'

List only tests that depend on X

bazel query 'kind(".*_test", rdeps(//..., //libs/py/strings))'

Configuration-aware query (cquery)

bazel cquery 'somepath(//apps/service_a:server, //libs/py/strings)' --output=graph

Action graph (aquery) — great for debugging what runs

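bazel aquery '//apps/service_a:server'                          # all actions for the target
bazel aquery 'inputs(".*py", deps(//apps/service_a:server))'     # only actions whose inputs match a regex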

Visualize the graph (Graphviz)

bazel query 'deps(//apps/service_a:server)' --output graph | dot -Tpng > graph.png

“Affected tests for this change” example

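# From a feature branch, list changed files (--diff-filter=d skips deleted ones):
CHANGED_FILES=$(git diff --name-only --diff-filter=d origin/main...HEAD)

# Query tests that reverse-depend on any changed file.
# set() groups the paths into one argument; --keep_going skips paths that are
# not inside any package (README.md, CI config, ...) instead of failing.
bazel query --keep_going "kind('.*_test', rdeps(//..., set(${CHANGED_FILES})))" \
  | sort -u \
  > /tmp/affected_tests.txt

bazel test $(cat /tmp/affected_tests.txt)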

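Note: Bazel query accepts source-file paths such as libs/py/util/path.py (resolved against the enclosing package), and rdeps returns the rules that depend on them. This only tracks source-file edits: changes to BUILD, .bzl, MODULE.bazel, or WORKSPACE files are not mapped to the targets they affect, so fall back to //... when those change.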
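The same idea, written out: map each changed path to a label through its deepest enclosing package, walk reverse dependency edges, and keep only tests. A Python sketch over a hand-written graph; file_to_label, affected_tests, and the sample graph are hypothetical and stand in for what bazel query computes for you.

```python
from collections import deque


def file_to_label(path, packages):
    """Label for a repo-relative file, using its deepest enclosing package."""
    parts = path.split("/")
    for cut in range(len(parts) - 1, -1, -1):
        package = "/".join(parts[:cut])
        if package in packages:
            return f"//{package}:{'/'.join(parts[cut:])}"
    return None  # not inside any package (e.g. a top-level README)


def affected_tests(changed_files, packages, deps, tests):
    """deps: label -> direct dependency labels. Returns sorted affected tests."""
    reverse = {}
    for target, direct in deps.items():
        for dep in direct:
            reverse.setdefault(dep, []).append(target)
    start = [label for label in (file_to_label(f, packages) for f in changed_files) if label]
    seen, queue = set(start), deque(start)
    while queue:  # breadth-first walk over reverse edges, like rdeps(//..., ...)
        for parent in reverse.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen & tests)


packages = {"libs/py/util", "libs/py/strings", "apps/service_a"}
deps = {
    "//libs/py/util:util": ["//libs/py/util:path.py"],
    "//libs/py/util:util_test": ["//libs/py/util:path_test.py", "//libs/py/util:util"],
    "//libs/py/strings:strings": ["//libs/py/strings:split.py", "//libs/py/strings:join.py"],
    "//apps/service_a:handlers_integration_test": ["//libs/py/util:util"],
}
tests = {"//libs/py/util:util_test", "//apps/service_a:handlers_integration_test"}
print(affected_tests(["libs/py/util/path.py", "README.md"], packages, deps, tests))
# ['//apps/service_a:handlers_integration_test', '//libs/py/util:util_test']
```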

Takeaway: Use query/cquery/aquery to see and slice the graph, and to run only what’s affected.



9) Monorepo Workflows

Suggested layout

monorepo/
├─ WORKSPACE or MODULE.bazel
├─ .bazelrc
├─ tools/                # custom rules, macros, scripts
├─ third_party/          # checked-in patches or vendored pieces (rare)
├─ platforms/            # toolchains/platforms (optional)
├─ libs/
│  ├─ py/...
│  ├─ jvm/...
│  ├─ go/...
│  └─ cpp/...
└─ apps/
   ├─ service_a/...
   ├─ service_b/...
   └─ cli_tools/...

Sharing code between projects

  • Put shared code in libs/... with visibility = ["//visibility:public"], or a narrower list such as ["//apps:__subpackages__"].

  • Avoid deep, hidden coupling. Keep public APIs small.

  • Use aliases to expose stable APIs:

    alias(
        name = "public_api",
        actual = "//libs/py/strings",
        visibility = ["//visibility:public"],
    )
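Visibility specs come in a few fixed forms. This Python sketch models the common ones (public, private, :__pkg__, :__subpackages__, plus same-package access); is_visible is a hypothetical helper, and package_group targets are not modeled.

```python
def is_visible(consumer_pkg: str, target_pkg: str, visibility: list) -> bool:
    """May a target in consumer_pkg depend on a target in target_pkg? (toy model)"""
    if consumer_pkg == target_pkg:
        return True  # a package can always use its own targets
    for spec in visibility:
        if spec == "//visibility:public":
            return True
        if spec == "//visibility:private":
            continue
        package, _, kind = spec[2:].partition(":")
        if kind == "__pkg__" and consumer_pkg == package:
            return True  # exactly that package
        if kind == "__subpackages__" and (
            package == "" or consumer_pkg == package or consumer_pkg.startswith(package + "/")
        ):
            return True  # that package and everything below it
    return False


print(is_visible("apps/service_a", "libs/py/strings", ["//apps:__subpackages__"]))  # True
print(is_visible("libs/jvm/core", "libs/py/strings", ["//apps:__subpackages__"]))   # False
```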
    

CI/CD integration (example: GitHub Actions)

# .github/workflows/bazel.yml
name: Bazel CI

on:
  pull_request:
  push:
    branches: [ main ]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, so origin/main exists for the diff below
      # GitHub-hosted Ubuntu runners ship Bazelisk as `bazel`; it reads the committed
      # .bazelversion. On other runners, add a step that installs Bazelisk.
      - name: Restore Bazel disk cache
        uses: actions/cache@v4
        with:
          path: ~/.bazel-disk-cache
          key: bazel-disk-${{ runner.os }}-${{ hashFiles('**/*.bzl', '**/BUILD*', '.bazelrc') }}
          restore-keys: |
            bazel-disk-${{ runner.os }}-
      - run: |
          echo 'build --disk_cache=~/.bazel-disk-cache' >> .bazelrc
      - name: Affected tests only
        run: |
          CHANGED=$(git diff --name-only --diff-filter=d origin/main...HEAD || true)
          if [ -z "$CHANGED" ]; then
            bazel test //... --test_output=errors
          else
            TARGETS=$(bazel query --keep_going "kind('.*_test', rdeps(//..., set(${CHANGED})))" || true)
            if [ -z "$TARGETS" ]; then
              echo "No affected tests."
            else
              bazel test $TARGETS --test_output=errors
            fi
          fi

Takeaway: Organize by apps and libs, control visibility, and wire Bazel into CI. Test only what changed for speed.



10) Advanced Topics

Custom rules (Starlark basics)

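A rule's implementation function runs during the analysis phase and registers actions; Bazel executes those actions later, and only if their outputs are needed. Minimal example: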

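# tools/rules/echo_rule.bzl
def _echo_impl(ctx):
    # Declare the output: <target name>.txt in this package's output directory.
    out = ctx.actions.declare_file(ctx.label.name + ".txt")
    ctx.actions.run_shell(
        outputs = [out],
        # Pass values as $1/$2 instead of formatting them into the command,
        # so quotes or spaces in msg cannot break the shell line.
        arguments = [ctx.attr.msg, out.path],
        command = 'printf "%s\\n" "$1" > "$2"',
        mnemonic = "EchoFile",
    )
    # For plain text like this, ctx.actions.write(output = out, content = ...) also works.
    return [DefaultInfo(files = depset([out]))]

echo_rule = rule(
    implementation = _echo_impl,
    attrs = {
        "msg": attr.string(mandatory = True),
    },
)

# tools/rules/BUILD
load(":echo_rule.bzl", "echo_rule")

echo_rule(
    name = "hello_file",
    msg = "Hello from a custom rule!",
)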

Build it:

bazel build //tools/rules:hello_file

Toolchains & Platforms

Define a platform and build for it:

# platforms/BUILD.bazel
platform(
    name = "linux_x86_64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
)

Use it:

bazel build //apps/service_a:server --platforms=//platforms:linux_x86_64

(Real toolchains depend on the language rules you use; follow their docs.)
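Toolchain resolution, greatly simplified: Bazel picks a registered toolchain whose constraints are satisfied by the target and execution platforms. This Python sketch assumes a single execution platform and first-match-wins registration order; Toolchain, resolve_toolchain, and the toolchain names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Toolchain:
    name: str
    target_compatible_with: frozenset = frozenset()
    exec_compatible_with: frozenset = frozenset()


def resolve_toolchain(registered, target_platform, exec_platform):
    """Name of the first registered toolchain both platforms satisfy, else None."""
    for tc in registered:  # registration order matters: first match wins
        if (tc.target_compatible_with <= target_platform
                and tc.exec_compatible_with <= exec_platform):
            return tc.name
    return None


LINUX_X86_64 = frozenset({"@platforms//os:linux", "@platforms//cpu:x86_64"})
MACOS_ARM64 = frozenset({"@platforms//os:macos", "@platforms//cpu:aarch64"})
registered = [
    Toolchain("macos_cc", frozenset({"@platforms//os:macos"})),
    Toolchain("linux_x86_64_cc", frozenset({"@platforms//os:linux", "@platforms//cpu:x86_64"})),
]
print(resolve_toolchain(registered, LINUX_X86_64, exec_platform=MACOS_ARM64))  # linux_x86_64_cc
```

This is why --platforms=//platforms:linux_x86_64 can change which compiler runs even on a macOS machine: the target platform's constraint values drive the match.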

Remote execution

  • Offload actions to a remote cluster (RBE/BuildFarm/etc.).

  • Typical flags:

    # .bazelrc
    build --remote_executor=grpc://rbe.example.com:9092
    build --remote_cache=grpc://rbe.example.com:9092
    build --spawn_strategy=remote
    build --strategy=Javac=remote # per mnemonic if needed
  • Requires proper authentication and worker images with needed toolchains.

Performance tips

  • Keep targets small. Bazel rejects dependency cycles outright, so layer the graph (apps depend on libs, not the reverse).
  • Consolidate identical compiler flags in .bazelrc.
  • Prefer hermetic tools via rules (avoid random system tools).
  • Use a build profile (--profile) to find expensive actions, and aquery to see what they run.

Takeaway: Starlark rules = power + consistency. Platforms/toolchains/remote execution = scale.



11) Common Pitfalls and Best Practices

Pitfalls

  • Over‑globbing: glob(["**/*"]) makes unrelated changes trigger rebuilds.
  • Huge targets: One giant library causes poor cache reuse.
  • Leaky visibility: Everything public = accidental tight coupling.
  • Hidden env inputs: Scripts that read env or network without declaring inputs break reproducibility.
  • Mixing WORKSPACE and MODULE.bazel without a plan: pick one approach for dependencies (a hybrid setup is meant only for migrating).

Debugging

  • See what failed: --verbose_failures
  • See commands: --subcommands
  • Profile the build: --profile=prof.gz, then bazel analyze-profile prof.gz
  • Inspect actions: bazel aquery '//target'
  • Nuke outputs (last resort): bazel clean --expunge (slow; avoid as daily habit)

Best practices

  • One package per logical unit; many small packages > few huge ones.
  • Pin versions and SHAs; check them in.
  • Centralize common flags/macros in //tools/....
  • Use tags and sizes for tests; run unit tests on every PR, heavy tests on schedule or when needed.
  • Use CI cache (disk + remote) and affected tests logic.

Takeaway: Keep things small, declared, and pinned. Use Bazel’s tooling to debug and keep builds hermetic.



12) Resources and References

(Search these by name; pin exact versions from their READMEs.)

  • Bazel Documentation — “bazel.build” (official guides, query language, Starlark)
  • Bazelisk — version manager for Bazel
  • rules_go & bazel-gazelle — Go support and code generator
  • rules_jvm_external — Maven dependencies for Java/Kotlin
  • rules_python — Python rules (check README for pip_* setup)
  • rules_proto / rules_cc — Protobuf/C++ ecosystems
  • platforms — Constraint settings for OS/CPU
  • Bazel Examples — Community example repos
  • BazelCon talks — Real-world monorepo patterns at scale

Takeaway: Use official docs and rule set READMEs for exact setup steps and latest versions.



Appendix: Ready-to-Copy Snippets

.bazelrc

# Fast local caching
build --disk_cache=~/.bazel-disk-cache

# Reasonable parallelism
build --jobs=auto

# Useful test defaults
test --test_output=errors
# test --nocache_test_results   # uncomment to always re-run tests locally

# (Optional) Remote cache
# build --remote_cache=https://bazel-cache.example.com
# build --remote_timeout=60
# build --remote_upload_local_results=true

Example WORKSPACE (Go + Java)

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# rules_go
http_archive(
    name = "io_bazel_rules_go",
    urls = ["https://github.com/bazelbuild/rules_go/releases/download/v0.X.Y/rules_go-v0.X.Y.zip"],
    sha256 = "<PIN_SHA256>",
)
# gazelle
http_archive(
    name = "bazel_gazelle",
    urls = ["https://github.com/bazelbuild/bazel-gazelle/releases/download/v0.A.B/bazel-gazelle-v0.A.B.zip"],
    sha256 = "<PIN_SHA256>",
)

load("@io_bazel_rules_go//go:deps.bzl", "go_rules_dependencies", "go_register_toolchains")
go_rules_dependencies()
go_register_toolchains()

load("@bazel_gazelle//:deps.bzl", "gazelle_dependencies")
gazelle_dependencies()

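# Java Maven
# Copy the exact url, strip_prefix (if any), and sha256 from the release notes for <VER>.
http_archive(
    name = "rules_jvm_external",
    urls = ["https://github.com/bazelbuild/rules_jvm_external/releases/download/<VER>/rules_jvm_external-<VER>.zip"],
    sha256 = "<PIN_SHA256>",
)
# Recent releases set up their own dependencies before maven_install can be used:
load("@rules_jvm_external//:repositories.bzl", "rules_jvm_external_deps")
rules_jvm_external_deps()
load("@rules_jvm_external//:setup.bzl", "rules_jvm_external_setup")
rules_jvm_external_setup()

load("@rules_jvm_external//:defs.bzl", "maven_install")
maven_install(
    name = "maven",
    artifacts = ["junit:junit:4.13.2"],
    repositories = ["https://repo1.maven.org/maven2"],
)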

Example MODULE.bazel (Bzlmod; outline)

# Pin the Bazel modules (versions are examples; check READMEs)
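bazel_dep(name = "rules_go", version = "0.X.Y")
bazel_dep(name = "gazelle", version = "0.A.B")  # the module is "gazelle", not "bazel_gazelle"
bazel_dep(name = "rules_jvm_external", version = "<VER>")

# Rule-specific extension usage goes here (see each ruleset's docs), e.g. Maven:
maven = use_extension("@rules_jvm_external//:extensions.bzl", "maven")
maven.install(artifacts = ["junit:junit:4.13.2"], repositories = ["https://repo1.maven.org/maven2"])
use_repo(maven, "maven")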

“Only test what changed” (shell)

#!/usr/bin/env bash
set -euo pipefail

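CHANGED=$(git diff --name-only --diff-filter=d origin/main...HEAD || true)
if [ -z "$CHANGED" ]; then
  echo "No changes detected vs main. Running all tests."
  exec bazel test //... --test_output=errors
fi

# set() turns the path list into one query argument; --keep_going skips
# paths outside any package instead of failing the whole query.
TARGETS=$(bazel query --keep_going "kind('.*_test', rdeps(//..., set(${CHANGED})))" || true)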
if [ -z "$TARGETS" ]; then
  echo "No affected tests."
  exit 0
fi

echo "$TARGETS" | sort -u
bazel test $TARGETS --test_output=errors

Final Summary (Key Takeaways)

  • Model your repo as a graph of small targets. Let Bazel build only what changed.
  • Organize by apps/libs and control visibility to keep dependencies clean.
  • Pin dependencies (versions + SHAs) and prefer Bzlmod for new work.
  • Use caches (disk + remote) and affected-target queries to keep CI fast.
  • Start with built‑in rules and simple macros. Move to custom rules/toolchains when needed.
  • Debug with query, cquery, aquery, --subcommands, and --verbose_failures.