The short answer (at least for the piece I’m interested in) is to recreate the gcheckout.sh script that Google didn’t open-source. For background, refer to How Googlers Subset their Trunk.

At the time of writing, there are no issues on the Bazel GitHub project that talk about modifying a checkout based on needs or intents that are codified in BUILD files (here’s an example). Of course, that’s not a universal SCM feature. Git has “sparse checkout”, but not sparse clone, so every developer still has to clone the whole repository, which rules it out for a monorepo configuration. Perforce and Subversion can do it. Perforce has client specs, and Google’s internal gcheckout.sh talks to a Perforce backend to modify (shrink or grow) a checkout via that client spec. Subversion has a quite capable ‘sparse checkouts’ feature, albeit a bit mechanical: you set the depth directory by directory. It would be better if the desired shape were declared in a .sparse_checkout file in the .svn/ directory.
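
To make the client-spec idea concrete, here is a minimal Python sketch that renders only the View: section of a Perforce client spec from a list of directories to include or exclude. The depot path //depot/root, the client name, and the client_view function are all hypothetical; a real spec also needs fields such as Client: and Root:, and would be loaded with p4 client -i before running p4 sync.

```python
from collections.abc import Iterable

# Hypothetical depot location of the monorepo root.
DEPOT_ROOT = "//depot/root"


def client_view(client: str, includes: Iterable[str],
                excludes: Iterable[str] = ()) -> str:
    """Render the View: section of a Perforce client spec.

    Paths in includes/excludes are directories relative to DEPOT_ROOT.
    """
    lines = ["View:"]
    # '*' stays within one directory, so this maps root-level files such
    # as root/BUILD without pulling in any subdirectories.
    lines.append(f"\t{DEPOT_ROOT}/* //{client}/*")
    # '...' recurses, so each included directory comes over whole.
    for d in sorted(includes):
        lines.append(f"\t{DEPOT_ROOT}/{d}/... //{client}/{d}/...")
    # A leading '-' excludes a path; later mappings override earlier ones,
    # so exclusions are listed last.
    for d in sorted(excludes):
        lines.append(f"\t-{DEPOT_ROOT}/{d}/... //{client}/{d}/...")
    return "\n".join(lines)


print(client_view("alice-adsense", ["adsense", "salestax"]))
```

Growing the checkout to include adwords would then be one more entry in includes, and shrinking it one fewer; that is roughly the grow/shrink job attributed to gcheckout.sh.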

A worked example

In So you think monolith is the only alternative to microservices, I talked about a hypothetical adsense app, with salestax as a hypothetical depended-on component used by both adsense and adwords.

Say you’re working away on adsense, and your checkout looks like:

root/
  BUILD
  adsense/
    BUILD
    src/
  salestax/
    BUILD
    src/

You need salestax to be able to build adsense, of course. Your work is in adsense, but you notice a spelling mistake in a parameter name on one of the salestax functions, and you want to fix that. You’re probably better off making a local branch, as the spelling fix to what amounts to a shared library shouldn’t really be in the same commit as your functional change. Anyway, you make the change, and it all works for you, but there are other users of salestax.

In our small world, that’s just adwords. If we had ALL the BUILD files, we’d know which modules/directories need salestax for compilation purposes. We don’t need all the BUILD files checked out; we just need to be able to query or traverse them on the server side, or have a HEAD-centric meta-analysis of them somewhere. The gcheckout.sh script presumably deals with that, and works out what the checkout should be expanded to, based either on the impact of pending changes (gcheckout *) or on an explicit instruction (gcheckout +adwords). Here’s the result:

root/
  BUILD
  adsense/
    BUILD
    src/
  adwords/
    BUILD
    src/
  salestax/
    BUILD
    src/
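
The expansion logic can be sketched as a graph walk. This is a minimal Python illustration, not Google’s gcheckout.sh: DEPS is a hypothetical server-side index (module to the modules it needs) that would be derived from the BUILD files at HEAD, and invert, closure and expand_checkout are made-up names. Pending changes pull in everything that transitively depends on them (the gcheckout * case), explicit additions go in directly (the gcheckout +adwords case), and then each module’s own dependencies are added so that everything in the checkout can build. Bazel’s query language can compute the reverse-dependency part with rdeps(//..., //salestax/...), but only over BUILD files you have locally, which is exactly the gap.

```python
from collections import deque
from collections.abc import Iterable, Mapping

# Hypothetical HEAD-centric index built from every BUILD file on the server:
# module -> modules it needs in order to compile.
DEPS: dict[str, set[str]] = {
    "adsense": {"salestax"},
    "adwords": {"salestax"},
    "salestax": set(),
}


def invert(graph: Mapping[str, set[str]]) -> dict[str, set[str]]:
    """Turn 'needs' edges into 'needed by' edges (reverse dependencies)."""
    rdeps: dict[str, set[str]] = {m: set() for m in graph}
    for module, needed in graph.items():
        for dep in needed:
            rdeps.setdefault(dep, set()).add(module)
    return rdeps


def closure(start: Iterable[str], graph: Mapping[str, set[str]]) -> set[str]:
    """Every module reachable from start by following graph edges (start included)."""
    seen = set(start)
    queue = deque(seen)
    while queue:
        module = queue.popleft()
        for nxt in graph.get(module, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def expand_checkout(current: Iterable[str], deps: Mapping[str, set[str]],
                    changed: Iterable[str] = (),
                    explicit: Iterable[str] = ()) -> set[str]:
    """Modules a checkout should contain after 'gcheckout *' / 'gcheckout +x'."""
    # 'gcheckout *': anything that transitively depends on a changed module.
    impacted = closure(changed, invert(deps))
    wanted = set(current) | set(explicit) | impacted
    # Each wanted module also needs its own dependencies to build.
    return closure(wanted, deps)


print(sorted(expand_checkout({"adsense", "salestax"}, DEPS, changed={"salestax"})))
```

With adsense and salestax checked out and a pending change in salestax, this prints ['adsense', 'adwords', 'salestax'], matching the tree above; calling it with explicit={"adwords"} instead gives the same set.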

You get to prove that you didn’t break anyone’s usage of salestax before you commit. The CI daemon wakes up and proves it again when you do commit. The salestax team (if there is a team at all) receives the code-review notification, and applies objective criteria when approving or rejecting the commit. Common code ownership, for the win!
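
The before-commit proof could be as simple as running the tests for every package in the expanded checkout, with the CI daemon rerunning the same command after the commit. A minimal Python sketch, assuming root/ is the Bazel workspace root and each module is a package tree; precommit_test_command and prove_nothing_broke are hypothetical names. By default it only prints the command rather than running it.

```python
import shlex
import subprocess
from collections.abc import Iterable


def precommit_test_command(modules: Iterable[str]) -> list[str]:
    """One 'bazel test' invocation covering every package under each module."""
    # Assumes root/ is the Bazel workspace root, so root/adwords is //adwords.
    return ["bazel", "test", *(f"//{m}/..." for m in sorted(modules))]


def prove_nothing_broke(modules: Iterable[str], dry_run: bool = True) -> int:
    """Run (or, by default, just print) the pre-commit test command."""
    cmd = precommit_test_command(modules)
    if dry_run:
        print(shlex.join(cmd))
        return 0
    # Bazel exits non-zero if any build or test fails.
    return subprocess.run(cmd, check=False).returncode


prove_nothing_broke({"adsense", "adwords", "salestax"})
```

Testing whole modules rather than individual targets is the coarse choice; a finer version would feed only the reverse-dependency targets of the changed files to bazel test.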

How to get this to happen?

Of course, I’m not using Bazel presently, so it’s not super critical for me, but I am interested in the science of “one big trunk” (it seems people want to call it a monorepo, and while that term doesn’t convey the full intent, it is sticking).

Anyway, Perforce (the company) should nudge its user community to participate in a fork of Bazel on GitHub. At the time of writing there are 240 forks of Bazel, and it’s not easy to work out what those forkers are doing with it, if anything other than building it. There are also the project’s issues, and none of them seem to have anything to do with gcheckout or sparse checkouts (as mentioned). I’ll bet the Bazel team thinks this is out of their scope. Perforce now has its own DVCS too, and you can do Git checkouts as well. They all obey the client-spec includes and excludes of directories, meaning you get the potentially humongous monorepo benefits AND a Git/DVCS command line with no downsides.

After the Perforce community perfects it, Subversion devs could copy it (that’s the nature of open source). Git and Mercurial need improvements in repository-size handling and/or submodule/subtree support that would allow atomic commits across the whole tree. Dan Luu recently posted a great article on monorepos and atomicity that you should read.