Paul Hammant's Blog:
Turning Bazel back into Blaze for monorepo nirvana
The short answer (at least for the piece I’m interested in) is to recreate the gcheckout.sh script that Google didn’t open-source. See how Googlers Subset their Trunk.
At the time of writing, there are no issues for the Bazel GitHub project that talk about modifying a checkout based on needs or intents that are codified in BUILD files (here’s an example). Of course, that’s not a universal SCM feature. Git has “sparse checkout” but not sparse clone, so you can’t use it in a monorepo configuration: you’d still have to clone the whole repository first. Perforce and Subversion could do it. Perforce has client specs, and Google’s internal gcheckout.sh speaks to a Perforce backend to modify (shrink or grow) a checkout via that client spec. Subversion has a ‘sparse checkouts’ feature that is quite capable, albeit a bit mechanical. It would be better as a .sparse_checkout file in the .svn/ directory.
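To show what I mean by "a bit mechanical", here is a minimal sketch of the Subversion commands a gcheckout-like wrapper might issue. The function name, the repository URL and the working-copy name are all hypothetical, and it assumes the wanted modules are top-level directories. It only builds the command lines; it doesn't run them.

```python
import shlex
from typing import List


def svn_sparse_commands(repo_url: str, workdir: str, modules: List[str]) -> List[List[str]]:
    """Build svn command lines for a sparse working copy holding only `modules`.

    Assumption: every entry in `modules` is a top-level directory of the trunk.
    """
    # Check out the root with its immediate files only (e.g. the root BUILD file).
    commands = [["svn", "checkout", "--depth", "files", repo_url, workdir]]
    for module in modules:
        # Pull in each wanted directory, with its whole subtree.
        commands.append(
            ["svn", "update", "--set-depth", "infinity", f"{workdir}/{module}"]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    url = "https://svn.example.com/repo/trunk"
    for command in svn_sparse_commands(url, "root", ["adsense", "salestax"]):
        print(shlex.join(command))
```

Growing the checkout later is one more `svn update --set-depth infinity` per directory, which is exactly the bookkeeping a script should be doing for you.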
A worked example
The hypothetical adsense app that I wrote about in So you think monolith is the only alternative to microservices used salestax as a hypothetical depended-on component, shared by adsense and adwords.
Say you’re working away on adsense, and your checkout looks like:
root/
  BUILD
  adsense/
    BUILD
    src/
  salestax/
    BUILD
    src/
Or more accurately:
root/
  BUILD
  java/
    BUILD
    adsense/
      BUILD
    salestax/
      BUILD
  javascript/
    BUILD
    adsense/
      BUILD
    salestax/
      BUILD
  java_tests/
    BUILD
    adsense/
      BUILD
    salestax/
      BUILD
  javascript_tests/
    BUILD
    adsense/
      BUILD
    salestax/
      BUILD
You need salestax to be able to build adsense, of course. It’s in adsense that you’re doing your work, but you realize that there’s a spelling mistake in a parameter name of one of the salestax functions, and you want to fix that. You’re probably better off making a local branch, as the functional change shouldn’t really be in the same commit as the spelling fix to what amounts to a shared library. Anyway, you make the change, and it all works for you, but there are other users of salestax.
In our small world that’s just adwords. If we had ALL the BUILD files we’d know which modules/directories need salestax for compilation purposes. We don’t need all BUILD files checked out; we just need to be able to query or traverse them on the server side, or have a HEAD-centric meta-analysis of them somewhere. gcheckout.sh presumably deals with that, and works out what the checkout should be expanded to, based either on the impact of pending changes (gcheckout *) or on explicit instruction (gcheckout +adwords). Here’s the result:
root/
  BUILD
  adsense/
    BUILD
    src/
  adwords/
    BUILD
    src/
  salestax/
    BUILD
    src/
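I don't know how gcheckout.sh actually computes this, but the expansion can be sketched as a walk over the dependency graph declared in the BUILD files. The following is a minimal sketch, assuming that graph is already available as a plain dictionary (module name to the modules it depends on). The function names and the dictionary shape are mine, not Bazel's or Google's.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Mapping, Set


def transitive(graph: Mapping[str, Set[str]], roots: Iterable[str]) -> Set[str]:
    """Return `roots` plus everything reachable from them in `graph`."""
    seen = set(roots)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def expand_checkout(
    deps: Dict[str, Set[str]], checked_out: Set[str], changed: Iterable[str]
) -> Set[str]:
    """Work out which modules the checkout should hold after a change.

    `deps` maps each module to the modules it depends on (hypothetical shape).
    """
    # Invert the graph: for each module, who depends on it?
    dependents = defaultdict(set)
    for module, needs in deps.items():
        for dependency in needs:
            dependents[dependency].add(module)
    # Everything that could be broken by the pending change.
    impacted = transitive(dependents, changed)
    # Everything needed to build what we have plus what is impacted.
    return transitive(deps, set(checked_out) | impacted)


if __name__ == "__main__":
    deps = {"adsense": {"salestax"}, "adwords": {"salestax"}, "salestax": set()}
    result = expand_checkout(deps, {"adsense", "salestax"}, {"salestax"})
    print(sorted(result))  # ['adsense', 'adwords', 'salestax']
```

The second walk matters: a newly added dependent may itself need modules you don't have yet, and the expanded checkout has to stay fully buildable.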
You get to prove that you didn’t break anyone’s usage of SalesTax before you commit. The CI daemon wakes up and proves it again, when you do commit. The salestax team (if there is a team at all) receives the code-review notification, and honors objective criteria for approving or rejecting the commit. Common code ownership, for the win!
How to get this to happen?
Of course, I’m not using Bazel presently, so it’s not super critical for me, but I am interested in the science of “one big trunk” (people seem to want to call it “monorepo”, and while that name doesn’t convey the full intent, it is sticking).
Anyway, Perforce (the company) should nudge its user community to participate in a fork of Bazel on GitHub. There are 240 forks of Bazel. It’s not easy to work out what those forkers are doing with Bazel, if anything other than building it. There are also issues, but seemingly none to do with gcheckout or sparse checkouts (as mentioned). I’ll bet that the Bazel team thinks this is out of their scope. Perforce has its own DVCS now, and you can do Git checkouts too. They all obey the client-spec includes and excludes of directories, meaning you get the potentially humongous monorepo benefits AND a git/dvcs command-line with no downsides.
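For the Perforce side, here is a minimal sketch of how a set of wanted modules could be turned into the View section of a client spec. The depot path, the client name and the function name are hypothetical; only the mapping syntax (depot path on the left, client path on the right, `...` for a whole subtree) is Perforce's.

```python
from typing import Iterable


def client_view(depot: str, client: str, modules: Iterable[str]) -> str:
    """Render the View section of a client spec covering only `modules`.

    Assumption: `depot` is the depot path of the trunk (e.g. "//depot") and
    each module is a directory directly beneath it.
    """
    # Always map the root BUILD file.
    lines = [f"\t{depot}/BUILD //{client}/BUILD"]
    for module in sorted(modules):
        # Map each wanted directory, with its whole subtree.
        lines.append(f"\t{depot}/{module}/... //{client}/{module}/...")
    return "View:\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical depot and client names, for illustration only.
    print(client_view("//depot", "paul-ws", {"adsense", "adwords", "salestax"}))
```

Shrinking or growing the checkout then amounts to regenerating this section and syncing.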
After the Perforce community perfects it, Subversion devs could copy it (that’s the nature of open source). Git and Mercurial need improvements to repo-size and/or submodules/subtree support that would allow atomicity. Dan Luu recently posted a great article on monorepos and atomicity that you should read.
Syndicated by DZone.com
Comments formerly in Disqus, but exported and mounted statically ...
Wed, 20 May 2015 | Jon Forrest
"Perforce has it’s own DVCS now" ->

Thu, 21 May 2015 | paul_hammant
thx dude.

Wed, 20 May 2015 | kristina1
Have you seen the external repository rules for Bazel? It sounds like what you're looking for, although we don't call it sparse. Basically you specify `new_http_archive(name="adwords", url="https://github.com/adwords/...", sha256="...", build_file="path/to/a/BUILD/file/to/overlay")` in a special WORKSPACE file. If the local target you're building depends on something in the adwords repo, it'll be downloaded, otherwise it won't. See http://bazel.io/docs/build-... for an example.

Thu, 21 May 2015 | paul_hammant
Downloaded isn't what I want. I genuinely want the checkout expanded to cover my change set, and still be fully buildable (incl. tests).

Thu, 21 May 2015 | kristina1
To check if I understand correctly: you want just the files needed for whatever you're building/testing and no extraneous ones?

Fri, 22 May 2015 | paul_hammant
And if I need to modify source files from that expanded checkout, including fixing or refining tests, I want them to be part of the same atomic checkin.

Thu, 28 May 2015 | kristina1
That only makes sense if all of the projects in the expanded checkout have the same code review system, which seems unlikely.

Thu, 28 May 2015 | paul_hammant
Google's Mondrian: one code review tool, with all developers and projects within. Facebook do the same with their Phabricator (bonus: open source).

Fri, 29 May 2015 | Eric Smalling
Atlassian Stash's pull request review system is mandated at the last 2 large travel-related clients I've worked with.

Fri, 22 May 2015 | Markus Kohler
One main advantage of being able to get all the needed source files is that you could get rid of a costly deployment step where your build script packs everything into archives and your app server unpacks those archives. You would not even have to copy those files to the application server's deployment directory. You could just symlink them (for example). Not sure whether anyone does that in practice already.

Thu, 21 May 2015 | Markus Kohler
To me it looks like the main reason for one trunk is to be able to have one history of code changes and be able to do global refactorings in an atomic way. To me that sounds like a good idea if you are in a company that has the infrastructure to do those global refactorings, such as enough automated tests, fast "global" builds etc. Only a few companies have that infrastructure so far, so for most companies having one trunk does not have real benefits. With GitHub more or less dominating the "world", I think it is unlikely that a lot of companies will go "back" to Perforce (OK, the Git interface to Perforce might be an option) or svn. Rather, I would expect that we get a tool that can "stitch" together commits to more than one Git(Hub) repository. The Android project, for example, uses the "repo" tool to make it easier to clone the needed repositories. It does not "stitch" commits together as far as I know. Yes, these tools may sound like a workaround, but I think people want to be able to work independently, similar to open source projects on GitHub.

Thu, 21 May 2015 | paul_hammant
You're correct: only a few companies can do this. Samsung consume the stitched-together git repos back into a Perforce trunk, I hear.