Paul Hammant's Blog: Repositories vs release cadences
How many repos for a given number of buildable/deployables?
You have an application to slide into production on some release cadence. You’ve split it up into a number of services, perhaps.
Do all the services get deployed together? Tricky. You could have them in one repo (directory separation, directed graph or recursive build system), and take advantage of atomic commits, but you’ll also ordinarily have a larger checkout, and a larger amount of data exchanged with each pull versus the single service you’re working on.
Google has a single Repo for 25K developers. In that, there are many hundreds of separately buildable/deployable things. Each of those has its own release cadence. Some go into prod with every commit (Continuous Deployment style). Others have daily, weekly or monthly cadences. It is all made sane by their expanding/contracting monorepo configuration and their Blaze (Bazel to you and me) build system which evolved with it.
What seems wasteful is an app/service with a certain release cadence being the product of two or more repositories. At least it does where the product of those two repos has no other dependent apps.
Git ❤ Microservices (or) the push/pull bottleneck
In the Microservices era, it makes sense to have one repo per microservice. At least it does if they are separately deployable (the point of microservices). Being in a separate repo, they are by default separately buildable, of course. Being in a separate repo, they manage to avoid a push/pull bottleneck that comes with default Git usage. Perforce’s GitFusion side-install didn’t have that bottleneck, but the vast majority of Git teams are not using that.
When Git gets past the push/pull bottleneck, we may be able to step back from the one-repo-per-microservice world that we are in now and see team VCS use evolve in ways closer to Google’s.
Incidentally, if Google is not competing with your business idea (they won’t balkanize their ad revenue), count your lucky stars as their developer throughput is better than yours and they’d change your business and rules of engagement from afar. They will learn your domain/vertical faster than you can learn their developer efficiency. Oh, Buzz and GooglePlus notwithstanding.
Twitter conversation with Sam Newman
A correction from me - you can do a monorepo for even large teams without Buck/Bazel(Blaze) - Maven, Gradle and other recursive build systems are fine choices too (in theory). We also discussed lockstep upgrades (which I like in a monorepo), and lockstep releases - which is at least entangled in this blog entry.
Why focus on buildable/deployables?
Ok, so a modular build is a good thing, if it allows you to choose to build one module only (or a subset of modules). At least if that allows elapsed time to be saved on the build. It is also good if the build technology itself can determine an elapsed time saving on the build (versus the full build).
Maven (and similar) allows a modular build structure. You can cd into one of the sub-modules and build from there (and do naughty things like -DskipTests). That is an effective way of shortening elapsed build times. Jason van Zyl (Mr Maven) made a Smart Builder that does some deterministic quickening of the build, but I believe the competing Gradle wins for speed.
Buck and Bazel (both in the image of Blaze) are directed graph build systems. There are still intermediate buildable things, which the build system can build or skip (on each build invocation). Those are modules too, even if you’re less aware of them. You always build from root with directed graph build technologies, and the cd-to-sub-module thing of Maven/Gradle doesn’t apply.
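To make the "build or skip" idea concrete, here is a minimal sketch in Python of how a directed graph build might decide what to rebuild after a change. It is not how Buck or Bazel are implemented. The module names and the DEPS table are hypothetical, and the sketch assumes a change to a module invalidates everything that depends on it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical module graph: each key depends on the modules in its set.
DEPS = {
    "core": set(),
    "billing": {"core"},
    "search": {"core"},
    "webapp": {"billing", "search"},
}


def affected_by(changed, deps):
    """Return the changed modules plus everything that depends on them, transitively."""
    affected = set(changed)
    grew = True
    while grew:
        grew = False
        for module, needs in deps.items():
            if module not in affected and needs & affected:
                affected.add(module)
                grew = True
    return affected


def build_plan(changed, deps):
    """List the affected modules with dependencies ahead of their dependents."""
    affected = affected_by(changed, deps)
    order = TopologicalSorter(deps).static_order()
    # Everything not in `affected` is skipped on this invocation.
    return [module for module in order if module in affected]


if __name__ == "__main__":
    print(build_plan({"billing"}, DEPS))
```

The plan is always computed from the whole graph (the root), which is why there is no equivalent of cd-ing into a sub-module: a change to billing rebuilds billing and webapp, and leaves core and search alone.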
Anyway, each module is a buildable thing, BUT the dev-team has no intention of ever deploying a module on its own. Each module could be a Jar, and only a collection of those jars (say a WAR file) makes sense to deploy. Thus ‘number of repositories’ determination is for things that are buildable and deployable.
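As a rough illustration of that distinction, the Python sketch below separates deployables from the merely buildable and reports which deployables each jar ends up in. Everything here is an assumption made for the example: the module names, the MODULES table, and the rule that only WAR packaging counts as deployable.

```python
# Hypothetical modules: name -> (packaging, direct dependencies).
MODULES = {
    "core": ("jar", set()),
    "billing-api": ("jar", {"core"}),
    "reports": ("jar", {"core"}),
    "shop.war": ("war", {"billing-api", "core"}),
    "admin.war": ("war", {"reports", "core"}),
}

# Assumption for this sketch: only these packaging types are ever deployed.
DEPLOYABLE_PACKAGING = {"war"}


def closure(name, modules):
    """Return every module that `name` depends on, directly or transitively."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for dep in modules[current][1]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def deployables(modules):
    """Names of the modules the team actually intends to deploy."""
    return sorted(
        name for name, (packaging, _) in modules.items()
        if packaging in DEPLOYABLE_PACKAGING
    )


def owners(modules):
    """Map each non-deployable module to the deployables that include it."""
    result = {
        name: [] for name, (packaging, _) in modules.items()
        if packaging not in DEPLOYABLE_PACKAGING
    }
    for deployable in deployables(modules):
        for dep in closure(deployable, modules):
            if dep in result:
                result[dep].append(deployable)
    return result


if __name__ == "__main__":
    print(deployables(MODULES))
    print(owners(MODULES))
```

In this made-up layout there are five buildable modules but only two deployables. A jar with a single owner (billing-api, reports) has no other dependent app, so it has no obvious reason to live apart from that owner; a jar with several owners (core) is where the repository question gets interesting.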