Paul Hammant's Blog: The origins of Trunk Based Development
I’ve been trying to track down the pioneers of Trunk Based Development, and find out what their rationale and influences were.
A big-ass trunk: “General Sherman”, in California (image from Wikipedia).
Merge wasn’t always smooth
If you ignore the “only one person can edit a file at a time” era of source-control, merge has always been a factor. It doesn’t matter if you’re merging to a working copy, or to/from a branch: merge is in your tool-chain. The advances at the end of the 90’s were more effective three-way merges, and better and better merge point tracking.
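To make “three-way merge” concrete, here is a minimal sketch of the decision rule, applied per file rather than per line. It assumes each side is a plain mapping of path to content; the names `merge_value` and `three_way_merge` are hypothetical, and real tools work at a much finer granularity than this.

```python
from typing import Dict, List, Optional, Tuple


def merge_value(base: Optional[str], ours: Optional[str],
                theirs: Optional[str]) -> Tuple[Optional[str], bool]:
    """Apply the three-way rule to one item. None means 'file absent'.

    Returns (merged_value, is_conflict).
    """
    if ours == theirs:      # both sides agree (including both deleting)
        return ours, False
    if ours == base:        # only their side changed
        return theirs, False
    if theirs == base:      # only our side changed
        return ours, False
    return None, True       # both changed, differently: needs a human


def three_way_merge(base: Dict[str, str], ours: Dict[str, str],
                    theirs: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Merge two descendants of a common ancestor, file by file."""
    merged: Dict[str, str] = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        value, conflict = merge_value(
            base.get(path), ours.get(path), theirs.get(path))
        if conflict:
            conflicts.append(path)
        elif value is not None:
            merged[path] = value
    return merged, conflicts
```

The common ancestor is what lets the tool tell “they changed it” apart from “we changed it”. A two-way comparison can only see that the two sides differ.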
At some level, if merge is buggy, as it was in the early days, you’re forced into a trunk model. Everyone in a team would sync (pull) from the HEAD revision of the single branch many times a day, make small changes, and check them back in. That way the merge pain caused by co-workers’ changes would be minimized. Good merging and branching allowed you to step away from that model, but a minority of experienced developers wonder if you should. At least, they wonder how long those branches should live, and how many developers should be allowed to contribute to them.
Trunk is just one mode of operation
CVS came before Subversion. The Wikipedia page says a handful of scripts created in 1986 were fashioned into an initial release in 1990, with the last release being in 2008. It was created “to overcome deficiencies in RCS” (1982 to date). When CVS’s limitations were determined to be insurmountable, the Subversion project kicked off (the initial 0.x release was in 2000). Subversion took influences from commercial packages like Perforce too. Subversion has a default layout for a new repo of “trunk, tags and branches”, but that isn’t forcing the trunk based development model that I promote at all. People who use Subversion in the enterprise often say “we do trunk based development”, but that’s only true if the only branches are for releases, and they are made on a just-in-time basis for the release itself.
Perforce was around in the 90’s too, and very solid. While Trunk Based Development was a mode of operation, it wasn’t the only way of using it, and the Perforce people didn’t focus on it as a bedrock practice. Not like I do, at least. They did describe it, to some degree, in 1998’s High-level Best Practices in Software Configuration Management paper. I was Head of Development at a startup in London in 2001, and it was the bible I made lieutenants read before flipping the team from CVS to Perforce (and Trunk).
Thoughts from SCM pioneers, Karl Fogel and Jim Blandy
Karl was one of the CVS developers who helped kick off Subversion. Casting his mind back to the late 90’s, he says:
We were mainly influenced by CVS. I guess I’d say Subversion’s branching design comes from “CVS + atomicity”:
A branch should just be a lightweight copy (copy-on-write);
A tag is just a read-only branch;
Since branches and tags are just copies, and copying is a versioned event, it follows that branching and tagging events are versioned.
That last point is still one of the most frustrating things about Git for me: that in Git, creating or removing a branch is not an event in the version history. A branch is just an un-versioned moveable pointer to a particular commit. (There’s a lot I do like about Git, I hasten to add – just there are a few things where I think “dang, that’s not right”).
As for development models, we had ones we were accustomed to, but we wanted a system flexible enough to accommodate different models. Hence the “it’s just a copy” starting point – you can support a lot of different models on top of a consistent branching primitive.
Well, I know Jim and I were using that model for open source development (then called “free software development” of course) well before SVN, and in fact even before WAN-enabled CVS had been released – we were using it in projects versioned under RCS. The earliest that I’m sure I was collaborating with others using that basic model in RCS was 1993.
As for the usage of the “everyone on one trunk, with occasional short-lived branches for bug-fixing or experimental development” model:
I’m pretty sure that’s what was going on in the Emacs tree when Emacs was still on stock RCS/CVS – at least as of 1992, when I got involved, but Jim could confirm or deny that that model was going on earlier than that.
Again, I don’t think it was unique to Emacs. It’s a pretty obvious model to use when one has a competent group of developers who can agree to not break (or not break too often) the master line of development.
I hope that’s helpful, Paul, even though it probably can’t count as the most rigorous historical sourcing ever! (I suppose an actual investigation of the Emacs source code’s history, which I believe has been preserved pretty well through all the VC system migrations, could confirm, although that would be a lot of work).
(Karl started Question Copyright some time ago, and you should really check that out)
Jim also helped design Subversion (and was a long term GDB maintainer). On how they wrote Subversion itself, he says:
We all worked on a single trunk, and we didn’t use branches much at all, let alone long-lived branches.
On the back history, and where trunk-only models were first used:
I’d been the maintainer of GNU Emacs before (1990 to 1993, taking over from Joe Arceneaux), where we just used a single source tree, without version control. Then, I’d been working on GDB for Red Hat, where we used CVS and used the trunk for almost all work, and made releases from branches.
Thoughts from Craig Silverstein - Google’s first hire
Says I: Who were the chief architects of the “we’re all in one big trunk, sharing code at source level” inside Google? Also: what were the influences for such a design?
I believe that I am as responsible as anyone for the policy of having our entire codebase in one big repository. My philosophy was that it would make it as easy as possible to share code across projects. It also helped avoid problems with versioning, since the expectation was that everyone would always run their code at the tip of the trunk. I don’t remember there being any outside influences that prompted that decision.
This had its plusses and minuses, and while I feel overall it was a win it definitely had a cost. As the company grew the base libraries would get modified in ad hoc ways that did not represent a coherent vision; eventually we had to put gatekeepers on checkins to the base/ directory. And we had to write lots of bespoke tooling to allow people to only check out a subset of the repo, since it was too large to check out the whole thing.
If I were doing it again I don’t know how I would arrange things. At Khan Academy, where we use git, we’ve organized things using lots of submodules, which again has plusses and minuses. Basically, my conclusion is that this is a surprisingly hard problem.
Thus, Craig was the designer/engineer of Google’s mega-trunk. Craig now works for the phenomenal Khan Academy as he mentions.
Jez Humble, co-author of the Continuous Delivery book
Jez can’t actually remember the 90’s, but says:
Version control is fundamentally a communication tool, and you can’t communicate what you’re doing, or get feedback from the deployment pipeline, if you’re working off in long-lived feature branches that don’t get merged regularly into trunk. Feature branches work fine on small, experienced teams, but they don’t scale because to release you have to merge the feature branches with each other, which is a combinatorial problem, which is why organizations like Google work off trunk and at HEAD.
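One way to read “combinatorial” here is to count the pairs of feature branches that could conflict with each other at release time. The sketch below does only that counting; the branch names and the helpers `pair_count` and `branch_pairs` are hypothetical, and it says nothing about how hard any individual merge is.

```python
from itertools import combinations
from typing import List, Tuple


def pair_count(n: int) -> int:
    """Number of distinct branch pairs among n branches: n choose 2."""
    return n * (n - 1) // 2


def branch_pairs(branches: List[str]) -> List[Tuple[str, str]]:
    """Every pair of branches that might need reconciling before a release."""
    return list(combinations(branches, 2))


if __name__ == "__main__":
    # Hypothetical feature branch names, for illustration only.
    print(branch_pairs(["search", "billing", "login"]))
    for n in (2, 5, 10, 20):
        print(n, "branches ->", pair_count(n), "pairs")
```

The pair count grows roughly with the square of the number of branches, whereas with everyone on trunk each developer only ever reconciles against one line of development.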
Jez has exclusively encountered trunk branching models in his career. By contrast, that’s only mostly the case for me: where it wasn’t, I got the client there after a short struggle. As he’d never encountered the truly shit ClearCase-style multi-branch scenario, I donated a branching diagram or two, originally used on a ThoughtWorks mission at a US bank, where we took them to trunk from that hell. For the CD book, I mean: page 349 or thereabouts.
Jez frequently reminds me that the practice of Continuous Integration (CI) really requires trunk usage. To some extent, he says, “CI is trunk based development”. Wikipedia authors have refined an opening snippet on CI which really rams home this point:
Continuous integration (CI) is the practice, in software engineering, of merging all developer working copies with a shared mainline several times a day. It was first named and proposed by Grady Booch in his method - see book ref, but he did not advocate integrating several times per day. It was adopted as part of extreme programming (XP), which did advocate integrating more than once per day, perhaps as many as tens of times per day…
They’ve used “mainline” in the Wikipedia article, but I like to disambiguate from the hell that ClearCase made popular in the middle noughties, which was also called mainline but was very different. Hence trunk and trunk based development.
At the time of writing, I have 33 articles on trunk based development.
Untangling Configuration Management - Martin Cagan - 1995
Section 4.4 “Models of Parallel Development”:
One extreme method for dealing with the problems of parallel development is to enforce serialization. This simply says that parallel versions are prevented. The first person derives a new version and begins work. If another person asks to derive from the same ancestor, then that person’s derive fails, with a reference to whoever owns the new version. This of course makes the person that currently owns the object a bottleneck (which all too often leads to circumvention of the system), which is why many organizations prefer to support at least one of the forms of parallel development.
Yup, that’s what Trunk Based Development is about: enforced serialization. Back then, you’d have to stand up and say “when are you finished, because I need to check in a change to the same file”. These days tools greatly ease the checkin process: you feel everything is serial, in retrospect it looks serial, but in reality it was massively parallel.
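The strict serialization model from the Cagan quote can be sketched as a simple ownership table. This is an illustration under assumed names (`SerializedStore`, `derive`, `check_in` and `DeriveError` are all hypothetical), not how any particular tool implemented it.

```python
from typing import Dict


class DeriveError(Exception):
    """Raised when an object is already owned by someone else."""


class SerializedStore:
    """Allows at most one person to hold a new version of each object."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}  # object name -> current owner

    def derive(self, obj: str, person: str) -> None:
        """Start a new version; fail if another person already owns one."""
        owner = self._owners.get(obj)
        if owner is not None and owner != person:
            raise DeriveError(f"{obj} is currently owned by {owner}")
        self._owners[obj] = person

    def check_in(self, obj: str, person: str) -> None:
        """Finish the new version and release ownership."""
        if self._owners.get(obj) != person:
            raise DeriveError(f"{person} does not own {obj}")
        del self._owners[obj]


if __name__ == "__main__":
    store = SerializedStore()
    store.derive("parser.c", "alice")
    try:
        store.derive("parser.c", "bob")   # bob is blocked behind alice
    except DeriveError as err:
        print(err)
    store.check_in("parser.c", "alice")
    store.derive("parser.c", "bob")       # now it succeeds
```

The bottleneck is visible in the second `derive` call: bob can do nothing with that file until alice checks in. Modern tooling replaces that hard failure with a merge at checkin time.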
High-level Best Practices in Software Configuration Management - Wingerd and Seiwald - 1998