The origins of Trunk-Based Development

I’ve been trying to track down the pioneers of Trunk-Based Development, and find out what their rationale and influences were.

A big-ass trunk: “General Sherman”, in California (image from Wikipedia).

Merge wasn’t always smooth

If you ignore the “only one person can edit a file at a time” era of source-control, merge has always been a factor. It doesn’t matter if you’re merging to a working copy, or to/from a branch, merge is in your tool-chain. The advances at the end of the 90’s were more effective three-way merges, and better and better merge point tracking.

At some level, if merge is buggy, as it was in the early days, you’re forced into a trunk model. Everyone in a team would sync to (pull from) the HEAD revision of the single branch many times a day, make small changes, and check them back in. That way the pain from co-workers would be minimized. Good merging and branching allowed you to step away from that model, but a minority of experienced developers wonder if you should. At the least, they wonder how long those branches should live, and how many developers should be allowed to contribute to them.

Trunk is just one mode of operation

CVS came before Subversion. The Wikipedia page says a handful of scripts created in 1986 were fashioned into an initial release in 1990, with the last release being in 2008. It was created “to overcome deficiencies in RCS” (1982 to date). When CVS’s limitations were determined to be insurmountable, the Subversion project kicked off (the initial 0.x release was in 2000). Subversion took influences from other commercial packages like Perforce too. Subversion has a default layout for a new repo of “trunk, tags and branches”, but that isn’t forcing the trunk based development model that I promote at all. People who use Subversion in the enterprise can often say “we do trunk based development”, but that’s only true if the only branches are for releases, and they are made on a just-in-time basis for the release itself.

Perforce was around in the 90’s too, and very solid. While Trunk-Based Development was a mode of operation, it wasn’t the only way of using it, and the Perforce people didn’t focus on it as a bedrock practice. Not like I do, at least. They did describe it, to some degree, in 1998’s High-level Best Practices in Software Configuration Management paper. I was Head of Development at a startup in London in 2001, and it was the bible I made lieutenants read before flipping the team from CVS to Perforce (and Trunk).

Thoughts from SCM pioneers, Karl Fogel and Jim Blandy

Karl was one of the CVS developers who helped kick off Subversion. Casting his mind back to the late 90’s, he says:

We were mainly influenced by CVS. I guess I’d say Subversion’s branching design comes from “CVS + atomicity”:

A branch should just be a lightweight copy (copy-on-write);

A tag is just a read-only branch;

Since branches and tags are just copies, and copying is a versioned event, it follows that branching and tagging events are versioned.

That last point is still one of the most frustrating things about Git for me: that in Git, creating or removing a branch is not an event in the version history. A branch is just an un-versioned moveable pointer to a particular commit. (There’s a lot I do like about Git, I hasten to add – just there are a few things where I think “dang, that’s not right”).

As for development models, we had ones we were accustomed to, but we wanted a system flexible enough to accommodate different models. Hence the “it’s just a copy” starting point – you can support a lot of different models on top of a consistent branching primitive.

Well, I know Jim and I were using that model for open source development (then called “free software development” of course) well before SVN, and in fact even before WAN-enabled CVS had been released – we were using it in projects versioned under RCS. The earliest that I’m sure I was collaborating with others using that basic model in RCS was 1993.

As for the usage of the “everyone on one trunk, with occasional short-lived branches for bug-fixing or experimental development” model:

I’m pretty sure that’s what was going on in the Emacs tree when Emacs was still on stock RCS/CVS – at least as of 1992, when I got involved, but Jim could confirm or deny that that model was going on earlier than that.

Again, I don’t think it was unique to Emacs. It’s a pretty obvious model to use when one has a competent group of developers who can agree to not break (or not break too often) the master line of development.

I hope that’s helpful, Paul, even though it probably can’t count as the most rigorous historical sourcing ever! (I suppose an actual investigation of the Emacs source code’s history, which I believe has been preserved pretty well through all the VC system migrations, could confirm, although that would be a lot of work).

(Karl started Question Copyright some time ago, and you should really check that out)

Jim also helped design Subversion (and was a long term GDB maintainer). On how they wrote Subversion itself, he says:

We all worked on a single trunk, and we didn’t use branches much at all, let alone long-lived branches.

On the back history, and where trunk-only models were first used:

I’d been the maintainer of GNU Emacs before (1990 to 1993, taking over from Joe Arceneaux), where we just used a single source tree, without version control. Then, I’d been working on GDB for Red Hat, where we used CVS and used the trunk for almost all work, and made releases from branches.

Thoughts from Craig Silverstein - Google’s first hire

Says I: Who were the chief architects of the “we’re all in one big trunk, sharing code at source level” inside Google? Also: what were the influences for such a design?

I believe that I am as responsible as anyone for the policy of having our entire codebase in one big repository. My philosophy was that it would make it as easy as possible to share code across projects. It also helped avoid problems with versioning, since the expectation was that everyone would always run their code at the tip of the trunk. I don’t remember there being any outside influences that prompted that decision.

This had its plusses and minuses, and while I feel overall it was a win it definitely had a cost. As the company grew the base libraries would get modified in ad hoc ways that did not represent a coherent vision; eventually we had to put gatekeepers on checkins to the base/ directory. And we had to write lots of bespoke tooling to allow people to only check out a subset of the repo, since it was too large to check out the whole thing.

If I were doing it again I don’t know how I would arrange things. At Khan Academy, where we use git, we’ve organized things using lots of submodules, which again has plusses and minuses. Basically, my conclusion is that this is a surprisingly hard problem.

Thus, Craig was the designer/engineer of Google’s mega-trunk. Craig now works for the phenomenal Khan Academy as he mentions.

Jez Humble - co-author of the Continuous Delivery book

Jez can’t actually remember the 90’s, but says:

Version control is fundamentally a communication tool, and you can’t communicate what you’re doing, or get feedback from the deployment pipeline, if you’re working off in long-lived feature branches that don’t get merged regularly into trunk. Feature branches work fine on small, experienced teams, but they don’t scale because to release you have to merge the feature branches with each other, which is a combinatorial problem, which is why organizations like Google work off trunk and at HEAD.
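To put rough numbers on “combinatorial”, here is a small sketch. It is my own illustration, not Jez’s arithmetic, and it assumes the simplest possible reading: every pair of unmerged feature branches is one potential conflict to reconcile at release time, so n branches give n(n-1)/2 pairs. Real merges can be worse than pairwise. The function name `pairwise_merge_checks` is hypothetical.

```python
from math import comb


def pairwise_merge_checks(branch_count: int) -> int:
    """Count the distinct pairs of branches that could conflict.

    Assumption (not from the quote): each pair of unmerged feature
    branches is one potential conflict, i.e. "n choose 2" pairs.
    """
    return comb(branch_count, 2)


if __name__ == "__main__":
    for n in (2, 5, 10, 50):
        print(n, "branches ->", pairwise_merge_checks(n), "pairs")
```

Under that assumption the count grows roughly with the square of the number of branches, while a single trunk keeps each integration between one working copy and HEAD.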

Jez has exclusively encountered trunk branching models in his career. By contrast, that’s only mostly the case for me; otherwise I got the client there after a short struggle. As he’d never encountered the truly shit ClearCase-style multi-branch scenario, I donated a branching diagram or two to the CD book (page 349 or thereabouts). They were originally used on a ThoughtWorks mission at a US bank, where we took them to trunk from that hell.

Jez frequently reminds me that the practice of Continuous Integration (CI) really requires trunk usage. To some extent, he says, “CI is trunk based development”. Wikipedia authors have refined an opening snippet on CI which really rams home this point:

Continuous integration (CI) is the practice, in software engineering, of merging all developer working copies with a shared mainline several times a day. It was first named and proposed by Grady Booch in his method - see book ref, but he did not advocate integrating several times per day. It was adopted as part of extreme programming (XP), which did advocate integrating more than once per day, perhaps as many as tens of times per day…

They’ve used “mainline” in the Wikipedia article, but I like to disambiguate from the hell that ClearCase made popular in the middle noughties, also called mainline, which was very different. Hence trunk and trunk based development.

At the time of writing, I have 33 articles on trunk based development.

Untangling Configuration Management - Martin Cagan - 1995

(added Jan, 2017)

Article online

Section 4.4 “Models of Parallel Development”:

One extreme method for dealing with the problems of parallel development is to enforce serialization. This simply says that parallel versions are prevented. The first person derives a new version and begins work. If another person asks to derive from the same ancestor, then that person’s derive fails, with a reference to whoever owns the new version. This of course makes the person that currently owns the object a bottleneck (which all too often leads to circumvention of the system), which is why many organizations prefer to support at least one of the forms of parallel development.

(emphasis mine).

Yup, that’s what Trunk-Based Development is about: enforced serialization. Back then, you’d have to stand up and say “when are you finished, because I need to check in a change to the same file?”. These days tools greatly ease the check-in process: everything feels serial, and in retrospect the history looks serial, but in reality the work was massively parallel.
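The strict serialization Cagan describes can be sketched in a few lines. This is a toy model under my own assumptions, not code from his article or from any real tool: the names `SerializedStore`, `derive`, `check_in` and `DeriveRefused` are hypothetical, and ownership is tracked per object name only.

```python
class DeriveRefused(Exception):
    """Raised when a derive or check-in is refused because of ownership."""


class SerializedStore:
    """Toy model of enforced serialization: one open derivation per object."""

    def __init__(self):
        # Maps object name -> person who currently owns the new version.
        self._owners = {}

    def derive(self, obj, person):
        """Start a new version of obj, unless someone else already has one."""
        owner = self._owners.get(obj)
        if owner is not None and owner != person:
            # The refusal names the current owner, as in the quoted passage.
            raise DeriveRefused(f"{obj} is owned by {owner}")
        self._owners[obj] = person

    def check_in(self, obj, person):
        """Finish the new version and release ownership of obj."""
        if self._owners.get(obj) != person:
            raise DeriveRefused(f"{person} does not own {obj}")
        del self._owners[obj]


if __name__ == "__main__":
    store = SerializedStore()
    store.derive("parser.c", "alice")
    try:
        store.derive("parser.c", "bob")
    except DeriveRefused as err:
        print(err)
    store.check_in("parser.c", "alice")
    store.derive("parser.c", "bob")  # succeeds once alice has checked in
```

The bottleneck in the quote is visible here: bob can do nothing with `parser.c` until alice checks in, which is exactly the waiting that modern merge tooling removes.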

Microsoft’s SLM (“slime”) tool’s workflow - 1995

(added Jan, 2017)

In Microsoft Secrets: How the World’s Most Powerful Software Company Creates Technology, Shapes Markets and Manages People (Michael Cusumano & Richard Selby, 1995), there is a section dealing with Microsoft’s per-developer workflow using Source Library Manager (SLM) on a one-branch model (the book does not use the words trunk or branch). SLM (AKA “slime”) was an internal Microsoft tool for source control until it was replaced by Source Depot in 1998. That daily, rigorous developer workflow was:

checkout (update/pull/sync or checkout afresh)
implement feature
build
test the feature
sync (update/pull)
merge
build
test the feature
smoke tests
check in (commit/push)
a daily build is made from HEAD of the shared master branch

That is the workflow in use today for just about any team and any branching model, but back then it was for a model that had a single branch. Sure, test automation has improved, and source-control is no doubt much faster, but the workflow should be familiar.
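The per-developer steps listed above can be sketched as an ordered pipeline that stops at the first failure. This is only an illustration under stated assumptions: `run_workflow`, `ok` and `DEVELOPER_STEPS` are hypothetical names, the actions are placeholders rather than real SLM commands, and the repeated build and test steps are labelled “again” just to keep the names distinct.

```python
from typing import Callable, List, Tuple

# A step is a name plus an action that reports success or failure.
Step = Tuple[str, Callable[[], bool]]


def run_workflow(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that fails.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed


def ok() -> bool:
    """Placeholder action; a real one would call the team's own tools."""
    return True


DEVELOPER_STEPS: List[Step] = [
    ("checkout", ok),
    ("implement feature", ok),
    ("build", ok),
    ("test the feature", ok),
    ("sync", ok),
    ("merge", ok),
    ("build again", ok),
    ("test the feature again", ok),
    ("smoke tests", ok),
    ("check in", ok),
]

if __name__ == "__main__":
    print(run_workflow(DEVELOPER_STEPS))
```

The ordering is the point: the check-in is only reached after the developer has synced, merged, rebuilt and retested against everyone else’s latest changes.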


Published

April 23rd, 2015

Syndicated by DZone.com

Comments formerly in Disqus, but exported and mounted statically ...

Wed, 29 Apr 2015	Matthew Skelton
"Jez can’t actually remember the 90’s" heh heh :) Thanks for showing comprehensively that trunk-based development is not a new fad but solid, sound practice.
Sun, 24 Jan 2016	farandwide
As a young engineer, I worked with two organizations in the Boston financial community in the mid-late 1990's doing trunk based development. I also remember reading a presentation from source control consultants in GB in early 2000 which strongly advocated the practice. So, the practice was already geographically dispersed and being used with moderately sized 10-50 person teams in the 1990s. On the other hand, trunk based development has not been popular in west coast internet companies until fairly recently. My experience with those companies has been that somehow early on there was both less interest in source control and a strong conviction that complicated branching and merge procedures were what their "special" needs required. Later, vested interests solidified that complexity and so "the right way to do it" was passed to the next generation of engineers.
Mon, 16 Jan 2017	Brad Appleton
You should really look at some of the SEI Technical Reports on "Software Configuration Management" from the early nineties (and late eighties). The first ones to look at are the ones with Peter Feiler as (co)author, especially the following:
- "Configuration Management Models in Commercial Environments", March 1991, by Peter H. Feiler. SEI Technical Report: CMU/SEI-91-TR-007 (see http://resources.sei.cmu.ed...)
- "Transaction-Oriented Configuration Management: A Case Study", November 1990, by Peter H. Feiler, Grace Downey. SEI Technical Report: CMU/SEI-90-TR-023 (see http://resources.sei.cmu.ed...)
These are the beginning of TBD, and describe not only the "transaction-based" model, but several other concepts and constructs that were being introduced into SCM tools at that time which these days would be considered essential for TBD and CI/CD, like:
- atomic commits,
- change-tasks (sometimes implemented as "change-sets" or "change-packages"),
- short-vs-long transaction model,
- use of branches to support evolutionary team-based development (as opposed to how branches worked in SCCS and RCS, where they were simply pairs of digits in a semantic version-numbering scheme, rather than a way of seeing, viewing and organizing them in a tree-like structure with actual names [instead of numbers] like most tools do today).
Also, you have misspelled Wingerd's last name in your citation above. There were three papers that came out around the same time as their "SCM Best Practices" paper, and it is no coincidence, because we were all collaborating together, reviewing and commenting on each other's work while writing our own papers. The other two are:
- Advanced SCM Branching Strategies, by Steven Vance (yes, the same one who wrote the 2013 book "Quality Code" from Addison-Wesley) https://www.vance.com/steve...
- Streamed Lines: Branching Patterns for Parallel Software Development, Brad Appleton, Steve Berczuk, et al. (see http://www.bradapp.com/acme...)
The "Streamed Lines" paper used the Perforce definition of mainline, and tried to be overly comprehensive, describing way too many different kinds of branching strategies (and all their various patterns), but the style it refers to as "Late/Lazy/Deferred Branching" is exactly the style of lean branching that your TBD would later describe (and "Project-oriented branching" was the term it used for what you later called "branch-by-abstraction"). Vance did much the same thing (using Perforce's definition of mainline) and its description of "need-driven-branching" cites Wingerd's "branch only when necessary" quote.
In all those cases, the important thing to note is that what was called "mainline" was solely about a structure of branches and keeping it as "streamlined" as possible, ideally using a single "trunk". And whether to branch (or merge) early vs late and often vs infrequently were related to branching-strategy selection and merging-strategy selection (respectively), which is what defined the "codeline policy" for a codeline (think "definition of done" for when to commit changes to a branch, or start a new branch).
Lots of folks look at a branching diagram and assume they know WHEN a branch on it was created (presumably shortly after the tag it originated from was created), but that is an invalid assumption. All that tells you is what the origin point of the branch is, not whether it was created right after the corresponding tag, versus weeks, months, or even years later, only when it was absolutely needed.

Paul Hammant's Blog: The origins of Trunk-Based Development