That’s right, Twitter intelligentsia: CI infrastructure that verifies individual commits before they land in trunk/master (assuming the team intends to stay close to Trunk Based Development - TBD) is cheaper to run than infrastructure that runs CI verifications post-commit. It all comes down to one factor: whether the build needs to be faster than the interval between commits/pushes at peak times, or not.

More Costly?

Assume there are many parallel short-lived feature branches (one person/pair, less than one day’s work) submitted to code review, with the intention of landing them back on trunk/master when ready. ‘Ready’ also means a bunch of machine-determined things: it will not break the build, it passes the standards check, the linter says ‘good to go’, Findbugs is happy, and so on. All of those can happen in parallel, but are best complete before humans review the code. A fast build is always good, but it does not have to be that fast for that proposed commit. Why? Because nobody else is going to pull from that short-lived feature branch/fork and start their dev work based on it. At least not if they’re doing TBD. They only pull from trunk/master, and it only receives things that will not break the build (they have already passed code review, lint, Findbugs, etc.).

By contrast, if teams have configured their CI server to first focus on a commit after it lands in trunk/master, then there is pressure on the team to make the entire verification faster than the average interval between accepted commits. That’s not because you fear a single-threaded CI server backing up; that’s easy to solve with master/slaves and parallelization (although that isn’t free). No, it’s because you fear landing potentially good commits in the shared trunk/master on top of bad commits (ones that broke the build) and having trouble picking things apart later. The worst-case scenario is freezing trunk/master while that mess is undone. Thus, CI verifications on the shared trunk/master are more expensive because they must not back up, whereas out on a fork/branch that hasn’t yet been merged into the shared trunk/master there’s leeway to fall behind a little in peak commit/push situations.
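The backing-up argument can be sketched numerically. This is a toy queue model, not anything from a real CI server (the function name and the numbers are made up): commits arrive at a fixed interval and one serial CI worker verifies them in order. If a build takes longer than the commit interval, the backlog grows with every commit; if it is faster, only the in-flight build is ever pending.

```python
# Toy model: one serial CI worker, commits arriving at a fixed interval.

def backlog_when_last_commit_arrives(n_commits, interval_min, build_min):
    """How many commits are still unverified at the moment the last one arrives."""
    worker_free_at = 0.0
    finish_times = []
    for k in range(n_commits):
        arrival = k * interval_min
        worker_free_at = max(arrival, worker_free_at) + build_min
        finish_times.append(worker_free_at)
    last_arrival = (n_commits - 1) * interval_min
    return sum(1 for f in finish_times if f > last_arrival)

# Build slower than the commit interval: roughly half the day's commits
# are still waiting by the time the last one arrives.
print(backlog_when_last_commit_arrives(60, 5, 10))   # 31 unverified commits
# Build faster than the commit interval: only the in-flight build is pending.
print(backlog_when_last_commit_arrives(60, 10, 5))   # 1
```

Double the commit count in the first case and the backlog doubles too, which is the unbounded growth that forces post-commit teams to chase build speed.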

Facebook

Facebook’s code reviews follow the Continuous Review model (the unit of work/review is a single commit, not a batch of commits). Ten minutes after submission, the first reviewers are active. Their CI infrastructure, therefore, has an SLA to bring in the results of the automatic checks within that time. Any delay in kicking off the CI job is a factor too. Basically, all the ‘busy’ and ‘wait’ activities performed automatically should not go past ten minutes elapsed. Luckily, many pieces are parallelizable. For Facebook, ten minutes is certainly longer than the mean interval between commits to the shared trunk/master. Thus Facebook does all CI out on short-lived feature branches/forks before the work gets merged back, in order to buy some breathing room.
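Why does parallelizability matter for a fixed SLA? Because running the checks concurrently makes the elapsed time the duration of the slowest single check, not the sum of all of them. A minimal sketch (the check names and durations here are invented stand-ins, not Facebook’s actual pipeline):

```python
import concurrent.futures
import time

# Hypothetical check durations in seconds (stand-ins, scaled down from minutes).
CHECKS = {"compile": 0.3, "unit-tests": 0.5, "lint": 0.1, "static-analysis": 0.4}

def run_check(name):
    time.sleep(CHECKS[name])          # placeholder for the real verification
    return name, "ok"

start = time.monotonic()
with concurrent.futures.ThreadPoolExecutor(max_workers=len(CHECKS)) as pool:
    results = dict(pool.map(run_check, CHECKS))
elapsed = time.monotonic() - start

# Serially these checks would take 1.3s (the sum); concurrently they take
# about 0.5s (the slowest check). That gap is what makes a fixed wall-clock
# SLA reachable even as more checks are added.
```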

“Sandcastle” – We don’t want humans waiting on computers:

Impact of backed-up CI jobs

Teams that have not tuned for speed end up batching their CI server jobs (verifications), or making multiple long-running branches (with permission to not be green for periods of time), or slowing down their commit rate, or slowing down their release cadence. And then they get out-competed by nimble high-throughput organizations (like Google - thank your lucky stars they are not in your vertical).

Caveat: builds can still break on some 0.0001% basis because of timing issues between things that were 1) independently verified in a pre-merge-to-trunk scenario, and 2) subsequently entangled somehow when they were merged together on the shared trunk/master. Auto-rollbacks for those should be rare, and teams’ high throughput won’t be affected.
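That entanglement is usually a semantic conflict: two commits that are each green against the base revision, yet broken once combined. A minimal illustration (toy code, with “CI” reduced to executing the merged source and smoke-testing it; the function names are made up):

```python
# Base revision: one function.
base = "def greet(name):\n    return 'hi ' + name\n"

# Commit A, verified alone on its branch: renames greet to welcome.
commit_a = "def welcome(name):\n    return 'hi ' + name\n"

# Commit B, verified alone on its branch (still sees the old name):
# adds a new caller of greet.
commit_b = base + "def caller():\n    return greet('world')\n"

def ci_passes(source):
    """'Build' the source and smoke-test it; True means the build is green."""
    ns = {}
    try:
        exec(source, ns)
        if "caller" in ns:
            ns["caller"]()
        return True
    except Exception:
        return False

# Each branch is green in isolation...
assert ci_passes(commit_a)
assert ci_passes(commit_b)
# ...but the merged result (A's rename plus B's new caller) is broken:
merged = commit_a + "def caller():\n    return greet('world')\n"
assert not ci_passes(merged)
```

No pre-merge verification of either branch could have caught this on its own, which is why a post-merge check (and occasional auto-rollback) still has a place even in the pre-merge-CI model.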
