Paul Hammant's Blog: Code Review - the unit of work should be a single commit
I’m fascinated by source-control, as my readers know. I’m not sure if this is is a bad idea, but there’s a code-review tool that could use source-control as its storage backend. I’ve no time to make it, and if someone else does, I’d be very excited.
Product Proposal
A new code review application for use by teams doing Trunk-Based Development with in-house source-control installations, and wanting to follow the “Continuous Review” idiom that Google pioneered with their “Mondrian” code review system. Guido van Ruossum’s 2006 video is a great start, and though it’s not super related to this article, I want to call out one Guido statement (2m 35s in):
Namely, “code review is a good alternative to pair programming”, that love of eXtreme Programmers (of which ThoughtWorks has thousands), and he talks to it at 5 mins 20 seconds in and goes as far as to say “best alternative” :)
This tool is aimed at those building on DevOps foundation, and wanting to go further towards CD. It would be installable in the Source-Control infrastructure, as it would review the commits of. If you’re already using a paid GitHub service for your ‘system’ you probably don’t need a new review system, and the same would be true for Atlassian’s Stash.
Technologies?
AngularJS (or similar), Source-Control as the data-store (sharing accounts with that underlying system), Sinatra (or similar) for the web-server middle bit. There would need to be some triggered scripting for each commit as it comes in, and some shelling out to the command line. There’s need to be some “rebuild of index” capability too, because things crash or get out of step.
Going Deeper
A commit in a trunk (say Svn, Perforce, Git, etc) causes a triggered generation of a diff that’s committed to second repo in the same source-control infrastructure. The web-server app (Sinatra), passes that diff (in JSON form) to an Angular page that is demanding it. Reviewers interoperate with the document in the browser, add their code-reviews comments as nodes within the same JSON document. When finished, a reviewer hits a “save review” button and it goes back (via a PUT). The JSON document on the back-end gets committed to the repo dedicated to the reviews. The key (with a .json suffix) is obviously the change-list number (or hash). Thus the original JSON diff gets expanded with each review info incrementally until the review is “done”. Each of those is a bump to .json resource’s HEAD revision.
Over time the review is a series of changes to the original doc. There’s a neatness to that progression. There’s only so many times someting is modified in review before it is considered ‘done’. That’s important because I previously hypothesized what would fit source-control as a backing store.
There’s some complexity around two or more people reviewing the same commit in trunk. You would probably lock a commit for the duration of each review in the first version, and implement a auto-merge on the server-side in a follow up release. Ideally you’d interoperate with the canonical source-control store on a per-document basis. This is easier for Subversion (RESTfully) than for Git as the latter really wants a clone before usage of working copy, though there are more advanced ways around that - especially for Github.
Power users could “checkout” the review repository, and do analysis on it. That would include history mining for various things like “strongest reviewer” and “most likely to fawn over someone else’s review without adding value him/herself”. Then there review bubbles in the same style as the commit bubbles idea.
In terms of integration with the trunk (on which we’re doing a review), commits need to be blocked until the review has garnered enough karma.
Git’s most popular workflows mean that a transient branch is created, and the review happens against that, but maybe that’s more advanced than the first version of this app would be. Specifically “after commit” code review as a team gets better committing and before they move to the pre-commit style.
Challenges
There’s a pointlessness to making another code review tool. Even if you could make a claim that an Angular-like front end exchanging JSON documents with a source-control repo, it solves no existing problem. It serves only perfectionism - sorry gang!
Profit?
It is such a small application, that someone could easily make it in open-source land. It is perhaps a race to zero in terms of price-point, as the business of tools for developers can often be. Given there are a few viable review tools and services, we have to ask ourselves if this would be worth it just to have the ‘meta’ accolade of having code-reviews stored in source-control.
Competition / Need
In the commercial space there’s Atlassian’s Crucible (related to their stash project I guess). What I do not like about Crucible in particular is the unit of work (alluded to in this article’s title). For Crucible, the unit of work is a “code review”, and that could be a number of commits. I think that’s an impediment to actually doing code reviews, and definitely an impediment to throughput as detracts from effective time in the code review tool.
There’s also the quite new Upsource from JetBrains - gotta love those guys.
Open-source-land has Rietveld which contains some lines of code from Mondrian from some years ago, but is not Mondrian. There’s also Gerrit which is a further spinoff. Both are undergoing active development.
Phabricator
The open-source Phabricator from Facebook is particularly compelling though - and worth calling out on its own, because of it’s completeness and ease of use.
It is also far closer to the key feature I particularly like, than the likes of Crucible:
The Unit of work for a code review should be a single commit,
and the key for that should be the commit ID itself.
This is probably because it is made by people that really understood how Mondrian (and command line tools) worked at Google, and fed the “high throughput” machine. It does its job well, works with Subversion as well as Git, Mercurial, and Perforce (via Git-Fusion though). Anyone wanting to try Phabricator should do so through a handy demo site (by phab.io who can do the hosting for you).
Lastly, it can be configured to do code review after commit, or pre-commit (meaning they’ve done some extra tooling for Subversion because of a lack of lightweight branches), which is quite neat. An enterprise dev-team can get better and better at code review, and plan to turn on the pre-commit gating when they are sure they are ready. Hopefully driven by throughput metrics. Colleague Sam Newman said something well in respect of this retrofitted capability [I paraphrase]:
[This] effectively reimplements the pull request model, albeit by building layers of tooling on top of [the SCM technology].
Further reading
The genus of Continuous Review
Blog entries Continuous Review, Continuous Review - continued (Mondrian discussed there), and Non-Continuous Reviews.
Turning a unified diff into a JSON document
A StackOverflow question: What’s the best way to turn a Subversion diff into JSON? (hint: Regex is not the best way).
More recently I’ve been intrigued by CCS which may be a better storage format (more mergeable than JSON). Angular would need JSON in the browser, but there’s no reason why a middle-tier piece couldn’t transparently translate between CCS and JSON. Same’s true for YAML, which might be a better fit that CCS, as not all of the latter’s features are needed for this.
Monkeying around with source-control as a backing store
I’m trying to avoid a relational schema here. Blog links:
- Open data backed by source control
- Perforce as a datastore with Client-Side MVC
- SCM and Key-Value Store Convergence
- Very Small Data (as mentioned)
- Feature Toggles: App-config workflow using SCM (relates to the project below)
- The document is the single source of truth
And a Github project: App-config app that uses AngularJS as a front-end for JSON documents held in source control.