Some companies do code reviews to enforce their own wish for developer excellence, and reduce the cost of production defects or code-unmaintainability. Others do it because they are asked to by regulators, or even customers if they are selling product or service outside their organization. A few days ago I made the case that the most efficient code review process is one that deals with reviews within minutes of hours of the commit they pertain to. I didn’t dwell so much on the difference between pre-commit reviews (that until they “pass”, the commit can’t go in), and post-commit reviews (which suggest prioritized follow up work in the case of “fail”).

Where are code review ‘records’ stored?

In BigTable (Google’s sparse, distributed multi-dimensional sorted map)

  • Google’s Mondrian. Private to Google of course
  • Rietveld uses BigTable via AppEngine (for all to use)

In a Conventional Relational Schema

  • Crucible - Oracle, SqlSever, Maria/MySQL, or Postgres
  • CodeCollaborator - Oracle, SqlSever, Maria/MySQL
  • Geritt - MySQL/Maria or PostgreSQL
  • Phabricator - MySQL

What about those records in Source-Control instead?

Given the diffs for commits come from a Source-Control system, there’s a possibility that Source-Control could be the place that reviews could be stored. You could have it as a branch of the same source control-system, but there’s no meaning to merging between the branches. It is attractive though as the authentication system for the committers for one branch, is by default the same as the auth-system for the other branch. Perhaps it’s easiest to actually have a separate repository that is linked by convention only, but with back-end configuration they can share authentication:

Subversion

http://svn.mycompany.com/trade_engine (has trunk, tags, branches)
http://svn.mycompany.com/trade_engine_reviews (has no trunk, tags, branches)

Git

git://git.mycompany.com/trade_engine.git (has directories - master and others)
git://git.mycompany.com/trade_engine_reviews.git (has just master)

Mixed

git://git.mycompany.com/trade_engine.git (has directories - master and others)
http://svn.mycompany.com/trade_engine_reviews (has no trunk, tags, branches)

A Review as a single JSON document?

From the “Continuous Review” article from a few days ago, here’s the diff of the last commit on the trunk:

ph7785:svn_workingcopy paul$ svn diff -c 5
Index: trunk/file.txt
===================================================================
--- trunk/file.txt	(revision 4)
+++ trunk/file.txt	(revision 5)
`` -1,4 +1,4 ``
-Efficiently unleash socially-awkward information without
+Efficiently unleash tie-dyed information without
 any value. Dramatically
-maintain clicks-and-mortar solutions without
+maintain empirical solutions without
 functional aspects.

As you can see, a diff is fairly compact. I’m thinking that it would be better stored in JSON format though. If it were it would be instantly compatible with web-technologies like AngularJS.

There’s some important stuff missing from that commit message (who did it & when, etc), that is retrievable from a second command:

ph7785: paul$ svn log -c 5
------------------------------------------------------------------------
r5 | harry | 2013-12-04 17:07:04 -0500 (Wed, 04 Dec 2013) | 1 line

follow-up change on trunk
------------------------------------------------------------------------

We would combine the two together of course.

JSON version of a diff

Here’s what I’m thinking (5.json):

{
 "who": "harry",
 "when": "2013-12-04 17:07:04 -0500",
 "message": "follow-up change on trunk",
 "changes": [
  {
   "index": "file.txt",
   "to": "file.txt   (revision 3)",
   "from": "file.txt   (revision 5)",
   "chunks": [
    {
     "locn": "-1,4 +1,4",
     "lines": [
      "-Efficiently unleash cross-media information without",
      "+Efficiently unleash sexed-up information without",
      "-maintain clicks-and-mortar solutions without",
      "+maintain empirical solutions without",
      " functional aspects."
     ]
    }
   ]
  }
 ]
}

To that end, I posed a question on StackOverflow a few days ago. I was concerned about a bunch of Unix’s sed commands piped together to turn the diff into JSON and the appropriateness of what I’d made. Ben Reser (Subversion dev team, and WanDisco employee) chimed in with a Python way that would be better long term.

Note: I’d prefer ‘svn diff’ to take a command-line arg ‘-json’ to spit out JSON format itself, but that’s a lot of work for the Svn team.

Anyway, Following the way that code review tools work, I see reviews at the file level might be like so:

{
 "who": "harry",
 "when": "2013-12-04 17:07:04 -0500",
 "message": "follow-up change on trunk",
 "changes": [
  {
   "index": "file.txt",
   "to": "file.txt   (revision 3)",
   "from": "file.txt   (revision 5)",
   "chunks": [
     "_comment": "etc etc etc",
   ],
   "reviews": [
     "review": {
       "who": "paul",
       "when": "2013-12-07 14:58",
       "what": "Great Work"
       "status" "LGTM"
     }
   ]
  }
 ]
}

There’s also going to be a need for comments on lines specifically:

{
 "who": "harry",
 "when": "2013-12-04 17:07:04 -0500",
 "message": "follow-up change on trunk",
 "changes": [
  {
   "index": "file.txt",
   "to": "file.txt   (revision 3)",
   "from": "file.txt   (revision 5)",
   "chunks": [
    {
     "locn": "-1,4 +1,4",
     "lines": [
      "-Efficiently unleash cross-media information without",
      "+Efficiently unleash sexed-up information without",
      "reviews": [
        "review": {
          "who": "paul",
          "when": "2013-12-07 14:58",
          "what": "Should use alternate to 'Sexed-up'; Too 'Malcolm Tucker'",
          "status": "REWORK-NEEDED"
        }
      ]
      "-maintain clicks-and-mortar solutions without",
      "+maintain empirical solutions without",
      " functional aspects."
     ]
    }
   ],
   "reviews": [
     "review": {
       "who": "paul",
       "when": "2013-12-07 14:58",
       "status": "REWORK-NEEDED"
     }
   ]
  }
 ],
 "rollup_status": "REWORK-NEEDED"
}

Designing an application around this idea

With very few lines of code, we could make a review system that works well enough for a “review-all-commits (on trunk) ASAP” agenda. We’d be open about what technology in the middle tier, but would suggest something minimal (Sinatra is my go-to):

Again we’re also agnostic about which back-end technology for ‘source-control’, with Subversion only listed in that diagram as it’s the most well-known choice. I’ve not even started coding this, by the way, though it would be fun.

Source-Control as a backend for reviews - really??

Yup, I generally think Source-Control is a viable store for some types of application, and there generally could be some convergence between source-control tools and key-val/doc stores.

Pros and Cons

Here’s some more reasons why Source-Control is suitable:

  • Diffs in JSON form are well-structured carriage-return delimited text files
  • Progressively overwriting the same doc over and over, if consistently pretty-printed, leads to a clean and authoritative audit trail
    • Doesn’t help with Sarbanes Oxley compliance directly, but will appease the regulators/customers mentioned at the top of the article.
  • A document (a review) is over written only a relatively small amount of time before it’s considered complete (no more overwrites).
  • Being able to access the data (and its history) without the middle-tier - directly from checkout - could be a good way to initiate batch analysis of the reviews by auditors, etc

It’s not all roses though:

  • The commit/review JSON as I have it contains all reviews. There should be consistency checking in the middle-tier to ensure that each previous review for the same commit isn’t changed somehow when latter reviewers weigh in with their review commentary
  • Not all Source-Control technologies are ready to be backends for arbitrary document storage, and the querying capabilities of these are mostly immature. Subversion, for one, is able to act as a WebDAV backend, and that might be nearly enough for essential GET / PUT interop that’s needed without stomping on things unintentionally. All have APIs to allow interop, but not all as easy to code for as each other
  • Source-Control can’t hold unreviewed_commits.json if that’s where the index is held
  • Two people concurrently reviewing the same commit, have a “who’s dominant / how to merge” problem if there’ as a middle-tier trying to orchestrate that (at least without semantic merge)
  • JSON diffs on binary files? - hmmmm.

And there are some Source-Control aspects that just don’t apply:

  • Commit messages for each stage of a single review, will be highly structured, whereas normally you detail rationale there
  • Branches within the review repository, make no sense

Other factors

  • There would need to be a commit-trigger for the main repo that ran the diff extract, turned it into JSON, pretty-printing it (jshon), and check it into the review repository (shown in the diagram)
  • You might need a way to recover items that were missed if the trigger is broken for some reason momentarily
  • You are still going to need a regular database for which items have not been reviewed yet, as shown in the diagram above
    • A way to rebuild that index will need to be created
    • You might get away with simple document located in the current directory seeing as it’s very small

Lastly, you’re not going to be able to maintain working copy on the server-side for each reviewer. Instead, you’ll need to get all documents from the Source-Control back-end whenever they are needed. In the AngularJS (or equivalent) web-app for this that’s not been discussed too much, most of the activity after serving up the ‘current’ state of the review is in the page itself. Only when a Save/Submit button is pressed does the JSON come back to the middle tier web-technology as a PUT, which then pretty-prints it, validates, and sends it on to the Source-Control tool managing the review repository.

Also Read..

Both the Hotel-Data and the App-Config-App deliberately blend a Web 1.0 templating tech (Sinatra) with a modern JavaScript MVC framework (Angular).