I’m wishing for a rise in the use of formal source/version control for non-source-code uses, and I see signs that it is slowly happening.

I have been fascinated with source control (where the items stored are primarily textual) for some time. When I say source control, others would say version control, and I would too, but only where what is stored is primarily binary. I’m not a fan of proprietary version control though. You know, where a wiki or a CMS keeps versions of artifacts, with a mechanism to navigate through their history and potentially compare revisions. What I don’t get with those is the ability to check artifacts out as a set (or as of a specific prior moment in time), operate on some or all of them, then check them in again (as a set, and atomically). What I like is for tools that purport to support the versioning of artifacts to let me configure my own formal version-control backend. I also like such applications to use that facility as the primary store for their artifacts. Indeed, I don’t like anything that stores items in a relational schema when those items would be better suited to a version-control system.

I look at the likes of Mongo and similar document / key-value stores, and smile. They are so close. I love their advanced document indexing and querying. What I’d love more is the ability to work on sets of documents after a checkout. There are 30 years of PhD thinking behind the science of version control (and merging and branching), and I think it would have a happy home boosting the functionality of the document and key-value stores available today.

A hypothetical example

From the command line (of course):

hamgo checkout from inbox where content contains "the rise of VCS" and sender endswith @thoughtworks.com

I might want to do traditional grep-style unix work on that set after checking it out, to fuel my wish to crunch data and produce stats. I might also like to make mutations to some of the documents, and do a commit afterwards:

cd inbox
perl -p -i -e 's/TAGS\[/TAGS[TW_VCS /g' *.email
hamgo commit -am "tag emails from TWers talking to me or mail-lists about my rise of VCS topic"
hamgo push
cd ..
rm -rf inbox # delete the checkout/clone thingy

Incidentally, do I think email should be under source control? Perhaps not, though there could be benefits to modifying your own emails. Deletion is an obvious function and, with the safety of history behind it, moves the idea from “nuts” to “dubious”. In a previous article I wished for a pervasive inbox, which called for rewriting emails on the server side, amongst other things.

Note that Mongo does have a shell, but not with the syntax I showed above, nor with a checkout/clone capability.

More things for formal source control

  • Publishing / Content management systems, as previously mentioned.
  • Internal collaborative document systems - SnirtLabs has enterprise-ready solutions for that.
  • Issue trackers / Story management tools - documents for issues/stories/tasks, that tend towards completion.

Document stores for .doc, .ppt (etc) will have to handle binary formats, and the version-control system backing them must be good at smart deltas for those file types. Stories/issues, if represented as text, are suitable, but for those the need for branching is very weak (see “Very Small Data” under Further reading, immediately below).

Further reading

I have written before about SCM and Key-Value Store Convergence (2012).

There’s also Nearly All CMS Technologies Suck (2014) where I break down why, for CMS platforms in particular.

Very Small Data (2012) goes into a bunch of reasons why, and when, to use source control generally.


December 8th, 2014