Paul Hammant's Blog: SCM and Key-Value (Document) Store Convergence
In this blog entry, I’m going to dwell on some important differences between SCM tools and Key-value stores that might narrow as work happens on these technologies in the next few years.
First, a reminder
Historically, Key-Value stores differed from Document stores. The latter allowed indexing by elements within the document, the former did not. Document stores differed from Key-Value stores in many more ways than understanding the nature of the payload. Things change though, and the distinction is small now. Consider Document stores and Key-Value stores as more or less the same for the rest of this article.
SCM tools, you could say, are like Key-Value stores in that there’s a key (the path to the resource) and the value (ordinarily the source file or resource). In many other ways there are differences:
SCM tools want to map the payload to a file in a file system, and deal with checkouts and changes to commit as sets. They’ll associate a reason for the commit if the end user types a message at the pertinent moment. They also have some rules around the nature of the key, in that it must conform to something meaningful in a directory-delimited file system.
Key-Value stores want to supply fetches to a running program (say ‘in memory’), and don’t require a change-message to come back with a commit. They are much more open about the nature of the key.
History is one big difference between Key-Value stores and SCM tools. SCM tools keep history without having to encode a version/revision number/hash in the key. History is available orthogonally for an item, with the default being ‘HEAD’ (latest).
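To make the 'history is orthogonal to the key' idea concrete, here is a minimal sketch of how a Key-Value store might expose it. The class `VersionedStore` and its methods are hypothetical names invented for illustration, not the API of any real store or SCM tool. The key never carries a revision number; the revision is a separate, optional argument that defaults to HEAD.

```python
from typing import Dict, List, Optional


class VersionedStore:
    """Toy key-value store that keeps every revision of each value (illustrative only)."""

    def __init__(self) -> None:
        # key -> list of values, oldest first; the last entry is HEAD
        self._history: Dict[str, List[str]] = {}

    def put(self, key: str, value: str) -> int:
        """Store a new revision and return its 1-based revision number."""
        revisions = self._history.setdefault(key, [])
        revisions.append(value)
        return len(revisions)

    def get(self, key: str, revision: Optional[int] = None) -> str:
        """Fetch a value; with no revision given, return HEAD (the latest)."""
        revisions = self._history[key]
        if revision is None:
            return revisions[-1]
        if not 1 <= revision <= len(revisions):
            raise KeyError(f"{key!r} has no revision {revision}")
        return revisions[revision - 1]
```

A caller that only wants the latest value writes `store.get("docs/readme.txt")`, exactly as with a plain Key-Value store. A caller that wants history adds the revision, as in `store.get("docs/readme.txt", revision=1)`, without changing the key.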
The NoSQL page on Wikipedia says nothing about History/Revisions/Versions yet.
Key-Value stores are built for speed of access. If replicas or a distributed deployment are part of an application-stack build-out, then the Key-Value store is going to push changes around quite quickly, and most likely ahead of need. SCM tools, by contrast, are likely to do that only if the ‘fetch’ cycle contains an implicit or explicit ‘refresh’ operation. They’re also likely to bring down more changes than just the resource being sought.
SCM tools can be made faster with a cache for the get/fetch cycle. You could add one if you understand the protocol, or are layering on top of the native protocol. You could say that a checkout to a working copy, plus a strategy to constantly refresh it, could be fast, but having all items of a checkout local to the app that needs them could be inconvenient. For example, there are four million English Wikipedia articles, and you would not want them all on an iPhone, if that were the client, let alone their minute-by-minute changes.
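As a rough sketch of such a cache, assume the layer in front of the SCM tool has two operations available: a cheap call that reports the current revision of a path, and a slow call that fetches the content at a given revision. Both callables, and the class name `CachedScmReader`, are assumptions made for this example and do not correspond to any particular SCM protocol. Because content at a fixed revision never changes, the pair (path, revision) makes a safe cache key.

```python
from typing import Callable, Dict, Tuple


class CachedScmReader:
    """Read-through cache in front of a slow SCM fetch (hypothetical callables)."""

    def __init__(
        self,
        head_revision: Callable[[str], str],
        fetch: Callable[[str, str], bytes],
    ) -> None:
        self._head_revision = head_revision  # assumed cheap: path -> current revision id
        self._fetch = fetch                  # assumed slow: (path, revision) -> content
        self._cache: Dict[Tuple[str, str], bytes] = {}

    def get(self, path: str) -> bytes:
        """Return the HEAD content of path, fetching only on a cache miss."""
        revision = self._head_revision(path)
        cache_key = (path, revision)
        if cache_key not in self._cache:
            # Content at a fixed revision is immutable, so caching it is safe.
            self._cache[cache_key] = self._fetch(path, revision)
        return self._cache[cache_key]
```

The cache only holds what the client has actually asked for, which sidesteps the problem of keeping a whole checkout local. This sketch never evicts old revisions; a real layer would need a size limit.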
I’ve labored this point in recent blog entries: SCM tools are typically good at having multiple branches of data that might have been identical at one point, and could be again if a merge happens. Typically that means text-based forms of data that stand a chance of being mergeable (JSON is better than XML, for example). Maintained Divergence is also an SCM feature that allows two or more branches to keep their distance.
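To illustrate why structured text such as JSON stands a chance of being mergeable, here is a toy three-way merge over flat, already-parsed JSON objects. It is a sketch only: the function name `merge3` is made up for this example, it handles top-level keys and not nested documents, and real SCM tools merge line-oriented text in their own ways.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel meaning "key is absent on this side"


def merge3(
    base: Dict[str, Any], ours: Dict[str, Any], theirs: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge two branches of a flat dict against their common ancestor."""
    merged: Dict[str, Any] = {}
    for key in base.keys() | ours.keys() | theirs.keys():
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            winner = o  # both sides agree (including both deleting the key)
        elif o == b:
            winner = t  # only their branch changed this key
        elif t == b:
            winner = o  # only our branch changed this key
        else:
            raise ValueError(f"conflict on key {key!r}")
        if winner is not _MISSING:
            merged[key] = winner
    return merged
```

When each branch touches different keys the merge is automatic; when both branches change the same key to different values, the function raises and leaves the decision to a human, much as an SCM tool reports a conflict.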
I can see some of the NoSQL variants expanding their tooling around the history of documents. With that, some brave soul will no doubt try to make wrappers so that one can act as a formal SCM.
I’m not sure the enterprise SCM tool-makers will, but they could expand into the type of availability/scaling/consistency that a number of NoSQL vendors are delivering now. The FOSS vendors could move there more quickly. Deciding factors for various vendors: local-history (or not), read-only flags vs optimistic-locking, read vs write speed, client/server vs distributed. What’s nice about the SCM vendors getting involved is that the science is decades old, and the implementations are industrially hardened in some cases.
Feel free to ignore me though :)
Matthew Aslett of The 451 Group published a tube-map style view of the range of choices for database/store things yesterday. It looks a little like the London Underground (AKA ‘tube’) map.
I’m not sure that two dimensions are enough to group implementations, yet also separate them. He is also not sure that this version is more appropriate than his previous version six weeks ago or the one back in April of 2011. For example, there was a single rubber band “as a service” in the first version, which became three rubber bands in the version from last month, and is a green ‘line’ in the current version (again following the London Underground line concept).
Jan 2, 2013: This article was syndicated by DZone
Apr 2, 2017: Document stores and key/value stores are now more similar than they were historically, but SCM and document stores have not converged yet