Paul Hammant's Blog: Merkle Trees and Source Control
I announced SvnMerkleizer some days ago on Twitter. It adds a merkle tree capability to Subversion.
Why Subversion, though, when Git has a history-retaining Merkle tree built in? Well, truth be told, I started it years ago and wanted to finish it, and it was also a testbed for Servirtium, which delivers Service Virtualization (SV) to Java clients of remote HTTP services.
There are more reasons, though.
In defense of Subversion for this Merkle tree thing
Size of repo
Subversion can go into terabytes quite easily, whereas Git has a hypothetical top limit. It is the size of history that I mean here. For Git you’d use ‘--depth x’ to clone less history. With Subversion it’s an implicit ‘--depth 1’ history at all times on the client side. Server-side, both keep all history of course. Sure, Git-LFS pushes Git into the place where it can handle video files more easily, but it’s not quite built-in.
Subversion can maintain read and write permissions for each directory. It can also group users together to make for terser config for that, even if the “Authz” technology is very confusing and error prone in the hands of novices.
Partial repo checkout.
With Subversion you can ‘svn co’ a subdirectory and that is all that comes down to your client (no parent directories). Git does not have that - you have to clone the whole repo.
Git and Subversion both have sparse checkout, though they work slightly differently. Git’s is easier to use, I think. Git does not have sparse clone, though, which means that there would not be a saving on the client’s storage for the .git/ folder, even if the working copy modified as part of the checkout operation is reduced. Scratch that, Git gained sparse clone with Git 2.25.0 in Jan 2020 (update!).
Direct ‘update’ of specific files.
Hypothetically, Subversion can PUT to a single file resource in the repo, without having checked out anything before that.
Portals have added direct update access
Gitea, RhodeCode, and GitHub itself (after a fashion), have added the ability for you to effectively PUT a resource to a Git remote repo without first having cloned it. I’ve a few proofs of concept that utilize that:
- Using RhodeCode and Angular1 as an Editor for a ‘Config as Code’ System, 2016
- Custom JSON Editors for Github.com , 2015
- Perforce as a datastore, with Client-Side MVC, 2013.
A Subversion working copy does not have to be up to date before you commit back. Git needs you to pull (and resolve conflicts) before you push changes back.
Arbitrary branching models
Subversion allows you to make branches at any point in the directory tree, but that’s a really sharp knife that you can hurt yourself with. Each team placing their source in the same repo could choose a different branching model, and at any subdirectory that suits them. Perforce has the same arbitrary branching possibilities as Subversion. PlasticSCM, which has Perforce-scale as a design goal, does not allow arbitrary branching. In that regard it is the same as Git and Mercurial, in that the branch is created and maintained at the root directory (whole repo). In a Monorepo configuration, nobody misses arbitrary branching.
Git’s lesser known strengths
Direct access to SHA1 representations of the Merkle tree
If you do ‘git checkout SOME_SHA1’ after cloning, then Git takes you back to that moment in time quite quickly, regardless of which branch that may be on.
Even with SvnMerkleizer or similar, Subversion can’t give you direct access to the tree with that SHA1 as the root SHA1 (as it effectively is in Git). You could wind Subversion back to a given revision (a numeric sequence), and then recalculate the whole Merkle tree. With that you could find the one you’re looking for by trial and error.
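To make that trial and error concrete, here is a minimal Python sketch. It is not SvnMerkleizer's algorithm and not Git's object format: the tree is a hypothetical in-memory structure (nested dicts for directories, bytes for files), the "name:digest" hashing scheme is made up for illustration, and `merkle_root`, `find_revision` and the `revisions` history are all invented names. It only shows the shape of the idea: recalculate the root for each revision until one matches the SHA1 you are after.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def merkle_root(node) -> str:
    """Digest of a tree given as nested dicts (directories) and bytes (files)."""
    if isinstance(node, bytes):
        return sha1_hex(node)
    # A directory's digest covers its children's names and digests, in name order.
    lines = [f"{name}:{merkle_root(child)}" for name, child in sorted(node.items())]
    return sha1_hex("\n".join(lines).encode("utf-8"))


def find_revision(revisions: dict, wanted_root: str):
    """Trial and error: recalculate each revision's root until one matches."""
    for number in sorted(revisions, reverse=True):
        if merkle_root(revisions[number]) == wanted_root:
            return number
    return None


# Hypothetical history: revision number -> whole tree at that revision.
revisions = {
    1: {"README": b"hello\n", "src": {"a.txt": b"one\n"}},
    2: {"README": b"hello\n", "src": {"a.txt": b"one\n", "b.txt": b"two\n"}},
    3: {"README": b"hello!\n", "src": {"a.txt": b"one\n", "b.txt": b"two\n"}},
}
wanted = merkle_root(revisions[2])
print(find_revision(revisions, wanted))
```

The search finds revision 2, but only after recalculating the whole tree for revision 3 first. That per-revision recalculation is the cost Git avoids, because there the SHA1 is itself the lookup key.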
Really, though, Subversion would be better if it calculated the whole Merkle tree with every commit, and made that accessible long term. New problem: the tree would be different for each set of permissions for users/groups.
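A small Python sketch of why permissions change the tree. Everything here is an assumption for illustration: the nested-dict tree, the "name:digest" hashing scheme, the `denied` set of paths, and the names `root_digest` and `readable_view`. It is not how Subversion's Authz or SvnMerkleizer actually model things. The point is only that a user who cannot read a directory sees a tree without it, and so gets a different root.

```python
import hashlib


def root_digest(node) -> str:
    """SHA1 over a tree of nested dicts (directories) and bytes (files)."""
    if isinstance(node, bytes):
        return hashlib.sha1(node).hexdigest()
    lines = [f"{name}:{root_digest(child)}" for name, child in sorted(node.items())]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


def readable_view(node, denied: set, path: str = ""):
    """Copy of the tree with every path listed in `denied` left out."""
    if isinstance(node, bytes):
        return node
    return {
        name: readable_view(child, denied, f"{path}/{name}")
        for name, child in node.items()
        if f"{path}/{name}" not in denied
    }


# Hypothetical repo content and a hypothetical restricted directory.
tree = {"docs": {"intro.txt": b"hi\n"}, "payroll": {"2020.csv": b"secret\n"}}
everyone = root_digest(readable_view(tree, denied=set()))
restricted = root_digest(readable_view(tree, denied={"/payroll"}))
print(everyone != restricted)
```

Each distinct set of readable paths yields its own root, so a server storing the tree per commit would need one tree per permission set (or a way to derive it on demand).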