Better practices for audits

Software development teams quest for continuous deployment these days. Some of them want to get there and also pass the audits they may be subject to. Audits can be carried out by a firm that the company appoints (with the agreement of its clients), or by a state or federal overlord.

If the CD-questing team is using Git, has achieved continuous review, has CI kicking in for every commit (or small batch of commits) that lands in trunk/master, and has a policy that every commit goes green or is automatically rolled back, then they have the perfect setup for audit-ready software.

Merkle Trees

All this rests on Git being a perfect history-tracking Merkle tree for directories of source files over time. Git makes SHA1 hashes of the versions of source files and directories landing in a branch (‘master’ is the trunk equivalent in Git for Trunk-Based Development teams). If every commit in master is good (that is, it was not rolled back), meaning all the tests of “the build” passed, then you have something that is, in principle, reproducible in an audit. Specifically, in an audit cycle, the auditor says “show me the test evidence for commit 25b23067de4004db8b4fcb2ad23f0dac16105b89”, and an experienced developer at a dev workstation checks out that commit (rolling the source tree back to it) and runs “the build” to the same outcome. Most likely the auditor has some development skills too, so they can smile their way past small build snafus like “I need to reinstall JDK 8, it won’t take long”.
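As a minimal sketch of what “SHA1 hashes of the versions of source files and directories” means, the Python below recomputes Git’s object IDs by hand. A file version (a blob) is hashed as the header `blob <size>\0` followed by its bytes; a directory (a tree) is hashed over its entries, each of which embeds the child’s SHA1. The helper names and the dict-of-bytes representation are illustrative assumptions, and only regular files (mode `100644`) and subdirectories are modelled, not executables or symlinks.

```python
import hashlib


def git_object_sha1(obj_type: str, body: bytes) -> str:
    """Git object ID: SHA1 over the header '<type> <size>\\0' followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def blob_sha1(content: bytes) -> str:
    """Object ID of one version of a file (what `git hash-object` prints for it)."""
    return git_object_sha1("blob", content)


def tree_sha1(entries: dict) -> str:
    """Object ID of a directory; values are bytes (files) or nested dicts (subdirectories).

    Each entry is '<mode> <name>\\0' plus the child's raw 20-byte SHA1, so the
    directory's ID commits to everything beneath it.
    """
    rows = []
    for name, value in entries.items():
        if isinstance(value, dict):
            # Git sorts directory entries as if their names ended in '/'
            mode, child, sort_key = "40000", tree_sha1(value), name + "/"
        else:
            mode, child, sort_key = "100644", blob_sha1(value), name
        rows.append((sort_key, f"{mode} {name}\0".encode() + bytes.fromhex(child)))
    return git_object_sha1("tree", b"".join(row for _, row in sorted(rows)))


if __name__ == "__main__":
    repo = {"README.md": b"hello world\n", "src": {"Main.java": b"class Main {}\n"}}
    print(tree_sha1(repo))
    repo["src"]["Main.java"] = b"class Main { }\n"  # a one-byte edit deep in the tree
    print(tree_sha1(repo))  # a different root tree SHA1
```

The one-byte edit changes the blob’s SHA1, which changes the `src` tree’s SHA1, which changes the root tree’s SHA1. That upward propagation is the Merkle property the audit argument leans on.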

Git’s Merkle-tree implementation is so perfect (provided force-pushes are turned off) that even the commit messages are subject to the same tamper-evident design.
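Commit objects are hashed the same way, which is why the message is covered too. In the Python sketch below (illustrative function name, made-up author identity), a commit’s SHA1 is computed over its tree SHA1, its parent SHA1(s), the author and committer lines, and the message. Editing an old commit’s message therefore changes its SHA1, which changes the `parent` line, and so the SHA1, of every commit after it. It assumes an unsigned commit with no extra headers; a GPG-signed commit carries a `gpgsig` header that is hashed as well.

```python
import hashlib

# Git's well-known SHA1 for an empty tree, used here so the example needs no files
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_sha1(tree: str, parents: list, author: str, committer: str, message: str) -> str:
    """SHA1 of a Git commit object built from its tree, parent(s), identities and message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}"]
    body = ("\n".join(lines) + "\n\n" + message).encode()  # blank line separates headers from message
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    who = "Dev <dev@example.com> 1527638400 +0000"  # hypothetical identity and timestamp
    first = commit_sha1(EMPTY_TREE, [], who, who, "Initial commit\n")
    second = commit_sha1(EMPTY_TREE, [first], who, who, "Fix build\n")
    reworded = commit_sha1(EMPTY_TREE, [], who, who, "Initial commit, reworded\n")
    # Rewording the first message forces a new SHA1 for every descendant
    print(second != commit_sha1(EMPTY_TREE, [reworded], who, who, "Fix build\n"))
```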

Works with any branching model

This model for audits works for Trunk-Based Development with:

Branch for Release for slower cadences
Release from tag (on trunk/master) for faster cadences
Release from commit (on trunk/master) for faster cadences

It also works for other branching models, provided that each commit’s CI pass/fail result is recorded somewhere non-repudiable, and the agreed audit process details both the release process and the source-control policy for what is releasable.
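One way to make a per-commit CI pass/fail record harder to quietly rewrite is to hash-chain it, so that each entry commits to the one before it. The Python below is a hypothetical record format (the field names and the all-zeros starting value are assumptions, not any CI product’s API). A chain on its own only becomes tamper-evident once the latest `entry_hash` is also written down somewhere the record-keeper doesn’t control, because whoever holds the log could otherwise recompute the whole chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's 'prev'


def _digest(record: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def append_ci_result(log: list, commit: str, passed: bool) -> dict:
    """Append a CI result that embeds the previous entry's hash."""
    prev = log[-1]["entry_hash"] if log else GENESIS
    record = {"commit": commit, "passed": passed, "prev": prev}
    record["entry_hash"] = _digest(record)
    log.append(record)
    return record


def verify_log(log: list) -> bool:
    """Recompute every link; editing any entry breaks it and every link after it."""
    prev = GENESIS
    for entry in log:
        record = {k: entry[k] for k in ("commit", "passed", "prev")}
        if entry["prev"] != prev or entry["entry_hash"] != _digest(record):
            return False
        prev = entry["entry_hash"]
    return True


if __name__ == "__main__":
    log = []
    append_ci_result(log, "25b23067de4004db8b4fcb2ad23f0dac16105b89", True)
    print(verify_log(log), log[-1]["entry_hash"])  # the hash to record externally
```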

The Merkle-tree aspect of this proposition is so mathematically solid that it would cost upwards of a million dollars to manipulate the history in Git to the point where the SHA1 as reported to auditors still exists. Google made a proof-of-concept collision for SHA1 hashing in 2017. A collision lets an attacker craft two new inputs that share a hash; tampering after the event is harder, because you’d need to insert a bogus file into the history so that a parent SHA1 comes out ‘correct’, which in turn would allow the SHA1 of the build that went live to remain the same after the tampering. The effort to back-calculate what that file should contain could well be much higher (and the file could masquerade as a JPEG, for example). Git should perhaps change to SHA256.

Continuous Review Again

Continuous Review was pioneered by Google internally in their ‘Mondrian’ system, and delivered to the masses by GitHub from its 2008 launch. If you’re using GitHub (the public portal or the enterprise app that you install on-premises), it stores the code reviews in a MySQL database that you don’t have access to. Well, you do with the on-prem install (it’s an appliance), but you’d have to reverse-engineer the schema in order to tamper with the record. It would be better if the reviews themselves were in source control.

I’ve made the case to the GitHub people for keeping code reviews in source control, but couldn’t convince them. You’d get to take advantage of the same Merkle-tree concepts and slide smoothly into the audit cycle, with a single SHA1 representing the last code-review commentary that made it into that release. If the reviews were in the same repo and branch, then one SHA1 would cover both the source code and the reviews. Yes, a single SHA1 again.

The way it used to be

It used to be that test evidence was stored somewhere non-repudiable. If your tests were always green (passing), then you were always redundantly storing evidence (build logs, test reports, videos). If your tests were not always green, then I maintain that you should not be going live with that build. I’m often asked what percentage of failing tests is acceptable, and the answer is always zero. At ThoughtWorks I never saw a situation where the team was not collectively pushing for 100% passing, so it was a bit of a shock after ThoughtWorks to encounter teams that thought 80% was good (provided an expert interpreted the results daily).

The old way was a lot of bullshit, and vulnerable to dozens of ways of faking the evidence or the record. Companies used to rest on the fact that the software used to attest the record for an audit had been approved somehow: that it had passed some scrutiny and could be relied on in the audit cycle without the auditor scoffing at it. In reality, the evidence could be faked if need be, at a cost of anywhere from $1,000 to $20,000 depending on what was being tampered with after the event.

Git’s internal implementation

These two blog entries detail some of the Merkle-tree implementation of Git.

Demystifying Git internals, Pawan Rawal (2016)
Some of Git internals, Dennis Yurichev (2015)

The first doesn’t mention the Merkle-tree aspect, but the second does. Neither mentions Blockchain either (thank Turing). I say that because there is way too much presenting of Merkle-tree solutions as Blockchain these days.

2026 Update: Merkle Trees for Financial Compliance Audits

Eight years on, the same Merkle tree principle applies beyond software development — to financial compliance evidence.

I’ve been working on Live Verify (source), a system where government agencies and regulated institutions publish SHA-256 hashes for documents they issue (OFSI sanctions licences, NCA suspicious activity report receipts, HMRC trust registrations, bank statements). A browser extension or headless tool can verify any document against its issuer’s hash endpoint in seconds.
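A minimal sketch of the verification step, in Python, assuming each issuer publishes the SHA-256 of the exact bytes it issued and that its hash endpoint can be modelled as a set of hex digests. Live Verify’s actual normalisation rules and endpoint format are not described here, so `verify_all` and `failures` are hypothetical helpers that only model the comparison, applied to every document rather than a sample.

```python
import hashlib


def verify_all(documents: dict, published: set) -> dict:
    """Check every document against the issuer's published SHA-256 digests.

    `documents` maps file names to raw bytes; `published` stands in for the
    issuer's hash endpoint (hypothetical). Returns {file name: verified?}.
    """
    return {
        name: hashlib.sha256(data).hexdigest() in published
        for name, data in documents.items()
    }


def failures(results: dict) -> list:
    """Names of documents whose hash the issuer did not publish."""
    return sorted(name for name, ok in results.items() if not ok)


if __name__ == "__main__":
    issued = b"example licence text"  # made-up document bytes
    endpoint = {hashlib.sha256(issued).hexdigest()}
    results = verify_all({"licence.pdf": issued, "edited.pdf": issued + b" "}, endpoint)
    print(failures(results))
```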

The audit model I had in mind in 2018 — “show me the evidence for this commit and I can prove it hasn’t been tampered with” — maps directly onto a regulatory audit scenario:

An FCA supervisor arrives at a fund manager’s office with a small NUC that boots from USB (no internal drive — the lid pops off to prove it).
The fund manager’s compliance team provides a USB stick containing all their evidence: OFSI licence PDFs, SAR receipt acknowledgments, bank statements, KYC packages.
A headless batch tool verifies every document against its issuer’s hash endpoint — not a sample of five out of two hundred, but all of them.
The tool then builds a Merkle tree over every file on the evidence USB — exactly the same data structure Git uses internally. One root SHA-256 commits to the exact contents of every document presented.
The supervisor writes the root hash down in biro (or captures it with Live Text from the screen). That 64-character hex string is the tamper-evident anchor for the entire evidence set. Seven years later, recompute the Merkle root from the preserved USB and compare.
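The Merkle step above can be sketched in Python as follows. This is a generic binary Merkle tree over SHA-256, not Git’s tree format and not necessarily what Live Verify implements: the leaf encoding (relative path plus file bytes, prefixed `0x00`), the `0x01` prefix for interior nodes, and carrying an odd node up unchanged are all assumptions made for the illustration. Sorting by relative path makes the root independent of the order in which the filesystem happens to list files.

```python
import hashlib
from pathlib import Path


def leaf_hash(rel_path: str, content: bytes) -> bytes:
    """Leaf = SHA-256(0x00 || path || 0x00 || bytes); including the path means a rename changes the root."""
    return hashlib.sha256(b"\x00" + rel_path.encode("utf-8") + b"\x00" + content).digest()


def merkle_root(leaves: list) -> str:
    """Fold leaf hashes pairwise into one hex root; 0x01 marks interior nodes."""
    if not leaves:
        return hashlib.sha256(b"").hexdigest()  # arbitrary convention for an empty set
    level = list(leaves)
    while len(level) > 1:
        parents = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            parents.append(level[-1])  # the odd node out is carried up unchanged
        level = parents
    return level[0].hex()


def evidence_root(folder: Path) -> str:
    """Merkle root over every file under `folder`, ordered by relative POSIX path."""
    files = sorted(
        (p.relative_to(folder).as_posix(), p) for p in folder.rglob("*") if p.is_file()
    )
    return merkle_root([leaf_hash(rel, p.read_bytes()) for rel, p in files])
```

Recomputing `evidence_root(Path(...))` over the preserved USB years later (the mount point is whatever the verifying machine uses) and comparing the result with the 64-character value written down at the visit is the whole check; a changed byte, a renamed file, or an added or removed file yields a different root.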

The 2018 argument was: Git’s Merkle tree makes software audit evidence tamper-evident for essentially zero cost. The 2026 extension is: the same structure makes financial compliance evidence tamper-evident for essentially zero cost. Change one byte in one document, and the root hash changes.

What I didn’t predict in 2018 was the anti-evasion declaration — the fund manager signs a statement that none of the evidence contains steganographic content, zero-width Unicode characters, or homoglyph substitutions designed to interfere with automated verification. This transforms Unicode evasion from a technical cat-and-mouse game into a compliance offence. The tool still detects anomalies, but their meaning shifts from “possible evasion” to “breach of audit declaration.”
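A rough sketch of the kind of anomaly check such a tool might run, in Python. It flags a handful of zero-width code points, and any word whose letters come from more than one script (the first word of each character’s Unicode name, such as `LATIN` or `CYRILLIC`, serves as a crude script label). This is an assumption-laden heuristic, not Unicode’s confusables algorithm (UTS #39), and legitimate text (emoji sequences, some Indic and Arabic writing, Japanese mixing kanji and kana) does trigger it, so findings are leads for review rather than proof of evasion.

```python
import re
import unicodedata

# Assumed set of invisible code points to flag; not exhaustive
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def find_anomalies(text: str) -> list:
    """Flag zero-width characters and words that mix letters from different scripts."""
    findings = []
    for i, ch in enumerate(text):
        if ch in ZERO_WIDTH:
            findings.append(f"zero-width {unicodedata.name(ch)} at offset {i}")
    for match in re.finditer(r"\w+", text):
        # Crude script label: first word of each letter's Unicode name
        scripts = {
            unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in match.group()
            if ch.isalpha()
        }
        if len(scripts) > 1:
            findings.append(f"mixed scripts {sorted(scripts)} in {match.group()!r}")
    return findings


if __name__ == "__main__":
    print(find_anomalies("lic\u0435nce granted to pay\u200bee"))  # Cyrillic 'е' and a ZWSP
```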

The Merkle tree was always the right answer for tamper-evident audit records. It just took eight years for the use case to expand from source code to sanctions licences.

May 2026 Update: Back to Software — ISO 26262 Functional-Safety Evidence

Reddit post: Our ISO 26262 documentation work is eating more engineerin

The arc closes where it started. A Tier-1 automotive supplier’s compliance lead posted the numbers from a release cycle: 187 engineering hours running the tests across their instrument cluster, IVI and HVAC subsystems, against 312 hours assembling the ISO 26262 traceability documentation around those runs. Four cycles, same 60/40 doc-to-testing ratio. AI codegen made it worse — more code per release, more evidence to assemble. Their tooling is fragmented and none of it produces the compliance package: Polarion for requirements, Polyspace for static analysis, Vector CANoe for bench runs, qTest for test management, in-house Python to aggregate. A human still stitches it together.

I want to be honest about what the Merkle-tree idea does and does not do for him, because it’s the same honesty I owed the 2018 reader. It does not cut the 312 hours. Authoring the safety case — correlating each safety requirement to its test to its result, writing the coverage narrative, closing the gaps — is an ALM problem that belongs to Polarion and qTest, and they should keep doing it. The Merkle tree solves the next problem, the one the 2018 post was always really about: once the evidence exists, it should be independently verifiable and tamper-evident for essentially zero cost.

It maps cleanly onto the 2018 model:

The Tier-1 ships the OEM an evidence package — the requirements matrix, the Polyspace runs, the CANoe results, the test-management records.
Each artifact carries a Live Verify line bound to the supplier’s domain (verify:contitech.com/fusa/v). The OEM’s functional-safety team runs a headless batch verifier over every artifact — not a sample of five out of two hundred — confirming each is the supplier’s genuine, current record. Scrub a coverage count from 478/482 to 482/482, downgrade an ASIL B item to ASIL A, bury an unresolved Polyspace finding — the hash changes and the scrub shows.
The evidence names the Git commit of the source baseline it pertains to. That’s the original 2018 argument intact: the commit SHA is the root of Git’s Merkle tree over the source, so the evidence is bound to an immutable code state. “Roll the tree back to this commit and the build reproduces” becomes “this evidence campaign covers exactly this baseline, provably.”
The tool builds a Merkle root over the entire evidence package — the same structure as the evidence-USB in the financial example above. One 64-character hex string commits to every artifact delivered. The OEM records it at acceptance; a TÜV assessor or, after a field failure, a product-liability litigator recomputes it years later from the preserved package and compares.

So the lineage runs: source code (2018) → sanctions licences (2026) → ISO 26262 test evidence (also 2026). Three instances of one structure. The doc-to-testing ratio is the supplier’s to fix with better authoring tools; the trust in the evidence those tools produce, and the cross-org handoff to the OEM, is what the Merkle tree has always made free. I wrote this up as a Live Verify use case — framed, deliberately, as complementary to the ALM suite rather than a replacement for it.


Published

May 30th, 2018


Paul Hammant's Blog: Better practices for audits