OCR-to-Hash: A Simple Audit Trail for Physical Documents

I’ve been thinking about how we could use simple technology to create an auditable system for physical documents—receipts, certifications, attestations—anything printed on paper. The core idea: use OCR to extract text, normalize it, hash it, and verify authenticity through a simple URL lookup. This builds on some of my prior ideas around Merkle trees and transparent systems.

Till Receipts: A Starting Point

The Basic Concept

For receipts specifically, the process is straightforward: normalize the text content in a specific way, then use cryptographic hashing to verify authenticity.

Text Normalization Rules

To handle variations in OCR scanning, we normalize the text consistently:

Remove all leading spaces (left edge of each line)
Remove all trailing spaces (right edge of each line)
Collapse multiple spaces between words into a single space
Preserve blank lines as line breaks

This normalization is within the reach of regular iPhone/Android OCR—no AI needed.
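
As a minimal sketch of the rules above, the following Python normalizes OCR output and hashes it. The function names, the use of "\n" as the line separator, UTF-8 encoding, and treating tabs like spaces are all assumptions made for illustration; the rules themselves do not pin them down.

```python
import hashlib
import re


def normalize_text(raw: str) -> str:
    """Apply the four normalization rules to OCR output."""
    lines = []
    for line in raw.splitlines():
        # Rules 1 and 2: drop leading and trailing whitespace on each line.
        line = line.strip()
        # Rule 3: collapse runs of spaces (and tabs, an assumption) to one space.
        line = re.sub(r"[ \t]+", " ", line)
        lines.append(line)
    # Rule 4: blank lines survive as empty strings, so joining keeps them.
    return "\n".join(lines)


def receipt_hash(raw: str) -> str:
    """SHA-256 of the normalized text, as lowercase hex."""
    return hashlib.sha256(normalize_text(raw).encode("utf-8")).hexdigest()


# Two scans of the same receipt that differ only in spacing.
scan_a = "  CORNER SHOP  \nMilk   1.20\n\nTotal  1.20 "
scan_b = "CORNER SHOP\nMilk 1.20\n\nTotal 1.20"
print(receipt_hash(scan_a) == receipt_hash(scan_b))  # True
```

Whatever choices are made here, the printer and the scanner app would both need to make exactly the same ones, since a single differing byte yields a different hash.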

Verification System

The SHA256 hash of the normalized receipt text would be appended to a URL printed at the bottom of the receipt. When you visit that URL:

200 response = receipt is verified (no payload needed)
404 response = receipt not found/invalid

Of course, this would require point of sale equipment to be upgraded to post each receipt text to some auditing backend after printing.
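
A rough sketch of that flow, using an in-memory stand-in for the backend rather than a real web service. The class name, method names, and the example.com base URL are hypothetical, and the sketch assumes the hashed text excludes the URL line itself.

```python
import hashlib
import re


def receipt_hash(text: str) -> str:
    # Same normalization as the rules above: strip each line, collapse spaces.
    lines = [re.sub(r" +", " ", line.strip()) for line in text.splitlines()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


class AuditBackend:
    """Hypothetical in-memory stand-in for the auditing backend."""

    def __init__(self) -> None:
        self._known_hashes: set[str] = set()

    def post_receipt(self, text: str) -> str:
        """Called by the point-of-sale system after printing; returns the hash."""
        digest = receipt_hash(text)
        self._known_hashes.add(digest)
        return digest

    def lookup(self, digest: str) -> int:
        """Status code a GET on the verification URL would produce."""
        return 200 if digest in self._known_hashes else 404


BASE_URL = "https://receipts.example.com/v/"  # placeholder, not a real service

backend = AuditBackend()
digest = backend.post_receipt("CORNER SHOP\nMilk 1.20\n\nTotal 1.20")
print(BASE_URL + digest)

# A customer's scan with OCR spacing noise still resolves to the same hash.
scanned = "  CORNER SHOP\nMilk   1.20\n\nTotal 1.20  "
print(backend.lookup(receipt_hash(scanned)))         # 200
print(backend.lookup(receipt_hash("Total 999.00")))  # 404
```

Only the hash is stored for lookup in this sketch, which is all the 200/404 check needs.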

Advanced Features

Expense Filing Integration

A further development would allow the system to return a correlating URL for an individual’s or company’s expenditure filing system. This wouldn’t require a login - a “first to post” system (for that hash) would suffice.

The key benefit: An expense can’t be claimed twice.
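
One way to picture "first to post" is a registry keyed by receipt hash that accepts only the first filing URL it sees. This is a sketch under assumptions: the class, the method, and the example.com URLs are hypothetical, and a real service would need persistence and atomic writes.

```python
from typing import Dict, Tuple


class ExpenseClaims:
    """Hypothetical first-to-post registry keyed by receipt hash."""

    def __init__(self) -> None:
        self._claims: Dict[str, str] = {}

    def claim(self, receipt_hash: str, filing_url: str) -> Tuple[bool, str]:
        """Record filing_url for this hash unless one is already recorded.

        Returns (accepted, url_on_record).
        """
        # setdefault stores the URL only if the hash has no entry yet.
        on_record = self._claims.setdefault(receipt_hash, filing_url)
        return on_record == filing_url, on_record


claims = ExpenseClaims()
h = "9936e14a1e86dea9fa2e17295bbf2cba567cc2338cf315b2620eaaa8168bfaf0"

first_ok, _ = claims.claim(h, "https://expenses.example.com/acme/1001")
second_ok, on_record = claims.claim(h, "https://expenses.example.com/other/77")
print(first_ok)   # True
print(second_ok)  # False
print(on_record)  # https://expenses.example.com/acme/1001
```

Re-posting the same URL for the same hash is treated as harmless here; only a different filing for an already-claimed receipt is refused.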

Beyond Receipts: Product Certifications

While receipts demonstrate the concept well, the real power comes from applying this to higher-stakes documents like product certifications and safety attestations.

Real-World Fraud Case: Medpro and Fake CE Markers

[Oct 2025 addition]

This followed a Jim Pickford tweet, the one that originally led me to write about this whole topic.

The need for verifiable certifications isn’t hypothetical. In January 2023, it emerged that PPE Medpro (the Baroness Mone-linked company) had allegedly provided fraudulent safety documentation to the UK government. According to government legal filings, Medpro submitted a test report claiming it was performed by Intertek with the code SHAT06648491—but Intertek denied having issued this report. This revelation, shared by Financial Times journalist Jim Pickford, directly inspired the original tweets that led to this article.

The fraudulent documentation related to PPE supplied to the NHS during the COVID-19 pandemic, including items with fake “CE” safety markers. In 2024, the government successfully prosecuted Medpro, and the company was ordered to pay £122m over the PPE contract - BBC article.

The practical outcome highlights a key problem: while the government won the prosecution, they’re unlikely to recover the funds paid to Medpro, who in turn is unlikely to get a refund from the overseas manufacturer. This cascading accountability failure demonstrates exactly why we need verifiable certification systems—a hash-based verification system like the one proposed here would have caught the fraudulent Intertek certification immediately. If Intertek had hosted their genuine certifications at URLs like https://intertek.com/certifications/[hash], the fake report code SHAT06648491 could have been instantly verified as fraudulent, preventing the non-compliant PPE from entering the supply chain in the first place.

Medpro should have verified the CE markers themselves. They went on to receive 72 shipments from overseas and truck them into UK government storage. Some time after the last shipment, the UK government checked the quality of the items and rejected the lot. Presumably they are now in a landfill, rather than used on any basis or sold on again. Rejection on quality grounds alone could have led to a discount (if the items were still used), but because the UK government was able to determine that the certifications were fake, it could pursue a legal strategy toward a full refund.

Certification Verification Example

That Medpro and Intertek unverified CE mark…

The concept would work similarly:

Your iPhone or Android app snaps to the lines marking the bounds of hashable text on a product label
Apply OCR to extract the certification text
SHA256 hash the normalized text (e.g., 9936e14a1e86dea9fa2e17295bbf2cba567cc2338cf315b2620eaaa8168bfaf0)
Suffix it to the certifying agency’s URL: https://intertek.com/certifications/9936e14a1e86dea9fa2e17295bbf2cba567cc2338cf315b2620eaaa8168bfaf0

The web server response would indicate:

200 “OK” or “authentic” = certification verified
404 = certification not found
“DENIED” = certification explicitly rejected (with suggestion to retry in case of OCR errors)
“REVOKED” or redirect = product recalled
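
A client app might map those responses to a verdict roughly as follows. This is only a sketch: the list above does not say which status code accompanies "DENIED" or "REVOKED", so the code assumes they arrive as the response body, and the `Verdict` and `interpret` names are made up for illustration.

```python
from enum import Enum


class Verdict(Enum):
    VERIFIED = "verified"
    NOT_FOUND = "not found"
    DENIED = "denied"
    REVOKED = "revoked"
    UNKNOWN = "unknown"


def certification_url(base_url: str, sha256_hex: str) -> str:
    """Suffix the hash to the certifying agency's URL."""
    return base_url.rstrip("/") + "/" + sha256_hex


def interpret(status: int, body: str = "") -> Verdict:
    """Translate an HTTP status and body into a verdict."""
    text = body.strip().upper()
    # Body keywords are checked first (an assumption, see lead-in).
    if text == "REVOKED" or 300 <= status < 400:
        return Verdict.REVOKED
    if text == "DENIED":
        return Verdict.DENIED
    if status == 404:
        return Verdict.NOT_FOUND
    if status == 200 and text in ("", "OK", "AUTHENTIC"):
        return Verdict.VERIFIED
    return Verdict.UNKNOWN


url = certification_url(
    "https://intertek.com/certifications/",
    "9936e14a1e86dea9fa2e17295bbf2cba567cc2338cf315b2620eaaa8168bfaf0",
)
print(url)
print(interpret(200, "authentic").name)  # VERIFIED
print(interpret(404).name)               # NOT_FOUND
print(interpret(200, "DENIED").name)     # DENIED
print(interpret(302).name)               # REVOKED
```

Anything outside the listed responses falls through to UNKNOWN rather than being guessed at, which seems the safer default for a safety certificate.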

This is an NFT-less, QR-code-free mashup with old-world printed packaging. Going forward, “Audit Tech” needs to start resting on one-way hashes, with a verification path that starts with plain-text readable by humans and ends up on a certifying agency’s site.

Government Integration

Governments getting involved might require any or all parts of this system for their jurisdiction, adding another layer of accountability and fraud prevention.

Estonia’s KSI Blockchain: A Working Example

[Oct 2025 addition]

Estonia has actually been pioneering this approach since 2008, becoming the first nation state to deploy blockchain technology in production systems in 2012. Their Keyless Signature Infrastructure (KSI) blockchain system does exactly what this article proposes—it creates hash values from documents and uses those hashes to verify authenticity and detect tampering.

Here’s how it works:

Every public record (birth certificates, property titles, healthcare records, court documents) is converted into a hash value
These hashes form a Merkle tree structure where any modification to a single record changes the entire root hash
Each record gets a KSI signature containing its hash value, the path to the Merkle root, and a timestamp
Anyone can publicly verify a record’s integrity without needing private keys
The system provides mathematical proof that data hasn’t been tampered with—not by hackers, system administrators, or even the government itself
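
The Merkle-tree idea in the list above can be sketched in a few lines. This is a generic illustration, not KSI's actual signature format: the function names are invented, timestamps are omitted, and duplicating the last node on odd-sized levels is just one common simplification.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(records: List[bytes]) -> List[List[bytes]]:
    """Return all tree levels: record hashes first, root level last."""
    level = [h(record) for record in records]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pair a lone last node with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof_for(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the bool is True if the sibling sits on the right."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # odd-sized level: node was paired with itself
        path.append((level[sibling], sibling >= index))
        index //= 2
    return path


def verify(record: bytes, path: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one record and its path."""
    node = h(record)
    for sibling, sibling_is_right in path:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root


records = [b"birth certificate", b"property title", b"court document"]
levels = build_levels(records)
root = levels[-1][0]
path = proof_for(levels, 2)

print(verify(b"court document", path, root))          # True
print(verify(b"court document (edited)", path, root))  # False
```

The verifier needs only the record, its short path, and the published root, which is why no private key is involved in checking integrity.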

Estonia currently secures its healthcare, property, business, and succession registries using this technology, plus its digital court system and State Gazette. Since 2020, KSI blockchain has been accredited as the first blockchain-based trust service under the eIDAS regulation, giving it legal force throughout the EU.

A note on decentralization: Estonia’s KSI blockchain operates with only about ten nodes to agree on what gets entered onto the blockchain. This means there’s no distributed Byzantine consensus or arbitration between distrusting competitors: it’s a government-controlled system rather than a trustless distributed one. While this works for a nation-state securing its own records, the product certification use case proposed in this article would benefit from a more decentralized approach where competing certification agencies, manufacturers, and regulators could all participate as mutually-distrusting validators. A high-throughput public ledger such as Hedera Hashgraph could readily fill that role.

Hedera cost economics: Hedera’s pricing model makes it practical for high-volume document verification. At current rates (as of 2023), inserting 1 million hash records into Hedera costs approximately $100-$200 using the Consensus Service or Token Service for timestamping and verification. This works out to roughly $0.0001-$0.0002 per hash insertion. While more expensive than centralized cloud databases, this buys you genuine decentralization with Byzantine fault-tolerant consensus across dozens of global nodes operated by organizations like Google, IBM, Boeing, and major universities—providing cryptographic proof that no single entity can tamper with the records.
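
The per-hash figure is simple division, shown here as a quick check. The only inputs are the numbers quoted above; the function name is illustrative.

```python
def cost_per_hash(total_usd: float, records: int) -> float:
    """Average cost of one hash insertion."""
    return total_usd / records


records = 1_000_000
low = cost_per_hash(100, records)
high = cost_per_hash(200, records)
print(f"${low:.4f} to ${high:.4f} per hash")  # $0.0001 to $0.0002 per hash
```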

Feeless alternatives: Networks like IOTA and NANO offer zero-cost transactions (ignoring the hardware, electricity, and administrative costs of running your own node). These systems use Proof-of-Work requirements and sophisticated prioritization algorithms to prevent spam attacks rather than transaction fees. However, both networks have faced real-world spam attacks—most notably NANO’s March 2021 incident where attackers created 14.5M fake accounts to congest the network. While defensive improvements have been deployed, it remains unclear whether feeless networks can indefinitely neutralize spam attacks as attack methodologies evolve. For high-stakes applications like safety certifications, this ongoing arms race between spam mitigation and attack vectors presents a potential reliability concern.

What makes this particularly relevant to the OCR-to-hash concept: Estonia’s system proves that hash-based document verification works at national scale. While they’re currently focused on digital documents in government databases, the same cryptographic principles could extend to physical documents like product certifications and receipts, exactly as proposed in this article.

FAQ: Six Reflex Skeptic Questions and Answers:

Why not just use QR codes? They already exist and work fine.

Answer: QR codes can be copied/cloned easily, don’t provide cryptographic verification, and can’t be human-read for transparency (which was the whole point). This system works with existing text.

This is just blockchain hype. We don’t need another crypto solution.

Answer: This isn’t cryptocurrency - it’s using basic cryptographic hashing that predates blockchain. The Estonia example shows government-level implementation. No tokens, no mining, just verification.

OCR is unreliable. What happens when it misreads something?

Answer: The normalization rules handle common OCR spacing variations. A failed verification just means retrying the scan. Modern phones can manage OCR that is remarkably accurate for printed text within marked boundaries, and that tech just gets better and better.

Companies will never adopt this - it’s too expensive to change POS systems.

Answer: The Medpro fraud cost £122m. How many POS upgrades would that buy? Plus, early adopters get a competitive advantage through trust. Implementation could be gradual anyway, unless governments pass adoption laws for certain classes of verifiable thing. Michelle Mone (Medpro) could have determined at shipment number 1 whether Intertek had certified her garments, then cancelled the 71 following shipments. The UK government could have been advised sooner that this particular supply was proving problematic.

What stops someone from just hosting their own fake verification server?

Answer: The URLs are from known certification authorities (Intertek.com, etc.). You’d need to compromise their domain or DNS. That’s a much higher bar than forging paper certificates. At another level, a claim that you got your PhD from Edinburgh University, backed by a proof of it on phds.ed.ac.uk/awarded, is self-evident to the eye. Maybe at phds.ed.ac.uk/index.html there’s a wordy paragraph or two on Edinburgh’s prowess, and a link to russellgroup.ac.uk/our-universities which has a listing for them.

This creates a privacy nightmare - every receipt tracked online!

Answer: The hash reveals nothing about the content. It’s a one-way hash, not reversible encryption. The server only confirms yes/no verification, not what was purchased. Less tracking than current loyalty cards.

Oct 2025 update

I’m working on a web-app for phone use: paul-hammant.github.io/verific. Source repo: github.com/paul-hammant/verific/

There are target things to verify listed on this page: paul-hammant.github.io/verific/training-pages/

Want to help make this a thing? Pull requests accepted.

Originally shared on Twitter/X on January 7, 2023 and January 17, 2023

Published

January 17th, 2023

Paul Hammant's Blog: OCR-to-Hash: A Simple Audit Trail for Physical Documents