Paul Hammant's Blog: The limits of merging experiment
A colleague suggested that the cherry-picking approach to trunk-based development with a branch for release needs worked examples, cos there are foot-guns.
Cherry-pick to a release branch isn’t crystal clear as a workflow
The failure modes are more interesting than they look. To pick at it, I built a small playground - a single-file Sinatra CRUD app, an end-to-end Playwright test, and a series of trunk commits that would let me run the same scenario two different ways and see what git actually does.
The repo is here: paul-hammant/limits-of-merging-experiment.
Clone it, then run ./start.sh, which makes the solution/ folder its own git repository, separate from the one you cloned.
Then run bundle install to fetch the dependencies.
The setup
The solution/ folder contains the Ruby/Sinatra app and a Playwright test for it. It is three-tier: HTML with JS, a Ruby middle tier, and a SQLite base tier. reset.sh will keep getting us back to the starting point. See “C1” below.
Five hypothetical trunk commits, all reachable as patches in patches/ and re-applicable on demand, are key to this experiment.
| Change | What it does | Touches |
|---|---|---|
| C1 | Initial Person CRUD app, seeded Flintstones, Playwright happy-path test | everything |
| C2 | Add hair_color (string) - dropdown, JS validation, DB CHECK constraint | app.rb, happy_path_test.rb |
| C3 | Button text change to UPPERCASE: NEW PERSON / EDIT / DELETE / SAVE / CANCEL | app.rb, happy_path_test.rb |
| C4 | hair_color becomes INTEGER (1..6); dropdown values are now ints | app.rb, happy_path_test.rb |
| C5 | Add maintainer comment in header (cosmetic, previously untouched region) | app.rb |
The shape that matters for this experiment: C3 is cosmetic and unrelated to hair colour. C4 builds on C2. C5 is in an untouched corner. This is normal trunk life - small unrelated changes interleaving in the same files.
The hypothetical release branch is cut from C2. The team wishes that was it for the release, and continues on an unfrozen trunk as normal. Later, something comes up that is agreed to be a bug fix (definitely not feature creep), and it should be cherry-picked to the release branch by the responsible “merge meister” or release engineer. We want to ship bug fix C4 (the int conversion) but not feature change C3 (the uppercase buttons).
Scenario 1: git am is honest, and that’s why it fails
First release engineer instinct: apply the C4 patch directly to the “stable” release branch.
$ git checkout -b release c2
$ git am patches/c4-hair-color-int.patch
Applying: C4: hair_color stored as INTEGER (1..6), dropdown values become ints
error: patch failed: app.rb:133
error: app.rb: patch does not apply
Loud failure. Why? Look at the failing hunk:
<td><%= h p['dob'] %></td>
- <td><%= h p['hair_color'] %></td>
+ <td><%= h(HAIR_COLORS[p['hair_color']]) %></td>
<td class="row-actions">
<a class="button" href="/people/<%= p['id'] %>/edit">EDIT</a>
The patch’s context lines (the unchanged EDIT reference around the change) include C3’s uppercase text. On the release branch the buttons still say Edit. Context doesn’t match. git am is strict about context - it refuses.
This is git am correctly doing its job. It’s not the failure mode I’m interested in.
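The context mismatch can be reproduced in isolation. This is a minimal sketch in a throwaway repo, not the playground itself: the file app.txt and its four lines are made up to mirror the hunk above, and it uses git apply --check as a dry run rather than git am.

```shell
#!/usr/bin/env bash
# Sketch only: app.txt and its contents are invented to mirror the failing hunk.
set -euo pipefail
repo=$(mktemp -d)
patch=$(mktemp)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# "C2": the state the release branch is cut from.
printf '%s\n' '<td>dob</td>' '<td>hair_color</td>' \
  '<td class="row-actions">' '<a class="button">Edit</a>' > app.txt
git add app.txt
git commit -q -m "C2"
git branch release

# "C3" on trunk: cosmetic change two lines below the line C4 will edit.
printf '%s\n' '<td>dob</td>' '<td>hair_color</td>' \
  '<td class="row-actions">' '<a class="button">EDIT</a>' > app.txt
git commit -q -am "C3"

# "C4" on trunk: the fix we want to ship.
printf '%s\n' '<td>dob</td>' '<td>HAIR_COLORS[hair_color]</td>' \
  '<td class="row-actions">' '<a class="button">EDIT</a>' > app.txt
git commit -q -am "C4"

# Export C4 as a patch; its context lines carry C3's uppercase EDIT.
git format-patch -1 --stdout HEAD > "$patch"

# Dry-run the patch against release, where the button still says Edit.
git checkout -q release
if git apply --check "$patch" 2>/dev/null; then
  result="applies"
else
  result="rejected"
fi
echo "C4 patch on release: $result"
```

The changed line itself is identical on both branches; only a neighbouring context line differs, and that is enough for the strict patch application to refuse.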
Scenario 2: git cherry-pick is helpful, and that’s why it lies
(We run ./reset.sh to go back to the starting position.)
Second instinct, and what most developers actually type:
$ git checkout -b release c2
$ git cherry-pick c4
Auto-merging app.rb
[release ...] C4: hair_color stored as INTEGER (1..6), dropdown values become ints
Clean. No conflict. Test passes.
Why? Because git cherry-pick doesn’t apply patches by context - it does a three-way merge with the parent of C4 as the merge base. The merger sees:
- merge base (C3 on trunk) had EDIT
- the cherry-pick target (release branch) has Edit
- C4 didn’t change either of those lines
So three-way merge correctly concludes “leave the case alone, just apply the hair-colour change.” Release ends up with C4’s diff cleanly applied on top of Edit.
That’s the textbook good outcome. And it’s exactly the failure mode the colleague was warning about - not because this cherry-pick was wrong, but because the success was contingent on a property of the diffs that nobody checked. C3 happened to touch only button text. If C3 had ever-so-slightly tidied up the dropdown the cherry pick may have conflicted - forcing a human to arbitrate on it.
The lies are little lies, from good intentions, perhaps.
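Here is the same shape in a throwaway repo, as a sketch rather than the playground's real files: app.txt and its four lines are invented, with one unchanged line sitting between the line C3 touches and the line C4 touches, as in the real hunk.

```shell
#!/usr/bin/env bash
# Sketch only: app.txt and its contents are invented for illustration.
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# "C2": the state the release branch is cut from.
printf '%s\n' '<td>dob</td>' '<td>hair_color</td>' \
  '<td class="row-actions">' '<a class="button">Edit</a>' > app.txt
git add app.txt
git commit -q -m "C2"
git branch release

# "C3" on trunk: uppercase the button text.
printf '%s\n' '<td>dob</td>' '<td>hair_color</td>' \
  '<td class="row-actions">' '<a class="button">EDIT</a>' > app.txt
git commit -q -am "C3"

# "C4" on trunk: the fix we want to ship.
printf '%s\n' '<td>dob</td>' '<td>HAIR_COLORS[hair_color]</td>' \
  '<td class="row-actions">' '<a class="button">EDIT</a>' > app.txt
git commit -q -am "C4"
c4=$(git rev-parse HEAD)

# Cherry-pick C4 onto release: a three-way merge with C3 as the base.
git checkout -q release
git cherry-pick "$c4" > /dev/null

echo "release now contains:"
cat app.txt
```

Release ends up with the C4 change and still has the lowercase Edit. The merge was clean only because the two changes did not touch adjacent lines; nothing in git's output tells you how close it was.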
What about merge-point tracking?
Here’s where Subversion fans get nostalgic. SVN’s svn:mergeinfo tries to record on the release branch “I have integrated revisions r3, r4 from trunk.”
A subsequent sweep merge of trunk into release knows to skip those. It’s not just Subversion: the “bigger” VCS technologies, Perforce and Microsoft’s TFVC, track merges too.
Git has none of that. A cherry-pick produces a commit with a different SHA from the original - git cat-file -p on the two reveals different parents,
different trees, different hashes. The only audit trail is the optional (cherry picked from commit ...) line that git cherry-pick -x leaves in the
commit message, and git itself does not consult that line for anything. It’s a comment.
So if you later merge trunk wholly into release, git’s merge base is the last common ancestor - which is C2, not C4. Git will re-apply C3, C4, and C5 from trunk on top of release. If the cherry-picked C4 on release is byte-identical to trunk’s C4, the merger deduplicates silently. If they differ even slightly (a hotfix on release, a typo correction), you get spurious conflicts that look very real, with no way for git to say “you already integrated this, just take trunk’s copy.” If you are coming from SVN, Perforce or TFVC, merge-point tracking is not as you remember it. In a TBD + release branches workflow you would never do a sweeping merge from trunk to the release branch anyway - you would only do cherry-picks, and fewer and fewer of them over time. At some point the release branch has been superseded and is eligible for deletion.
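The merge-base claim is easy to check. A minimal sketch in a throwaway repo (the branch name trunk and the two text files are assumptions for illustration, not the playground's layout):

```shell
#!/usr/bin/env bash
# Sketch only: file names and contents are invented for illustration.
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # name the initial branch "trunk"
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > app.txt
echo "notes" > other.txt
git add .
git commit -q -m "C2"
c2=$(git rev-parse HEAD)
git branch release

echo "cosmetic" >> other.txt
git commit -q -am "C3"
echo "fix" >> app.txt
git commit -q -am "C4"
c4=$(git rev-parse HEAD)

# Cherry-pick C4 to release, then ask git where the two branches last met.
git checkout -q release
git cherry-pick "$c4" > /dev/null
base=$(git merge-base release trunk)
pending=$(git rev-list --count "$base..trunk")

echo "merge base:              $base"
echo "C2:                      $c2"
echo "trunk commits past base: $pending"
```

The merge base stays at C2 after the cherry-pick, and both C3 and C4 are still counted as trunk commits that release has not seen.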
SVN’s mergeinfo aimed at this and got bitten by edge cases - subtree mergeinfo creep, properties getting out of sync if you bypass svn merge. The
“slightly broken” reputation is earned. But the intent - cross-branch awareness of what’s been integrated - is something git deliberately doesn’t have.
Linus rejected it on simplicity grounds. The price is paid by anyone running long-lived release branches.
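Git does ship one narrow heuristic here: git cherry compares patch IDs (a hash of each commit's diff) between two branches. The sketch below is deliberately constructed so the cherry-picked diff is textually identical to the original, because C3 touches a different file. The repo, branch name trunk, and file names are all invented. When the surrounding context differs, as it does for C4 in the playground, I would not expect this check to recognise the pick.

```shell
#!/usr/bin/env bash
# Sketch only: a best-case setup where the cherry-picked diff is unchanged.
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # name the initial branch "trunk"
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > app.txt
echo "notes" > other.txt
git add .
git commit -q -m "C2"
git branch release

# C3 touches a different file, so C4's diff is the same on both branches.
echo "cosmetic" >> other.txt
git commit -q -am "C3: cosmetic"
echo "fix" >> app.txt
git commit -q -am "C4: fix"
c4=$(git rev-parse HEAD)

git checkout -q release
git cherry-pick -x "$c4" > /dev/null

# "-" marks trunk commits with an equivalent patch on release, "+" the rest.
report=$(git cherry -v release trunk)
echo "$report"
```

In this constructed case C4 is reported with a "-" and C3 with a "+". That is patch equality, not merge tracking: it is recomputed from the diffs each time, and git merge does not consult it.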
The order of cherry-picks question
If C3 and C4 are eventually both wanted on the release branch, does the order matter? Two scenarios:
Scenario A - out of order. Cherry-pick C4 first (the urgent fix), then C3 later (because someone decided uppercase buttons should ship after all):
release branch cherry-picks: C2 then C4' then C3'
Scenario B - in order. Cherry-pick in trunk order:
release branch cherry-picks: C2 then C3' then C4'
In both cases, trunk has C2, C3, and C4. We could then sweep-merge C5 from trunk into the release branch.
Resulting tree hashes (with author/date/identity pinned so SHAs are deterministic):
| Scenario | Tree hash after sweep merge | Merge commit SHA |
|---|---|---|
| Scenario A (out of order) | 088b679... | e96a8e0... |
| Scenario B (in order) | 088b679... | 4ad3efc... (fast-forward!) |
Tree hashes match. Content is byte-identical. Empty git diff between the two release branches.
But the commit graphs are different shapes. Scenario A produced a real merge commit with two parents - the cherry-picked C4’/C3’ on release and the trunk C5 - because the histories diverged. Scenario B’s cherry-picks produced commits byte-identical to trunk’s (same diffs, same pinned timestamps), so the sweep merge fast-forwarded; release’s tip is main’s tip.
So the answer to “does order matter” is layered:
- Content: no, both converge to the same source tree.
- History shape: yes, you get a merge node in one and a flat history in the other.
- SHA equivalence: no, never - parent chains differ, so commit SHAs cascade differently. SHA equality was never the right test for “did this work.”
What the SHA actually proves
This is the bit I had to stop and think about. I’d been comparing commit SHAs and getting confused by the differences. The right framing:
Commit SHA equality means “byte-identical commit object including parents.” Tree hash equality means “byte-identical content.” Cherry-picks change parents. They cannot preserve commit SHAs. They can preserve tree hashes - and that’s the only equivalence that matters for correctness.
So when judging whether a cherry-pick + sweep merge produced the right result: diff the trees, not the commits. A clean git diff main release after the sweep is the
only proof you need.
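A minimal sketch of that comparison, in a throwaway repo where two hypothetical branches (release-a and release-b, names invented here) receive the same two cherry-picks in opposite orders:

```shell
#!/usr/bin/env bash
# Sketch only: branch and file names are invented for illustration.
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > a.txt
echo "base" > b.txt
git add .
git commit -q -m "C2"
c2=$(git rev-parse HEAD)
echo "c3" >> a.txt
git commit -q -am "C3"
c3=$(git rev-parse HEAD)
echo "c4" >> b.txt
git commit -q -am "C4"
c4=$(git rev-parse HEAD)

# Same two changes, opposite orders.
git checkout -q -b release-a "$c2"
git cherry-pick "$c4" "$c3" > /dev/null
git checkout -q -b release-b "$c2"
git cherry-pick "$c3" "$c4" > /dev/null

# Commit SHAs cover parents and messages; tree hashes cover content only.
commit_a=$(git rev-parse release-a)
commit_b=$(git rev-parse release-b)
tree_a=$(git rev-parse 'release-a^{tree}')
tree_b=$(git rev-parse 'release-b^{tree}')

echo "commits: $commit_a vs $commit_b"
echo "trees:   $tree_a vs $tree_b"
git diff --quiet release-a release-b && echo "content is identical"
```

The commit SHAs differ and the tree hashes match, which is the only equivalence worth asserting on.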
What git can’t tell you
Putting it together, here is what git silently cannot answer for a release branch built from cherry-picks:
- “Have I already integrated this trunk commit?” Git’s answer is “no” - even if you cherry-picked it. The DAG has no link after the event (ignoring formatted comments).
- “When I sweep merge trunk to release, will it be a no-op?” Only if every cherry-picked patch is byte-identical to its trunk twin. There’s no machine check.
- “Are these two release branches functionally equivalent?” Only git diff of trees can tell you. Commit history is misleading.
- “Did this cherry-pick land safely?” Only your automated tests can tell you. Three-way merge succeeding is necessary, not sufficient.
The real risk in real corporate codebases
A common pushback: “in a million-line corporate codebase, two unrelated commits almost never touch the same file region - cherry-picks land cleanly the vast majority of the time.” That’s empirically true. The base rate of textual collision is low.
But the question isn’t frequency, it’s severity when it does happen. The classic bad outcome isn’t a noisy git am rejection - it’s a quiet git cherry-pick
that three-way-merges into wrong-but-plausible code, ships to a release branch, passes your tests because the tests don’t cover the exact corner that broke,
and surfaces in production a week later. Git gives you no warning. There’s no git fsck --semantic.
The mitigations I keep coming back to:
- Work in thin vertical slices. This is a good idea generally, but it especially pays off when cherry-picks are in your future. A single commit/PR that changes the DB schema, the middle tier, and the UI together is one cherry-pick - either it all lands on the release branch or none of it does. Split the same bug fix across three commits (one per tier) and you now have three cherry-picks that each have to be remembered, ordered, and applied. Miss one and you ship a half-fix; the release branch compiles, the smoke test passes, and the bug is “fixed” everywhere except the layer you forgot. Your pre-commit automated tests on the release branch should catch the omission - but “should” is doing heavy lifting there, and the failure mode is exactly the kind of partial-state subtlety tests are weakest at.
- Test the release branch like it’s a fresh codebase, not “trunk minus a few commits.” End-to-end, not just the change you cherry-picked.
- Cherry-picks of schema/data changes need extra scrutiny. Migration logic written assuming a trunk DB state may break against a release DB state.
- Prefer release-from-trunk over release-with-cherry-picks when your cadence allows it - roughly weekly or faster. If you ship every few days, a release branch buys you very little and the cherry-pick overhead isn’t worth it; just tag trunk. Cherry-picked release branches earn their keep at monthly/quarterly cadences where the stabilization window is long enough that trunk has moved on substantially. Long-lived release branches that absorb selective fixes are a structural risk, not a tooling problem git is going to grow out of.
- For high-cost cherry-picks, run the full test suite on the cherry-picked branch before merge. This is what the playground demonstrates: a happy-path Selenium/Cypress/Playwright test that hits the UI and asserts on the DB shows up regressions that git diff won’t.
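The last mitigation can be wired into a small wrapper. This is a sketch under assumptions: guarded_pick is a hypothetical helper (not part of git or the playground), and the true/false commands stand in for whatever runs your real end-to-end suite.

```shell
#!/usr/bin/env bash
# Sketch only: guarded_pick is a hypothetical helper, not a git feature.
set -euo pipefail

# Usage: guarded_pick <commit> <test command...>
# Cherry-picks with -x, runs the tests, and undoes the pick if they fail.
guarded_pick() {
  local commit=$1; shift
  local before
  before=$(git rev-parse HEAD)
  if ! git cherry-pick -x "$commit" > /dev/null 2>&1; then
    git cherry-pick --abort > /dev/null 2>&1 || true
    echo "cherry-pick of $commit stopped: needs a human" >&2
    return 1
  fi
  if ! "$@"; then
    git reset -q --hard "$before"
    echo "tests failed: cherry-pick of $commit rolled back" >&2
    return 2
  fi
}

# Throwaway repo to exercise the helper.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "C2"
git branch release
echo "fix" >> app.txt
git commit -q -am "C4"
c4=$(git rev-parse HEAD)
git checkout -q release
start=$(git rev-parse HEAD)

# "false" and "true" are stand-ins for a real test command.
if guarded_pick "$c4" false; then failing_run="kept"; else failing_run="rolled back"; fi
after_failing=$(git rev-parse HEAD)
if guarded_pick "$c4" true; then passing_run="kept"; else passing_run="rolled back"; fi

echo "with failing tests: $failing_run"
echo "with passing tests: $passing_run"
```

The point of the design is that a clean three-way merge is never treated as success on its own; the pick only stays on the release branch once the tests have passed there.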
Reproducing this
git clone https://github.com/paul-hammant/limits-of-merging-experiment
cd limits-of-merging-experiment
./start.sh # set up solution/ as a fresh playground
./scenario-a-out-of-order.sh # release: C2, C4, C3, then merge main
./rollback.sh
./scenario-b-in-order.sh # release: C2, C3, C4, then merge main
Each script prints the resulting graph, tree hash, and diff. Compare the two.
The patches in patches/ are real git format-patch output - readable, replayable, and the source of truth for the trunk timeline. The scenario scripts pin author identity and timestamps so anyone running them gets the same SHAs I did. That’s the only way to make a cherry-pick experiment reproducible.
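For anyone wanting the same determinism in their own experiments, one way to pin identity and timestamps is git's standard environment variables. This is a sketch of the general mechanism, not a copy of the scenario scripts; the identity and date values are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: the name, email, and timestamp values are arbitrary.
set -euo pipefail

# Pin everything that feeds into the commit object besides tree and parents.
export GIT_AUTHOR_NAME="Demo"
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_AUTHOR_DATE="1700000000 +0000"
export GIT_COMMITTER_NAME="Demo"
export GIT_COMMITTER_EMAIL="demo@example.com"
export GIT_COMMITTER_DATE="1700000000 +0000"

# Build the same one-commit history in a fresh repo and print its SHA.
make_commit() {
  local dir
  dir=$(mktemp -d)
  git -C "$dir" init -q
  echo "hello" > "$dir/app.txt"
  git -C "$dir" add app.txt
  git -C "$dir" commit -q -m "C1"
  git -C "$dir" rev-parse HEAD
}

sha_one=$(make_commit)
sha_two=$(make_commit)
echo "first run:  $sha_one"
echo "second run: $sha_two"
```

Two independent repos end up with the same commit SHA, because content, message, identity, and dates are all identical. Without the pinned dates, the two runs would only match if they happened to commit within the same second.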
Closing
Cherry-pick is not infallible. It also isn’t usually wrong. The hazard is the gap between those two facts: git’s vocabulary for “this is the same change” is “these are byte-identical,” and outside of that narrow case, it has no opinion. SVN tried to fill that gap and is imperfect in its implementation. Git decided the mess wasn’t worth it.
What this means in practice for trunk-based development with release branches: cherry-pick is a tool that requires you to bring your own audit. The audit is your test suite, your code review, and your CI. If those are weak, cherry-pick is a foot-gun. If they’re strong, cherry-pick is fine - and the SHA divergence between trunk and release is just bookkeeping noise.
So, per my colleague’s nudge - cherry-picks are great, but be careful and know the limits.
One last thing: I’ve previously been in engineering leadership for a team with 12 planned releases a year, and sometimes late features were merged to the release branch using cherry-pick. Don’t do that - wait for the next release. I lost the argument cos the business really, really wanted those late features - more than once for the same component of the system. Cherry-pick is for release-stabilizing things only: toggle off your work that isn’t ready to go live before the branch-cut moment.