Cherry-pick Asymmetry

The context is Trunk-Based Development with Branch for Release, plus cherry-picking. Trunk is main or master for many Git repos, of course.

A friend put a scenario to me and I gave him a strong answer: I said it was statistically impossible for the bug to still be on trunk once formal QA had signed it off on the release-branch-coupled CD environment. He raised an eyebrow. This post is me making good on that claim - and being honest about the tiny sliver where “impossible” is really “one-in-a-few-thousand,” and what that sliver actually is.

The scenario. You’re doing trunk-based development with a branch for release. Cherry-picks go one way only - trunk to release branch, never back. A bug is found. The fix goes onto trunk, gets a desk-check sign-off (someone stands the site up on localhost, clicks around, says “yep, fixed”), and then it’s cherry-picked to the release branch. From there Jenkins (or similar) continuous-deploys it into a coupled “release-candidate” environment, and a QA colleague does a formal sign-off against the bug in that environment. All that in addition to test automation that the same Jenkins performed for the commit(s).

So the same fix now lives on two branches, but with two very different levels of scrutiny:

Trunk: desk-check only. One developer, localhost, eyeballs, plus whatever automated tests ran on the commit.
Release branch: desk-check (inherited via the cherry-pick) plus a formal QA pass in a real deployed environment.

My friend’s question: what’s the statistical chance the bug is still present in trunk, given it’s been formally signed off as gone on the release branch? My answer: effectively zero, and not by luck - by construction.

Why it’s (almost) impossible, not merely unlikely

This is not a “did the fix get applied?” question. The fix is the same commit on both branches. A clean cherry-pick puts the identical diff on the release branch - and crucially, if the cherry-pick landed without a clash, trunk’s copy and release’s copy of the fix are the same bytes against compatible context. If those identical bytes pass formal QA on release, the very same bytes are sitting in trunk. There is no mechanism by which “this exact change works there but not here” can be true when there and here hold the same change. That’s the “impossible” - and it’s a structural fact about cherry-picking from trunk, not a probability. At least, it is when using a modern VCS with merge point tracking.

So the residual risk is not “the fix doesn’t work on trunk.” It’s two things, both of which are not the bug you fixed:

The formal check didn’t catch it. The sign-off said “fixed” but it wasn’t - a flaky repro, the environment happened to mask it, the wrong path got exercised. Then it’s not fixed anywhere, release included, and the premise “formal QA signed it off” carried less information than we assumed. Worth being careful here about why: this is rarely anyone slacking. The honest cause is usually scale. A manual gate that was statistically airtight for a small app and a small team gets asked to span more surface per release as the portfolio of change grows with the company - the same gate, now covering more. Holding the old bar at the new volume would have meant doubling or tripling QA headcount, which orgs almost never do; so the gap widens as an under-investment decision made above the team, not a failing within it. It’s a property of how the verification model was funded against growth, not of the branching model. (The Current Reality Tree (CRT) node “started with manual QA for a smaller application and dev team” is exactly this.)
Trunk has since become a different codebase - a later commit re-broke it, or the surrounding code the fix relies on drifted. That’s a new defect on trunk, not the one you fixed surviving.

Neither contradicts the claim. The specific bug, once its cherry-pick clears QA on release, cannot be the thing still broken on trunk. Put a number on the residual anyway, to give you a feel: for a well-resourced QA gate and a tight cherry-pick loop, you’re in 1-in-10,000 to 1-in-100,000 territory; for a gate stretched thin by a grown portfolio of change, or a long-lived branch where trunk has drifted hard, it climbs toward 1-in-1,000. Those are the odds of one of the two other failures above - never of the fixed bug itself reappearing on trunk.

The divergence mechanisms - i.e. the only ways to lose the bet

These are the only routes by which the formal release sign-off fails to cover trunk. Read them as ways trunk became a different codebase or QA was wrong - not as ways the fix failed to transfer.

1. Trunk moved after the desk-check, in the same region. The fix was desk-checked on trunk at commit X. By the time you cherry-picked, trunk was at X+n. If one of those n commits touched the same code the fix depends on - refactored the function, changed a caller, altered a shared helper - then trunk’s current behaviour is no longer what was desk-checked. The release branch is frozen-ish and got the clean cherry-pick; trunk kept moving. The release branch can be more correct than trunk precisely because it stopped changing. This is the big one, and it scales with your cherry-pick latency and your trunk commit rate.
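
A rough sketch of how that scaling behaves, under a deliberately crude assumption: each intervening trunk commit independently has the same small chance of touching code the fix depends on. The function name and the numbers are hypothetical and not measured from any real repository.

```python
def p_region_touched(commits_since_desk_check: int, p_touch_per_commit: float) -> float:
    """Chance that at least one of n later trunk commits touched the fix's region.

    Assumes, for illustration only, that commits are independent and each
    touches the region with the same probability.
    """
    if commits_since_desk_check < 0:
        raise ValueError("commit count cannot be negative")
    if not 0.0 <= p_touch_per_commit <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1.0 - (1.0 - p_touch_per_commit) ** commits_since_desk_check


# Hypothetical inputs: same per-commit probability, different cherry-pick latency.
same_hour = p_region_touched(commits_since_desk_check=5, p_touch_per_commit=0.001)
months_later = p_region_touched(commits_since_desk_check=5000, p_touch_per_commit=0.001)
print(f"picked within the hour: {same_hour:.4f}")
print(f"picked months later:    {months_later:.4f}")
```

The exact figures do not matter; the shape does. The exposure grows with the number of commits between desk-check and cherry-pick, which is latency multiplied by commit rate.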

2. The same hunks landed against different surrounding chunks. Covered in depth in the limits-of-merging experiment. A clean cherry-pick isn’t really a merge - it’s a two-way apply: the fix’s changed hunks go in, and the surrounding chunks are taken as-is, unchallenged. Nothing reconciles them. So the very same hunks can land against release’s surrounding code and against trunk’s surrounding code, and if those untouched chunks differ - because trunk drifted (mechanism #1) - the effective behaviour differs even though the patch “applied cleanly” on both. The fix’s lines are identical; the code they execute against is not. Clean apply is necessary, not sufficient. Low probability, nasty when it hits.
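
A toy illustration of identical fix lines running against different surrounding code. Everything here is invented for the example (the function names, the pricing domain, the refactor on trunk); it only shows the mechanism, not a real incident.

```python
# The cherry-picked hunk: byte-for-byte identical on both branches.
FIX_HUNK = '''
def order_total(prices):
    # The fix: guard against an empty basket before normalising.
    if not prices:
        return 0
    return sum(normalise(p) for p in prices)
'''

# The untouched surrounding chunk as it exists on the release branch.
RELEASE_CONTEXT = '''
def normalise(price):
    return round(price, 2)
'''

# The same chunk on trunk after an unrelated refactor (prices now held in cents).
TRUNK_CONTEXT = '''
def normalise(price):
    return int(round(price * 100))
'''


def behaviour(context_source, prices):
    """Run the identical hunk against one branch's surrounding code."""
    namespace = {}
    exec(context_source + FIX_HUNK, namespace)
    return namespace["order_total"](prices)


release_result = behaviour(RELEASE_CONTEXT, [1.5, 2.25])
trunk_result = behaviour(TRUNK_CONTEXT, [1.5, 2.25])
print(release_result, trunk_result)
```

The empty-basket guard works on both branches, so the fixed bug stays fixed. What differs is everything the hunk leans on, and no patch tool will flag that.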

And here is the tell. When the surrounding code has drifted enough, the cherry-pick doesn’t apply cleanly - you get a merge clash. That clash is not an annoyance to be force-resolved at 4pm on release day; it is the system telling you the two branches have diverged in the fix’s blast radius. A clash is the loud, honest version of mechanism #2 - the same divergence that, when it’s just a hair smaller, applies “cleanly” and lies to you. So the clash is a gift: it converts a silent risk into a visible one. The mistake is treating a clash as friction rather than as intelligence. Which is the whole argument for a dry-run, below.

3. Trunk took a second commit that the release branch never got. Here’s the version of this that sounds paradoxical until you look at it on a timeline. Suppose trunk has two commits in play: the fix (cherry-picked to release) and a separate, later commit that re-breaks the same thing - a regression, or a new second instance of the same defect. The obvious objection: if QA passed in the release-coupled CD env, how can the bug still be live in trunk? And the answer is clean - because the two commits are not both on the release branch. Only the fix was cherry-picked. The release branch is fix-and-nothing-else: QA stands it up, drives the repro, signs off correctly. Trunk is fix-plus-regressor.

So how did the desk-check on trunk miss it? Because of when it ran. The desk-check happened at the fix commit - and at that moment trunk really was fixed, and the desk-checker was telling the truth. The regressor landed afterwards. The desk-check didn’t fail; it expired. Trunk is broken, release is green, both “passed” their respective checks, and there’s no contradiction - the checks ran at different points on a moving trunk, and only trunk kept moving.

Notice this is not a half-fix arriving on the release branch. The fix arrived whole and works there. The trouble is entirely on trunk, and it is just mechanism #1 (drift) with the clock made explicit: a human sign-off is true at one commit; trunk is a moving target; the release branch stopped moving and the desk-check didn’t get a do-over. That’s the whole answer to “can it be green on release and still broken in trunk” - yes, and only this way.
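
The timeline can be replayed as a tiny model. This is a sketch with hypothetical commit ids, where each commit is tagged only by its effect on this one bug.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str     # hypothetical identifier
    effect: str  # "fix", "regress", or "neutral" with respect to this one bug


def bug_present(history, initially_broken=True):
    """Replay commits in order and report whether the bug is live at the end."""
    broken = initially_broken
    for commit in history:
        if commit.effect == "fix":
            broken = False
        elif commit.effect == "regress":
            broken = True
    return broken


unrelated = Commit("n1", "neutral")
fix = Commit("f2", "fix")
regressor = Commit("r3", "regress")

trunk_at_desk_check = [unrelated, fix]    # what the desk-checker saw
trunk_now = [unrelated, fix, regressor]   # trunk kept moving
release_branch = [unrelated, fix]         # only the fix was cherry-picked

print("desk-check on trunk saw the bug:", bug_present(trunk_at_desk_check))
print("QA on release saw the bug:      ", bug_present(release_branch))
print("trunk today has the bug:        ", bug_present(trunk_now))
```

Both sign-offs were truthful about the history they looked at. Only the third history contains the regressor, and nobody signed that one off.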

4. Environment-specific masking, in the other direction. The release-candidate environment is “coupled” and real; localhost is not. There are bugs that only manifest with the real environment’s data, config, or integrations. The formal QA pass in the RC environment can therefore prove the fix more thoroughly than the desk-check ever could - which means the desk-check on trunk may have signed off on a fix that the desk-checker was a little optimistic on. Trunk inherits a weaker proof. The bug could still be “present” on trunk in the sense that nobody actually verified it there under conditions that could expose it.

So what’s the number?

Remember what the number is the probability of: not “the fixed bug survived on trunk” (that’s the impossible part), but “the formal check didn’t catch it, or trunk quietly became a different codebase in the fix’s blast radius.” With that framing:

Tight loop, ‘on it’ QA team, thin slices (cherry-pick minutes-to-hours after the desk-check, short-lived release branch): you’re at roughly 1-in-100,000. The fix is the same bytes against near-identical context; QA passed; trunk hasn’t moved meaningfully. About as close to “impossible” as software gets.
Ordinary shop (cherry-pick same-day, busy trunk, decent-but-human QA): call it 1-in-10,000. Drift is small but nonzero; quality assurance is good but not perfect.
Long-lived release branch, hard-drifted trunk, a verification gate stretched past what it was sized for: it climbs toward 1-in-1,000, dominated by mechanism #1 (drift) - trunk became a different codebase while the branch sat still. Even here it’s trunk drift or a missed check doing the work, never the fixed bug reappearing - and the “missed check” end is a scale-and-staffing story (the portfolio grew, the gate didn’t), not a worse team.

There’s a fourth axis cutting across all three: how good is your test automation, and what shape is it? A healthy test pyramid - broad fast unit coverage, a solid integration band, a thin full-stack testing cap/peak - attacks both residual terms at once. It shrinks P(QA wrong) because the formal sign-off isn’t a lone manual gesture; the same behaviour is asserted by a unit test and an integration test that run on every commit, so a false negative has to fool all three. And it shrinks P(drift) because the moment a later trunk commit re-breaks the thing (mechanism #3) or shifts the surrounding code (mechanism #2), trunk’s own CI goes red - you find out in minutes, not in next quarter’s release. An inverted pyramid does the opposite: it pushes you toward the 1-in-1,000 end and makes the dry-run and the trunk regression test below far more valuable, because they’re compensating for coverage the pyramid should have provided. So read the three rows above as “× your pyramid”: a fat-pyramid ordinary shop behaves like the tight-loop row; a full-stack-heavy tight-loop shop behaves like the ordinary row.
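
A minimal sketch of the "has to fool all three" arithmetic, assuming the checks miss independently of one another (real checks often share blind spots, so treat this as optimistic). All miss rates are invented for illustration, and the sketch only models the P(QA wrong) side; the pyramid's effect on drift is left out.

```python
from math import prod


def p_all_checks_fooled(miss_probabilities):
    """A false 'fixed' verdict has to slip past every independent check."""
    return prod(miss_probabilities)


def residual(p_qa_wrong, p_drift):
    """Chance that at least one of the two residual failures occurred."""
    return 1.0 - (1.0 - p_qa_wrong) * (1.0 - p_drift)


# Hypothetical miss rates, chosen only to show the shape of the effect.
manual_only = p_all_checks_fooled([0.001])
with_pyramid = p_all_checks_fooled([0.001, 0.05, 0.05])  # manual, unit, integration

p_drift = 0.0001  # hypothetical
print(f"manual gate only: 1 in {1 / residual(manual_only, p_drift):,.0f}")
print(f"with pyramid:     1 in {1 / residual(with_pyramid, p_drift):,.0f}")
```

For small probabilities the residual is close to the plain sum of the two terms, so once the pyramid has driven P(QA wrong) down, drift becomes the term that dominates.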

The headline I’d give my friend, sharpened: a formal QA pass on the release branch proves the fix bytes work; because those exact bytes (and the clearly identified commit that carries them) are also in trunk, the only ways trunk can still be broken are a false QA pass or a later change to trunk - and neither is the bug you fixed. That’s why I called it statistically impossible and stand by it. The residual is a rounding error with three dials - QA bar, drift, and test-automation shape - and a deep pyramid turns all three down at once.

Cadence is the master variable

Trunk and the release branch are strongly related - they share the entire history up to the branch point and only diverge afterwards. Every divergence mechanism above is really one quantity wearing different hats: how far has trunk drifted from the branch point by the time this cherry-pick happens? And that distance is governed almost entirely by release cadence - planned release cadence more so than unplanned.

Compare two shops shipping the same software:

One release branch a year, ~500 cherry-picks (feeding, say, ~100 unplanned point releases off that one long-lived branch). By cherry-pick #400, trunk is months ahead of the branch point. The surrounding chunks (mechanism #2) have been rewritten, callers have moved (mechanism #1), the bug may have sprouted a second home (mechanism #3) that didn’t exist when the branch was cut. Every cherry-pick is fighting a trunk that’s drifted enormously. The release branch and trunk are technically still related, but the relationship is distant, and the release sign-off’s evidential value for trunk is correspondingly weak. This is the troublesome situation. The first time the merge engine says it can’t merge without arbitration, all bets are off - the quantum link between the two testing places (the two branches) is lost.
Twelve release branches a year, a few cherry-picks each. Each branch is only ever cut a few weeks before its cherry-picks land, so trunk has barely moved relative to the branch point. The same fix lands against near-identical surrounding code on both branches. Drift is small, so all four mechanisms shrink toward zero, and the formal sign-off on release transfers to trunk almost perfectly.

Same total release count, same total cherry-picks-ish - wildly different risk. The number of branches per year is a proxy for short distance between trunk and each branch, which is the thing that actually keeps the cherry-pick honest. This is the quantitative version of the advice in the limits-of-merging post: long-lived release branches that absorb selective fixes are a structural risk, and the structure is drift. Branch more often, keep each branch close to trunk, delete it fast.
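
A back-of-envelope sketch of that distance, assuming a constant trunk commit rate and cherry-picks spread evenly over each branch's life. The commit rate and branch lifetimes are hypothetical.

```python
def mean_drift_commits(trunk_commits_per_day: float, branch_lifetime_days: float) -> float:
    """Average number of trunk commits since the branch point when a cherry-pick lands.

    Assumes cherry-picks arrive evenly across the branch's life, so the
    average pick happens halfway through it.
    """
    return trunk_commits_per_day * branch_lifetime_days / 2.0


COMMITS_PER_DAY = 20  # hypothetical trunk commit rate

yearly = mean_drift_commits(COMMITS_PER_DAY, branch_lifetime_days=365)
monthly = mean_drift_commits(COMMITS_PER_DAY, branch_lifetime_days=30)
print(f"one branch a year:      ~{yearly:.0f} trunk commits of drift per cherry-pick")
print(f"twelve branches a year: ~{monthly:.0f} trunk commits of drift per cherry-pick")
```

Under these assumptions the ratio between the two shops is simply the ratio of branch lifetimes, whatever the commit rate.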

But “branch more often” is easy to say and hard to do, and the reasons it’s hard are exactly the constraints I mapped in the software-development Current Reality Tree starter pack. A team can’t dial cadence up - can’t cut twelve short-lived branches a year instead of one long-lived one - until it has cleared the bottlenecks upstream of release frequency. Four nodes in that tree map straight onto this post’s dials:

Build times are too long (“we build too much per build”, “the full build compiles too much code”, “integration tests are too slow”). This one I didn’t call out above, but it’s load-bearing. Put a number on “too long”: say it’s two hours from dev-done to the change being stood up in a QA environment and tested - compile, full test suite, deploy to the coupled env, then QA can start. When that round trip is two hours, you can’t afford to cut and verify a fresh branch per release, so you keep one alive and let trunk drift away from it; and every dry-run and every re-test of a cherry-pick costs another two hours, so people do fewer of them. Slow builds force the long-lived-branch shape that maximises mechanism #1, and they tax the very mitigations (dry-run, re-test against trunk) that drain the residual. Get that round trip down to minutes and the whole calculus flips. This is the reduction-of-cycle-times argument wearing a different hat: those cycles nest like Russian dolls, and your effective build time is the innermost one that sets a floor under all the others - cherry-pick latency and release cadence included. Shrink the build and every outer cycle can shrink; leave it at two hours and none of them can.
Not enough QA is automated (“over-reliance on manual testing”, “not enough QA automators”, “QA test suites not run that frequently”). This is the same fourth-axis point as the test pyramid above, seen from the constraint side: a thin or inverted pyramid is why the formal QA pass is a lone manual gesture, which is why P(QA wrong) doesn’t shrink.
The release process itself is error-prone / “we don’t practice releases safely” - no rehearsals. Rehearsal is a whole family: dress-rehearsal deploys to a prod-like environment, blue-green and canary cutovers practised before they’re needed, rollback and roll-forward drills, smoke-test runbooks, game-days. The dry-run cherry-pick below is just the cheapest, narrowest member of that family - it rehearses one question (does this fix apply cleanly against the branch?), not the deploy. But the muscle is the same: a team that never rehearses anything is a team that discovers the clash at 4pm on release day.
CI isn’t actually continuous (per commit), and the branching model is crappy. If trunk’s CI doesn’t run on every commit, mechanism #3 (a later regressor) goes undetected for longer; and a weak branching model is the soil all of this grows in.

So the residual probability in this post and the slow-cadence at the top of that CRT are the same phenomenon viewed at two altitudes. The cherry-pick asymmetry is what you measure once a fix is in flight; the CRT is the map of why the fix is in flight on a branch that’s drifted too far in the first place.

The genuinely counterintuitive bit

Why do we cherry-pick trunk → release exclusively in the first place? Not because we distrust the release branch - it’s about forgetting. If you let people fix on the release branch and rely on merging the fix back to trunk afterwards, you will eventually forget to do it. The fix ships, QA signs off, the client pops the champagne - and three months later the next release is cut from a trunk that never got the fix, and the bug walks straight back in. A regression, manufactured by a missed merge-back. Cherry-picking from trunk forces the fix to exist in trunk by construction: there’s no merge-back step to forget, so that class of regression is structurally impossible. The direction of the arrow is the safety mechanism.

But look at what that buys you on this one bug, at this one moment: the causality inverts. The release branch is now the better-verified artifact. It got the formal pass; trunk got an eyeball. The fix is definitely in trunk - that’s the impossible-to-be-broken part - but the formal evidence that it works lives on the release branch, not on trunk. Trunk has the cure and a weaker certificate; release has the same cure and the strong certificate. They share the fix; they don’t share the proof.

This is not an argument to start merging release back to trunk - that’s exactly the missed-merge-back regression trap the one-way rule exists to prevent (plus every merge-tracking hazard on top). The fix is in trunk already. What’s missing from trunk is the verification, and the cure for that is cheap:

Dry-run the cherry-pick early - well ahead of the moment of nerves. First, the honest caveat: most cherry-picks just apply, cleanly, every time - if that’s your experience, you don’t need this as standing ceremony, and a same-day cherry-pick is already most of a dry-run. This earns its keep only for teams with a track record of cherry-picks not being seamless - the long-lived-branch, hard-drifted-trunk shops at the 1-in-1,000 end, where a clash is a regular event rather than a rarity. Those teams should, long before the fix is needed on the release branch, do a throwaway cherry-pick (git cherry-pick --no-commit and throw it away, or a scratch branch) just to ask one question: does it clash? A clean apply is reassurance; a clash is intelligence - it’s mechanism #2 announcing that trunk and the release branch have diverged in the fix’s blast radius, which is precisely when a “clean” apply would have lied to you. Rehearsing the cherry-pick ahead of time converts a 4pm-on-Friday panic into a calm Tuesday decision. The clash is information you want, as early as you can get it.
If quick and cheap, run the same formal check against trunk, or at least the automated portion of it. If the bug is worth a formal QA sign-off on release, it’s worth an automated regression test that runs in trunk’s CI forever. Then trunk’s coverage of this bug stops decaying - it’s pinned by a test, and the 1-in-10,000 drops further.
Cherry-pick fast. Every hour between desk-check and sign-off is more trunk drift (mechanism #1). Short loops shrink the dominant residual term. Perhaps only a problem for Google-sized companies with select teams doing branch-for-release.
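Keep the slices thin. A single-commit fix can’t be partially applied, so the release branch never receives half a fix; and a small fix has a small blast radius, which leaves a later trunk commit (mechanism #3) less surface to collide with.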
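
One possible shape for the throwaway cherry-pick, sketched in Python around the git CLI. It uses a detached scratch worktree rather than the working checkout, which is an assumption of this sketch and not something the workflow above prescribes. The function name, branch name and commit id are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def dry_run_cherry_pick(repo, release_branch, fix_sha, run=subprocess.run):
    """Return True if fix_sha applies cleanly on release_branch, False otherwise.

    The attempt happens in a throwaway detached worktree, so the real checkout
    is untouched. `run` is injectable so the command plan can be exercised
    without a real repository.
    """
    with tempfile.TemporaryDirectory(prefix="cherry-dry-run-") as tmp:
        scratch = str(Path(tmp) / "worktree")
        git = ["git", "-C", str(repo)]
        # Check out the release branch's tip, detached, in the scratch directory.
        run(git + ["worktree", "add", "--detach", scratch, release_branch], check=True)
        try:
            # --no-commit applies the change without recording a commit.
            # A non-zero exit usually means a clash (it can also mean a bad commit id).
            result = run(["git", "-C", scratch, "cherry-pick", "--no-commit", fix_sha])
            return result.returncode == 0
        finally:
            # Discard the scratch worktree whatever happened.
            run(git + ["worktree", "remove", "--force", scratch], check=True)


# Example call (hypothetical branch name and commit id):
#   clean = dry_run_cherry_pick(".", "release/1.0", "abc123")
```

A False result is the early warning described above: trunk and the release branch have diverged somewhere the fix touches, and it is cheaper to learn that on a calm day.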

When is the second pass actually redundant?

This is the part my friend and I went around on. Frame it as sets: the desk-check covers a region D (localhost, one moment); the formal QA pass covers a region Q (coupled environment, one moment). The bug is a point, b. If Q ⊆ D - the release pass touches nothing the desk-check didn’t - then the second test adds zero bits and is pure ceremony. But Q and D only partially overlap: the coupled environment reaches behaviours localhost can’t (mechanism #4), so Q \ D is usually non-empty, and a manual pass is also a snapshot that decays the instant trunk moves (the desk-check “expires”).
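
The set framing translates directly into code. In this sketch the behaviours in each region are made-up labels, there only to show the subset test and the Q \ D difference.

```python
# Hypothetical behaviours each check can exercise; the labels are illustrative only.
desk_check_D = {"login form renders", "empty basket total", "discount rounding"}
formal_qa_Q = {"empty basket total", "discount rounding", "payment gateway timeout"}


def second_pass_adds_information(q: set, d: set) -> bool:
    """The formal pass is pure ceremony only when Q is a subset of D."""
    return not q.issubset(d)


only_in_formal_pass = formal_qa_Q - desk_check_D  # the Q \ D crescent

print(second_pass_adds_information(formal_qa_Q, desk_check_D))
print(sorted(only_in_formal_pass))
```

Here the crescent holds one environment-only behaviour, so the second pass is not redundant for this made-up case.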

What decides the size of Q \ D and whether it stays small isn’t the manual pass at all - it’s what automated coverage exists for this exact feature/bug, and what shape it is. Automation is what turns a coverage snapshot into a coverage invariant (re-asserted every commit, so b ∉ D keeps holding as trunk drifts). Fast, elegant, component-ish tests pin the behaviour cheaply and run every commit; slow full-stack tests pin it expensively and tend to get relegated to a non-continuous build, so they decay almost like the manual check does. So the real question - “do I need a second manual QA pass on this fix?” - splits six ways:

Automated coverage for this fix	Shape	Second manual QA pass?	Why
None	full-stack / slow (or n/a)	Yes - it’s the only check	Nothing pins the behaviour. Manual is your sole signal, on both branches. Maximum exposure to drift.
None	fast / component-ish	n/a	If there were fast component tests, you’d have automation - this row doesn’t exist.
Some	full-stack / slow	Yes, but scoped	A slow suite that may not run per-commit; manual still earns its keep for the `Q \ D` crescent (environment-specific behaviour).
Some	fast / component-ish	Rarely	The fast tests already re-assert the core behaviour every commit; manual only justified for genuinely environment-only effects.
Good	full-stack / slow	Redundant-ish	Behaviour is pinned, but if the suite isn’t continuous you still inherit some decay - a thin manual smoke is defensible, not required.
Good	fast / component-ish	Redundant	`b ∉ D` is re-proven on every commit and the cheap test is the invariant. The second manual pass adds no bits. This is the case where my “not needed” is flatly true.

Read top-to-bottom, the manual second pass goes from load-bearing to theatre exactly as automation deepens and gets faster/component-ish. The diagonal is the point: it’s not enough to have “good” tests - slow good tests that don’t run continuously still let coverage decay between the desk-check and release day. Fast, elegant, component-level tests are what actually collapse Q \ D toward empty and keep it there, which is the only regime where double-testing the fix is genuinely wasted motion. So when I say the double testing isn’t needed, the honest scope is the bottom row - and the engineering move for everyone above it is to climb toward that row, not to keep paying for the manual pass.

Closing

So, was I right to tell my friend it’s statistically impossible? Yes - with the precision that makes it true rather than glib. The fixed bug cannot be the thing still broken on trunk after the release-branch QA pass, because trunk holds the same fix bytes and the same bytes can’t pass quality assurance there and fail here. What remains is a sub-thousandth-to-sub-hundred-thousandth chance, and it isn’t the bug: it’s P(QA was wrong) + P(trunk became a different codebase since the fix). Drift and bad QA - not resurrection.

The cheap moves that drain even that residual: leave a regression test behind in trunk’s CI (this one always pays, for everyone), and - if your cherry-picks aren’t reliably seamless - dry-run the cherry-pick early so a clash informs you instead of ambushing you. The dry-run is for the hard-drift shops; for a tight-loop team whose cherry-picks just apply, the regression test is the move that matters and the fast same-day cherry-pick already does the rest. The dry-run, where it’s warranted, turns silent divergence into a visible signal before nerves are involved. And the regression test pins the fix for good: a human desk-check is true at one commit; a new unit test is true at every commit after it. Branch-for-release with one-way cherry-picks is a fine model precisely because the fix is already in trunk - the work left over is verification, and verification is cheap to automate.

A final question for the reader: move the full QA team sign-off to trunk (CI into a “shared dev” environment, say), and skip even the desk-check in the release environment after the now purely procedural cherry-pick … what’s the 1-in-N failure rate?


Published

June 22nd, 2026


Paul Hammant's Blog: Cherry-pick Asymmetry