I have a book on TBD that covers this material too: tbd-book.com
See also my book, Value Stream Mapping for Software Delivery Teams, which doesn't cover this material but is interesting in its own right: vsm-book.com
Talk Roadmap
Directed Acyclic Graph build systems
Google's Piper and rationale
Simulation Project on GH with contrived app
Talk through various build & test scenarios
Build tech aware sparse-checkout capability
Potvin & Levenberg's monorepo paper refers to this as subsetting, done on their "Clients in the Cloud" (CitC)
Depth-first recursive build systems
Same sim repo, different (divergent) branch
Talk through various build & test scenarios
Discussion / Pros & Cons
Google's Piper VCS & build system
Thousands of applications and services comprising 9+ million source files in one
branch (trunk) of the monorepo, horizontally scaled to support 25K+
committers with 90TB+ of history
Cloning all of that to one workstation isn't possible
Even if it were, the checkout alone would kill your IDE
Devs and QEs use a tool for smart subsetting and a
build system that works well with that (Bazel)
Their monorepo approach allows Google to make atomic changes
across multiple projects simultaneously,
maintain consistent dependencies,
and share code very effectively at the source level across their entire engineering organization.
Important: dependency management done this way allows lock-step
upgrades without lock-step releases, and enables lower-drama
large-scale refactoring
Directed Acyclic Graph build systems
Google style
Modules & Classes Used in the Simulation Repo
Two main() applications with some components in common
Instead of Bazel's BUILD files, modules have .compile.sh
and maybe .dist.sh or .tests.sh scripts.
A module dependency declaration:
# From javatests/components/vowels/.tests.sh source file
deps=(
  "module:java/components/vowels"
)
☝ Tests for components/vowels depend on the prod code of the same module in order to compile and run
Dependencies can be binary too
An example binary dependency declaration:
# From javatests/components/vowels/.tests.sh source file
bindeps=(
  "lib:java/junit/junit.jar"
  "lib:java/hamcrest/hamcrest.jar"
)
If you were using Git for your corporate monorepo, instead of a Piper-alike,
you'd likely be using Git Large File Storage (LFS) to keep the binaries out
of the clonable (history-retaining) .git/ folder
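A minimal Git LFS setup for those jars might look like this (the jar paths are taken from the bindeps example above; where they live in your repo is up to you):

git lfs install                  # one-time per machine: wire up the LFS filters
git lfs track "*.jar"            # store jars as small pointers; the real bytes live in LFS storage
git add .gitattributes java/junit/junit.jar java/hamcrest/hamcrest.jar
git commit -m "Track binary deps via Git LFS"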
Compile of one application and its deps and make jar
Each letter of the app's class name prints to stdout (from the constructor of the class for that letter).
Vowels do that too, but wrapped in parentheses by the Rust code (via the Java Native Interface - JNI)
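A hedged example of that step; the prod-code script paths here are assumed by analogy with the javatests/ path shown just below, and whether .dist.sh is the script that also builds the uberJar is a guess:

./java/applications/monorepos_rule/.compile.sh   # compiles the app and its declared deps (paths assumed)
./java/applications/monorepos_rule/.dist.sh      # jars the result (uberJar assumed to live here too)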
Tests for that app (and compile of deps)
./javatests/applications/monorepos_rule/.tests.sh
Note: Tests and prod code are single "participants" in this sequence diagram
Maybe you'd fast fail for very quick 'unit' ones first:
for typ in unit service ui; do
./javatests/applications/monorepos_rule/.tests.sh -tdt -$typ
done
Monorepo Tests (continued)
Or something more sophisticated:
set -e
cd javatests/applications/monorepos_rule
./.tests.sh -tdt -unit && ./.tests.sh -tdt -service \
  && ./.tests.sh -ui
(Confession: I have not coded the 'type' switch)
Google had unit/service/ui mapped to small/medium/large (link to article by Mike Bland)
Sparse checkout in a Google-style monorepo
I've made a cut-down representation of Google's tech with crude shell scripts
This tech hides directories that are not pertinent to the thing you're working on. For example, the monorepos_rule app does not need directed_graphs_build_systems_are_cool in your checkout, nor a bunch of the unreferenced components
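For comparison only: stock Git's sparse-checkout cone mode can approximate the same hiding. This is not the talk's scripts; the clone URL and branch name are placeholders, and the directory list is a guess at what monorepos_rule needs:

git clone --no-checkout https://example.com/sim-monorepo.git   # placeholder URL
cd sim-monorepo
git sparse-checkout init --cone
git sparse-checkout set \
  java/applications/monorepos_rule \
  javatests/applications/monorepos_rule \
  java/components/vowels \
  javatests/components/vowels
git checkout trunk   # branch name per the trunk mentioned earlier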
<Rust compilation output>
rust compile for components-vowel-base
java compile for components-vowel-base
java tests compile for components-vowel-base
test for components-vowel-base
jar for components-vowel-base
copy-resources for components-vowel-base
java compile for components-vowels
java tests compile for components-vowels
test for components-vowels
jar for components-vowels
java compile for components-nasal
java tests compile for components-nasal
test for components-nasal
jar for components-nasal
java compile for components-voiceless
java tests compile for components-voiceless
test for components-voiceless
jar for components-voiceless
java compile for components-sonorants
java tests compile for components-sonorants
test for components-sonorants
jar for components-sonorants
java compile for components-fricatives
java tests compile for components-fricatives
test for components-fricatives
jar for components-fricatives
java compile for monorepos-rule
java tests compile for monorepos-rule
test for monorepos-rule
jar for monorepos-rule
uberJar for monorepos-rule
Maven's reactor picks the order upfront - with fast fail
Maven Reactor Sequence Diagram
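From the command line, that fast-fail behavior is Maven's default; the alternatives are flags on the same goal (the module layout of the Maven branch is not shown here):

mvn install                   # -ff / --fail-fast is the default: stop at the first failing module
mvn --fail-at-end install     # -fae: keep building unaffected modules, report failures at the end
mvn --fail-never install      # -fn: never fail the overall build on a module failure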
Discussion
Depth-first recursive versus Directed Acyclic Graph build systems.
Limits of the sparse checkout tooling I made. Sparse capabilities in depth-first
recursive build systems
My sparse-build tech doesn't have
A generated reverse map: "what uses java/components/vowels?" (see the sketch after this list)
Query interface
Add those to checkout via single "add depending" script invocation
Map could be in the Git repo ... but probably shouldn't be.
Google talks of a "Kythe" code indexing tech that maybe has these features.
Fast-fail scripts for a commit hook to use: "your sparse checkout should have /ts_tests/foo/bar/baz and 12 others, as you have changes in /ts/foo/banana"
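A hypothetical sketch of that missing reverse map (first bullet above): grep every module's declaration scripts for deps entries naming a target module. The script names follow the convention described earlier; the output format is my own choice:

target="java/components/vowels"
grep -rl --include='.compile.sh' --include='.tests.sh' --include='.dist.sh' \
  "module:${target}" . \
  | xargs -n1 dirname | sort -u    # each result is a module dir that depends on the target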
Shell scripts vs Bazel
✓ Easier to understand with no prior experience
✓ Zero install situation
✗ Without a purposeful build language (Starlark), it is easier to
make an inconsistent mess
✗ Some repetition of lines (not DRY - oops)
✗ Forget about fine-grained dep tracking, hermetic builds
with reproducible outputs, a distributed build cache that
works across machines, remote execution and parallel building,
and better handling of cross-language deps
DAG advantages over depth-first
Lockstep upgrades of modules and binary deps (restating: that doesn't also mean lockstep releases)
Many other claimed advantages could be implemented as features in
depth-first build systems if their maintainers wished, including the sparse-checkout
tricks and compatibility
DAG build system disadvantages
Some classes of problem discovered late in execution
?
Monorepo Build Systems
Or rather: Google's directed-graph build system for monorepos with special sparse-checkout features versus classic depth-first recursive types. Fewer than 100 companies worldwide should consider the same setup. For all others, this talk is just a curiosity.