Paul Hammant's Blog: Dagging On Maven
Maven is over 15 years old now, and still a force in the Java build-technology world. I work with multiple build technologies and model their pros and cons quite a bit. Bazel is the new hotness and people want to migrate to it. For a single-workspace monorepo design (Google’s original use for themselves), Bazel is a directed acyclic graph (DAG) build system. Bazel can be used in configurations that are not quite that way (multi-repo), but that’s not important for this blog entry.
Maven is a depth-first recursive model. You kick off Maven, it calculates an order of traversal, then does compile & test for each module (and sub-module, in order). Depth-first recursive, as I say: no circular references allowed.
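To make the ordering idea concrete, here is a minimal Python sketch of a depth-first, dependency-first traversal that refuses cycles. It is an illustration of the technique only, not Maven's actual reactor code, and the module names are hypothetical.

```python
from typing import Dict, List


def reactor_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return modules so that every module comes after its dependencies.

    deps maps a (hypothetical) module name to the modules it depends on.
    Raises ValueError on a circular reference.
    """
    order: List[str] = []
    state: Dict[str, str] = {}  # "visiting" while on the DFS stack, "done" once emitted

    def visit(module: str, path: List[str]) -> None:
        if state.get(module) == "done":
            return
        if state.get(module) == "visiting":
            # We came back to a module that is still on the stack: a cycle.
            cycle = path[path.index(module):] + [module]
            raise ValueError("circular reference: " + " -> ".join(cycle))
        state[module] = "visiting"
        for dep in deps.get(module, []):
            visit(dep, path + [module])
        state[module] = "done"
        order.append(module)  # post-order: dependencies were appended first

    for module in deps:
        visit(module, [])
    return order
```

For example, reactor_order({"app": ["core", "web"], "web": ["core"], "core": []}) gives ["core", "web", "app"], while {"a": ["b"], "b": ["a"]} raises ValueError.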
Life with Maven is:
1. git pull
2. mvn install
3. (do your dev work)
4. mvn test
5. git commit+push to a PR branch
(repeating 3 & 4 until “done”)
If you have 1000 modules in your monorepo (and you don’t have a way of slimming that down for a development task), then to get a faster experience you might:
1. git pull
2. mvn install
3. cd moduleName/otherModuleName
4. (do your dev work)
5. mvn test
6. cd ../..
7. mvn test
8. git commit+push to a PR branch
You repeat 4 & 5 until “done”, because compiling and testing one module out of 1000 is faster. But just to be sure, you’re going to do a full build from the root of the workspace (6 & 7) before the commit/push. Or you’re brave/confident and choose to skip 6 & 7. Or you’re going to rely on some Jenkins-like infra to tell you that you broke something (after the commit/push). That last isn’t a good team habit, I say.
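The reason a one-module build isn’t enough is that other modules may depend on the one you changed. A minimal sketch of finding everything downstream of a change, over a hypothetical module graph (module name mapped to the modules it depends on), using a breadth-first walk of the inverted graph:

```python
from collections import deque
from typing import Dict, List, Set


def impacted_by(changed: str, deps: Dict[str, List[str]]) -> Set[str]:
    """The changed module plus every module that depends on it, directly or transitively."""
    # Invert the graph: module -> modules that depend on it.
    dependents: Dict[str, List[str]] = {}
    for module, module_deps in deps.items():
        for dep in module_deps:
            dependents.setdefault(dep, []).append(module)

    impacted = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, []):
            if user not in impacted:
                impacted.add(user)
                queue.append(user)
    return impacted
```

With {"app": ["web"], "web": ["core"], "core": [], "cli": ["core"], "docs": []}, a change to "core" impacts core, web, cli and app, but never docs. Maven can apply the same idea itself when you select a module with -pl and add -amd (also-make-dependents).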
Making Maven more DAG-like
Taking github.com/jooby-project/jooby as a test bed (a large multi-Maven-module monorepo with lots of tests), it is possible to benchmark some modes of operation.
Assuming Maven has already filled ~/.m2/repository with deps, the mvn clean install build time on my older Mac is 7.5 mins.
If I run mvn install after that (no “clean” target), the build is just under 3 mins.
If I make an inconsequential change to a source file in one module’s source tree (modules/jooby-thymeleaf/src/main/java/io/jooby/internal/thymeleaf/ThymeleafTemplateEngine.java) and run mvn install from the root again, it is still just under 3 mins.
If I cd into modules/jooby-thymeleaf and run mvn install there, it takes 5 seconds to complete. To be correct, I’d still cd ../.. and run mvn install again before the commit+push.
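The numbers above are wall-clock times for whole Maven invocations. If you want to repeat that kind of measurement, a small helper is enough; this is a sketch, and the commands and directories you pass in are your choice:

```python
import subprocess
import sys
import time
from typing import List, Tuple


def time_command(cmd: List[str], cwd: str = ".") -> Tuple[int, float]:
    """Run cmd in cwd; return (exit code, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, cwd=cwd)
    elapsed = time.perf_counter() - start
    # Report on stderr so the measured command's own stdout stays clean.
    print(f"{' '.join(cmd)} -> exit {result.returncode} in {elapsed:.1f}s", file=sys.stderr)
    return result.returncode, elapsed
```

For example, time_command(["mvn", "install"]) from the workspace root, then time_command(["mvn", "install"], cwd="modules/jooby-thymeleaf") for the single-module case.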
Some Python build fu
```python
#!/usr/bin/env python3
# Requires the third-party "sh" package (pip install sh).
import os
import sys
from pathlib import Path

import sh

# Find the commit that origin/<branch> points at, by scanning the decorated log.
log = str(sh.git.log("--oneline", "--no-color", "--decorate=short", _tty_out=False))
branch = str(sh.git("rev-parse", "--abbrev-ref", "HEAD", _tty_out=False)).rstrip()

hashLine = ""
for line in log.split("\n"):
    # print("Line:" + line)
    if " (origin/{0},".format(branch) in line:
        hashLine = line
        break
    if "{0}, origin/{0}".format(branch) in line:
        hashLine = line
        break
    if "{0}, origin/HEAD, origin/{0}".format(branch) in line:
        hashLine = line
        break
    if "{0}, tag: ".format(branch) in line and ", origin/{0}".format(branch) in line:
        hashLine = line
        break

if len(hashLine) == 0:
    print("Could not determine origin/{0} SHA1".format(branch))
    sys.exit(1)

originHash = hashLine.split(" ")[0]
# Tracked files that differ between origin/<branch> and the working tree, as "STATUS\tpath".
diffFiles = str(sh.git.diff("--name-status", originHash, _tty_out=False))

# Every directory holding a pom.xml is a Maven module...
allModules = []
for filename in Path('.').glob('**/pom.xml'):
    allModules.append(str(filename).split("/pom.xml")[0])

# ...but only modules with production sources are worth compiling/testing.
allModulesWithSourceTrees = []
for filename in Path('.').glob('**/src/main'):
    srcDir = str(filename).split("/src/main")[0]
    if srcDir in allModules and srcDir not in allModulesWithSourceTrees:
        allModulesWithSourceTrees.append(srcDir)

impactedProdModules = []
impactedTestModules = []
for diffFile in diffFiles.splitlines():
    if "\t" not in diffFile:
        continue  # skip blank lines
    file = diffFile.split("\t")[1].strip()
    for m in allModulesWithSourceTrees:
        # Match "module/" so that modules/jooby doesn't also claim modules/jooby-xyz
        if not file.startswith(m + "/"):
            continue
        if "src/test/" not in file:
            # A production change needs a recompile and a re-test.
            if m not in impactedProdModules:
                impactedProdModules.append(m)
            if m not in impactedTestModules:
                impactedTestModules.append(m)
        elif m not in impactedTestModules:
            # A test-only change just needs a re-test.
            impactedTestModules.append(m)

if len(impactedProdModules) == 0 and len(impactedTestModules) == 0:
    print("No local changes, nothing to run.")
    sys.exit(10)

with open(".mvnCommands.sh", 'w') as out:
    out.write("#!/bin/sh\n\nset -e\n\n")
    if impactedProdModules:  # an empty -pl list would make mvn fail
        impactedProdModuleList = ",".join(impactedProdModules)
        print("Impacted Maven Prod modules = " + impactedProdModuleList)
        out.write("mvn install -DskipTests -pl " + impactedProdModuleList + "\n\n")
    impactedTestModuleList = ",".join(impactedTestModules)
    print("Impacted Maven Test modules = " + impactedTestModuleList)
    out.write("mvn test -Dmaven.main.skip -pl " + impactedTestModuleList + "\n\n")

os.chmod(".mvnCommands.sh", 0o775)
print(".mvnCommands.sh updated")
```
OK, so I run that Python:
python3 path/to/genMavenImpactScript.py
That takes 1.3 seconds to run. It processes all the modules in this multi-Maven-module workspace, looking for changes. It draws up the smallest list of things that need to be compiled and things that need to be tested, and makes a shell script from that (.mvnCommands.sh).
If I execute that, it takes 15 seconds to build everything impacted and run the tests that need to be run. That’s for this Jooby monorepo and this silly change to ThymeleafTemplateEngine.java. In this case the generated .mvnCommands.sh script looks like:
```sh
#!/bin/sh

set -e

mvn install -DskipTests -pl modules/jooby-thymeleaf

mvn test -Dmaven.main.skip -pl modules/jooby-thymeleaf
```
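The core of the script is the mapping from changed file paths to modules. Pulled out as a pure function (a hypothetical refactoring, not part of the original script), it can be exercised without git or Maven. The module list and test file name used to try it are illustrative.

```python
from typing import Iterable, List, Tuple


def impacted_modules(changed_files: Iterable[str], modules: List[str]) -> Tuple[List[str], List[str]]:
    """Split changed paths into (modules to recompile, modules to retest).

    A change outside src/test/ means recompile and retest that module;
    a change only under src/test/ means just retest it.
    """
    prod: List[str] = []
    test: List[str] = []
    for path in changed_files:
        for m in modules:
            if not path.startswith(m + "/"):  # "m/" so modules/a doesn't claim modules/ab
                continue
            if "src/test/" not in path:
                if m not in prod:
                    prod.append(m)
                if m not in test:
                    test.append(m)
            elif m not in test:
                test.append(m)
    return prod, test
```

Feeding it the ThymeleafTemplateEngine.java change yields ["modules/jooby-thymeleaf"] for both lists, which is exactly what ends up in the two mvn lines above.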
The script works out what you’ve done in your workspace since the last git-push: it finds the commit that origin/<branch> points to and diffs your working tree against it. It adapts to the branch name you’re using.
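The script finds that commit with four substring checks, each matching one particular layout of git’s decorations. A swapped-in alternative (not the author’s approach) is to parse the decoration list and look for origin/<branch> as a whole ref; a sketch, with hypothetical log lines to try it on:

```python
import re
from typing import Iterable, Optional

# Matches "<sha> (<decorations>) <subject>" lines from:
#   git log --oneline --no-color --decorate=short
_DECORATED = re.compile(r"^([0-9a-f]+) \(([^)]*)\)")


def find_origin_hash(log_lines: Iterable[str], branch: str) -> Optional[str]:
    """Return the abbreviated SHA of the newest commit decorated with origin/<branch>."""
    wanted = "origin/" + branch
    for line in log_lines:
        match = _DECORATED.match(line)
        if not match:
            continue  # undecorated commit
        # Decorations look like "HEAD -> main, origin/main, tag: v1.0"
        refs = [ref.strip() for ref in match.group(2).split(",")]
        if wanted in refs:
            return match.group(1)
    return None
```

Simpler still, git rev-parse origin/<branch> resolves the remote-tracking ref directly, with no log parsing at all.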
I’m sure there are plenty of imperfections. It’s also true that there’s a proper Maven module that does the same, but I’ve forgotten its name, and I had done all this work before someone told me it existed. Never mind, I wanted to publish this script.
Note too that “dagging” is Aussie slang: en.wikipedia.org/wiki/Dag_(slang)#:~:text=Dagging