For VMs on Linux I use virsh (the libvirt command-line tool), and I have been using the virt-install helper with a bunch of parameters to make VMs for me. Virsh works on Ubuntu and on Chromebooks, which are both important to me presently (LXC is not an easy setup for Chromebooks). With virt-install, there’s a bunch of follow-up installer Q&A that happens as the VM is being instantiated: “US keyboard mapping, or something else?”. You can script those answers too, but not as command-line options for virt-install. Instead they go in a “preseed config” file holding many such choices, which is handed to the command as a single argument. I thought I’d make a tool that turns those answers back into command-line options (and writes the preseed config from them). Silly really, but it gave me an excuse to play with Zig.

Zig source to do this thing (preseed.zig, not finished yet)

const std = @import("std");

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    const args = try std.process.argsAlloc(allocator);
    // argsAlloc's result must be released with argsFree, not allocator.free
    defer std.process.argsFree(allocator, args);

    // Use ArrayList to dynamically manage options
    var options = std.ArrayList([]const u8).init(allocator);
    defer options.deinit();

    // Ignore the first argument (program name), process each subsequent argument
    for (args[1..]) |arg| {
        if (std.mem.startsWith(u8, arg, "--")) {
            // This is an option: strip the leading "--" and store the rest
            try options.append(arg[2..]);
        }
    }

    // Create (or truncate) the output file
    const file_path = "/tmp/preseed.cfg";
    const file = try std.fs.cwd().createFile(file_path, .{ .truncate = true });
    defer file.close();
    const writer = file.writer();

    // Write each key=value option as a "key value" line
    for (options.items) |option| {
        // Split at the first '=' only, so values may themselves contain '='.
        // Options without '=' are reported and skipped instead of panicking.
        const eq = std.mem.indexOfScalar(u8, option, '=') orelse {
            std.log.warn("skipping option without '=': --{s}", .{option});
            continue;
        };
        const key = option[0..eq];
        const value = option[eq + 1 ..];
        try writer.print("{s} {s}\n", .{ key, value });
    }

    std.log.info("{s}", .{file_path});
}

(not finished yet, as I say)
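One direction a finished version could go: real preseed lines aren’t bare `key value` pairs. Each line of a debconf preseed file has four fields, `<owner> <question name> <question type> <value>`, for example `d-i keyboard-configuration/xkb-keymap select us`. Below is a minimal sketch of carrying all of that on the command line. The `--question=type:value` option syntax, the fixed `d-i` owner, and the names `preseedLine` and `PreseedError` are all hypothetical (nothing in virt-install or the Debian installer defines them), and the sketch assumes the Zig 0.12/0.13-era standard library used above.

```zig
const std = @import("std");

/// Hypothetical errors for malformed options.
pub const PreseedError = error{ MissingEquals, MissingType };

/// Hypothetical option syntax (not defined by virt-install or d-i):
///   <question>=<type>:<value>   (the leading "--" already stripped)
/// e.g. "keyboard-configuration/xkb-keymap=select:us" becomes the
/// preseed line "d-i keyboard-configuration/xkb-keymap select us".
pub fn preseedLine(allocator: std.mem.Allocator, option: []const u8) ![]u8 {
    // Split at the first '=' only, so the value may itself contain '='.
    const eq = std.mem.indexOfScalar(u8, option, '=') orelse return PreseedError.MissingEquals;
    const question = option[0..eq];
    const rest = option[eq + 1 ..];
    // The debconf type (string, select, boolean, ...) comes before the first ':'.
    const colon = std.mem.indexOfScalar(u8, rest, ':') orelse return PreseedError.MissingType;
    const qtype = rest[0..colon];
    const value = rest[colon + 1 ..];
    // Preseed line layout: <owner> <question name> <question type> <value>
    return std.fmt.allocPrint(allocator, "d-i {s} {s} {s}", .{ question, qtype, value });
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const args = try std.process.argsAlloc(allocator);
    defer std.process.argsFree(allocator, args);

    const stdout = std.io.getStdOut().writer();
    for (args[1..]) |arg| {
        if (!std.mem.startsWith(u8, arg, "--")) continue;
        const line = try preseedLine(allocator, arg[2..]);
        defer allocator.free(line);
        try stdout.print("{s}\n", .{line});
    }
}
```

Run as `./preseed --keyboard-configuration/xkb-keymap=select:us --debian-installer/locale=string:en_US`, it prints one preseed line per option. Splitting at the first `=` and then the first `:` keeps values that contain those characters intact.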

preseed.zig is very simple. The Zig build file (build.zig):

const std = @import("std");

const targets: []const std.Target.Query = &.{
    // Main specified targets
    .{ .cpu_arch = .aarch64, .os_tag = .freestanding },
    .{ .cpu_arch = .aarch64, .os_tag = .linux },
    .{ .cpu_arch = .aarch64, .os_tag = .macos },
    .{ .cpu_arch = .aarch64, .os_tag = .windows, .abi = .gnu },
    .{ .cpu_arch = .aarch64, .os_tag = .windows, .abi = .msvc },
    .{ .cpu_arch = .arm, .os_tag = .freestanding },
    .{ .cpu_arch = .arm, .os_tag = .linux, .abi = .gnueabihf },
    .{ .cpu_arch = .mips, .os_tag = .freestanding },
    .{ .cpu_arch = .mips, .os_tag = .linux },
    .{ .cpu_arch = .powerpc, .os_tag = .freestanding },
    .{ .cpu_arch = .powerpc, .os_tag = .linux },
    .{ .cpu_arch = .powerpc64, .os_tag = .freestanding },
    .{ .cpu_arch = .powerpc64, .os_tag = .linux },
    .{ .cpu_arch = .riscv64, .os_tag = .freestanding },
    .{ .cpu_arch = .riscv64, .os_tag = .linux },
    .{ .cpu_arch = .sparc64, .os_tag = .freestanding },
    .{ .cpu_arch = .sparc64, .os_tag = .linux },
    .{ .cpu_arch = .x86_64, .os_tag = .linux, .abi = .gnu },
    .{ .cpu_arch = .x86_64, .os_tag = .linux, .abi = .musl },
    .{ .cpu_arch = .x86_64, .os_tag = .windows, .abi = .gnu }, // MinGW-w64 ABI
    .{ .cpu_arch = .x86_64, .os_tag = .windows, .abi = .msvc }, // MSVC ABI
    .{ .cpu_arch = .x86_64, .os_tag = .windows }, // ABI left to Zig's default
};

pub fn build(b: *std.Build) !void {
    for (targets) |t| {
        const exe = b.addExecutable(.{
            .name = "preseed",
            .root_source_file = .{ .path = "preseed.zig" },
            .target = b.resolveTargetQuery(t),
            .optimize = .ReleaseSafe,
        });

        const target_output = b.addInstallArtifact(exe, .{
            .dest_dir = .{
                .override = .{
                    .custom = try t.zigTriple(b.allocator),
                },
            },
        });

        b.getInstallStep().dependOn(&target_output.step);
    }
}

Installing Zig (Linux)

curl -L https://ziglang.org/builds/zig-linux-x86_64-0.13.0-dev.75+5c9eb4081.tar.xz | tar -xJ -C ./
ln -s zig-linux-x86_64-0.13.0-dev.75+5c9eb4081/zig zig

Zig runs from wherever you unpack it (no install step, no fixed system paths), it seems. In my opinion that should be a Unix principle.

In a single invocation, it builds every OS/CPU target listed in build.zig:

./zig build --summary all

The whole build took less than 60 seconds. Here’s what it made:

$ du -h zig-out 
8.0K    zig-out/powerpc-freestanding
1.7M    zig-out/x86_64-linux-musl
1.4M    zig-out/aarch64-windows-msvc
1.7M    zig-out/powerpc64-linux
8.0K    zig-out/sparc64-freestanding
12K     zig-out/mips-freestanding
1.7M    zig-out/mips-linux
1.7M    zig-out/x86_64-linux-gnu
1.5M    zig-out/arm-linux-gnueabihf
8.0K    zig-out/aarch64-freestanding
1.4M    zig-out/aarch64-windows-gnu
8.0K    zig-out/powerpc64-freestanding
8.0K    zig-out/arm-freestanding
1.8M    zig-out/aarch64-linux
2.4M    zig-out/riscv64-linux
1.8M    zig-out/x86_64-windows-gnu
1.8M    zig-out/x86_64-windows-msvc
12K     zig-out/riscv64-freestanding
1.8M    zig-out/x86_64-windows
1.5M    zig-out/powerpc-linux
220K    zig-out/aarch64-macos
1.6M    zig-out/sparc64-linux
24M     zig-out

There’s wide variance there, and I can’t say I understand it all. The biggest build for each CPU architecture:

1.8M    x86_64
1.8M    aarch64
1.5M    powerpc
1.7M    powerpc64
2.4M    riscv64
1.6M    sparc64
1.7M    mips
1.5M    arm

About 14M total (summing the figures above).
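The per-architecture table is easy to get wrong by hand, so here is a small Zig sketch that recomputes it from the `du -h zig-out` listing. The sizes are copied from that listing as tenths of a megabyte (entries under 1M are rounded: 8.0K and 12K become 0, 220K becomes 2). The helper names `archOf`, `maxFor` and `sumOfMaxima` are made up for this example.

```zig
const std = @import("std");

const Row = struct { triple: []const u8, tenths_mb: u32 };

// Sizes from the `du -h zig-out` listing, in tenths of a megabyte.
const rows = [_]Row{
    .{ .triple = "powerpc-freestanding", .tenths_mb = 0 },
    .{ .triple = "x86_64-linux-musl", .tenths_mb = 17 },
    .{ .triple = "aarch64-windows-msvc", .tenths_mb = 14 },
    .{ .triple = "powerpc64-linux", .tenths_mb = 17 },
    .{ .triple = "sparc64-freestanding", .tenths_mb = 0 },
    .{ .triple = "mips-freestanding", .tenths_mb = 0 },
    .{ .triple = "mips-linux", .tenths_mb = 17 },
    .{ .triple = "x86_64-linux-gnu", .tenths_mb = 17 },
    .{ .triple = "arm-linux-gnueabihf", .tenths_mb = 15 },
    .{ .triple = "aarch64-freestanding", .tenths_mb = 0 },
    .{ .triple = "aarch64-windows-gnu", .tenths_mb = 14 },
    .{ .triple = "powerpc64-freestanding", .tenths_mb = 0 },
    .{ .triple = "arm-freestanding", .tenths_mb = 0 },
    .{ .triple = "aarch64-linux", .tenths_mb = 18 },
    .{ .triple = "riscv64-linux", .tenths_mb = 24 },
    .{ .triple = "x86_64-windows-gnu", .tenths_mb = 18 },
    .{ .triple = "x86_64-windows-msvc", .tenths_mb = 18 },
    .{ .triple = "riscv64-freestanding", .tenths_mb = 0 },
    .{ .triple = "x86_64-windows", .tenths_mb = 18 },
    .{ .triple = "powerpc-linux", .tenths_mb = 15 },
    .{ .triple = "aarch64-macos", .tenths_mb = 2 },
    .{ .triple = "sparc64-linux", .tenths_mb = 16 },
};

const arches = [_][]const u8{ "x86_64", "aarch64", "powerpc", "powerpc64", "riscv64", "sparc64", "mips", "arm" };

/// The CPU architecture is everything before the first '-' of the triple.
fn archOf(triple: []const u8) []const u8 {
    const dash = std.mem.indexOfScalar(u8, triple, '-') orelse triple.len;
    return triple[0..dash];
}

/// Largest build for one architecture, in tenths of a MB.
fn maxFor(arch: []const u8) u32 {
    var best: u32 = 0;
    for (rows) |r| {
        if (std.mem.eql(u8, archOf(r.triple), arch)) best = @max(best, r.tenths_mb);
    }
    return best;
}

/// Sum of the per-architecture maxima, in tenths of a MB.
fn sumOfMaxima() u32 {
    var total: u32 = 0;
    for (arches) |a| total += maxFor(a);
    return total;
}

pub fn main() !void {
    const stdout = std.io.getStdOut().writer();
    for (arches) |a| {
        const m = maxFor(a);
        try stdout.print("{d}.{d}M    {s}\n", .{ m / 10, m % 10, a });
    }
    const total = sumOfMaxima();
    try stdout.print("{d}.{d}M total\n", .{ total / 10, total % 10 });
}
```

Matching on the exact prefix (rather than `startsWith`) keeps `powerpc` and `powerpc64` apart. The final line comes out as `14.0M total`.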

How well does that 24M compress?

$ zip -r preseed-binaries.zip zig-out
  adding: zig-out/powerpc-freestanding/preseed (deflated 50%)
  adding: zig-out/x86_64-linux-musl/preseed (deflated 72%)
  adding: zig-out/aarch64-windows-msvc/preseed.exe (deflated 71%)
  adding: zig-out/aarch64-windows-msvc/preseed.pdb (deflated 74%)
  adding: zig-out/powerpc64-linux/preseed (deflated 73%)
  adding: zig-out/sparc64-freestanding/preseed (deflated 53%)
  adding: zig-out/mips-freestanding/preseed (deflated 62%)
  adding: zig-out/mips-linux/preseed (deflated 69%)
  adding: zig-out/x86_64-linux-gnu/preseed (deflated 72%)
  adding: zig-out/arm-linux-gnueabihf/preseed (deflated 66%)
  adding: zig-out/aarch64-freestanding/preseed (deflated 52%)
  adding: zig-out/aarch64-windows-gnu/preseed.exe (deflated 70%)
  adding: zig-out/aarch64-windows-gnu/preseed.pdb (deflated 73%)
  adding: zig-out/powerpc64-freestanding/preseed (deflated 54%)
  adding: zig-out/arm-freestanding/preseed (deflated 49%)
  adding: zig-out/aarch64-linux/preseed (deflated 73%)
  adding: zig-out/riscv64-linux/preseed (deflated 80%)
  adding: zig-out/x86_64-windows-gnu/preseed.exe (deflated 72%)
  adding: zig-out/x86_64-windows-gnu/preseed.pdb (deflated 73%)
  adding: zig-out/x86_64-windows-msvc/preseed.exe (deflated 72%)
  adding: zig-out/x86_64-windows-msvc/preseed.pdb (deflated 73%)
  adding: zig-out/riscv64-freestanding/preseed (deflated 68%)
  adding: zig-out/x86_64-windows/preseed.exe (deflated 72%)
  adding: zig-out/x86_64-windows/preseed.pdb (deflated 73%)
  adding: zig-out/powerpc-linux/preseed (deflated 67%)
  adding: zig-out/aarch64-macos/preseed (deflated 58%)
  adding: zig-out/sparc64-linux/preseed (deflated 72%)

The resulting zip is 6-point-something MB: about half the sum of the biggest build for each CPU architecture, and much less than the 24M uncompressed total of all the permutations.

Why run that test? What’s this blog entry all about?

I don’t think Zig has an online package repository yet, and federated Git could be it. Hear me out…

Storing those binaries in Git would get similar compression, because Git stores objects zlib-compressed (the same DEFLATE algorithm zip uses) and additionally delta-compresses them in packfiles. Those Git repos, once published, would have an easy syndication mechanism and allow for caching in intermediate places. There would be one Git branch for each target combination, like x86_64-windows-msvc, and a tag for each release, like v1.2.3. Git tags are repo-wide rather than per-branch, though, so the tag name would need the target in it too, e.g. x86_64-windows-msvc/v1.2.3. As vulnerabilities are found and old releases are flagged as not to be used anymore, a vulnerabilities.yaml file at the HEAD revision could be updated with some form of use_this_version_instead indication. I think you would not have that Git repo do double duty for day-to-day developer source changes (and short-lived branches), but would keep an adjacent one just for package distribution. Nobody would fork that repo, cos there are no pull requests back to it.
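A minimal Zig sketch of the consumer side of that scheme: pick a target branch, check the advisories, and work out which tag to fetch. Everything here is hypothetical. `Advisory` stands in for an already-parsed entry of the imagined vulnerabilities.yaml (no YAML parsing is attempted), `releaseTag` and `resolveVersion` are made-up names, and the `<triple>/v<version>` tag naming is an assumption, one way to keep per-target release tags distinct.

```zig
const std = @import("std");

/// Hypothetical, already-parsed entry of vulnerabilities.yaml at HEAD.
/// The field name follows the post's wording; no such format exists yet.
const Advisory = struct {
    version: []const u8,
    use_this_version_instead: []const u8,
};

/// Assumed tag convention: tags are repo-wide, so include the target triple.
fn releaseTag(allocator: std.mem.Allocator, triple: []const u8, version: []const u8) ![]u8 {
    return std.fmt.allocPrint(allocator, "{s}/v{s}", .{ triple, version });
}

/// Returns the requested version, or the replacement named by a matching
/// advisory. Single hop only: a replacement that is itself flagged is not chased.
fn resolveVersion(advisories: []const Advisory, wanted: []const u8) []const u8 {
    for (advisories) |a| {
        if (std.mem.eql(u8, a.version, wanted)) return a.use_this_version_instead;
    }
    return wanted;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const advisories = [_]Advisory{
        .{ .version = "1.2.2", .use_this_version_instead = "1.2.3" },
    };
    const triple = "x86_64-linux-musl";
    const version = resolveVersion(&advisories, "1.2.2");
    const tag = try releaseTag(allocator, triple, version);
    defer allocator.free(tag);

    // A subscriber could then fetch only its own branch, e.g. with
    //   git clone --single-branch --branch x86_64-linux-musl <repo-url>
    try std.io.getStdOut().writer().print("branch {s}, tag {s}\n", .{ triple, tag });
}
```

Keeping the branch name equal to the bare triple means the branch and its tags live in different ref namespaces (`refs/heads/…` versus `refs/tags/…`), so they don’t collide.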

GitHub might not like you publishing a Git repo of just binaries (where diffs aren’t human-readable and merges make no sense). In 2017 I wrote about a hypothetical use of Git for Java-land binary package publication, Alternative to Maven Central for Jar publishing (multiple Git repositories), which did push non-source to GitHub. Also see the Git-backed Maven Central Meta Model post from a few years back; that’s a halfway house.

Of the online package repositories we have today, CPAN (Comprehensive Perl Archive Network), PyPI (Python Package Index), RubyGems, npm (Node Package Manager), Maven-style repositories (including “Maven Central” for the Java language family), NuGet (.NET family of languages), Cargo (Rust’s package manager), Go Modules, CRAN (Comprehensive R Archive Network), CocoaPods trunk (iOS/macOS), Flutter & Dart’s “Pub”, Hackage (Haskell), Swift Package Manager and PHP’s Composer all use HTTP to get packages into a directory-based layout on the local dev workstation. The commands ‘cd’ and ‘ls’ work when stepping around the local cached copies of those binaries.

Those were all for languages or language families. Non-language technologies that need a repo have them too: TensorFlow Hub, the Terraform Registry and even CTAN (Comprehensive TeX Archive Network) also use HTTP GETs to fetch binary bits and pieces.

Three that don’t rest on HTTP GET are notable:

  1. Carthage: Carthage is a decentralized dependency manager for Cocoa projects. It fetches and builds dependencies via Git. When a project specifies a dependency in Carthage, it typically points to a Git repository. Carthage then clones the repository and builds the project if necessary. The cloning process uses Git’s protocols, which means HTTPS or SSH, depending on the repository’s URL provided in the Cartfile. This first hit the scene in 2014.
  2. Ansible Galaxy: Ansible Galaxy is essentially a hub for sharing Ansible roles and collections. It allows users to download roles directly or use the ansible-galaxy command-line tool to manage these roles. While the Ansible Galaxy website itself operates over HTTPS, when it comes to fetching roles or collections specified in a requirements file, it uses Git if the source is a Git repository. This means it can use either HTTPS or SSH for Git repositories, depending on the URL format specified in the requirements file.
  3. Docker Hub: Docker Hub is a service for sharing and managing Docker images. While Docker images themselves are not Git repositories, Dockerfiles and source code for building Docker images can be hosted on Git repositories. The Docker build process can fetch base images from Docker Hub using HTTPS. However, if a Dockerfile includes instructions to clone a Git repository (e.g., as part of the build process), then the Git protocol used (HTTPS or SSH) will depend on the repository URL specified in the Dockerfile.

All three use custom acquisition of binaries (or layers of a binary). Two of them use literal Git operations (which could run over SSH instead of HTTP). In the case of Docker, there’s no ‘ls’ or ‘cd’ to step around or investigate the local representation of what was pulled down via your build or package manager commands. Docker gives you subcommands for much of what you’d want to do with Docker-acquired images, though.

So if Zig wanted to use thousands of federated Git repos to store published versions of binaries made from Zig source, it could do so and not be the first. Git would allow a subscription model to track releases, and subscribers could easily pick the branch pertinent to them and pull just that (say x86_64-linux-musl). Mirroring is possible. Corporate intermediate caches are easy to set up. Git identifies every object by its hash (SHA-1 by default, with SHA-256 available as an alternative object format) and can reasonably be thought of as a history-retaining Merkle tree. Merkle trees, for other reasons, I blog/tweet about quite a lot. There’s a bunch of available Git hosting tech to do this; Gitea is an elegant, easy-install example.
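To make the Merkle-tree point concrete: Git names a blob by hashing a small header (`blob <size>` followed by a NUL byte) and then the file’s bytes. Trees hash their entries’ modes, names and IDs, and commits hash their tree and parent IDs, so a single commit ID pins the whole history. Below is a minimal Zig sketch of the blob step only. It assumes Git’s default SHA-1 object format and the Zig 0.12/0.13-era standard library, and `gitBlobId` is a name made up for this example.

```zig
const std = @import("std");
const Sha1 = std.crypto.hash.Sha1;

/// Git object ID of a blob in the SHA-1 object format:
/// SHA-1 over the header "blob <size>\0" followed by the raw content.
pub fn gitBlobId(content: []const u8) [Sha1.digest_length]u8 {
    var hasher = Sha1.init(.{});
    // "blob " + at most 20 decimal digits + NUL always fits in 32 bytes.
    var header_buf: [32]u8 = undefined;
    const header = std.fmt.bufPrint(&header_buf, "blob {d}\x00", .{content.len}) catch unreachable;
    hasher.update(header);
    hasher.update(content);
    var digest: [Sha1.digest_length]u8 = undefined;
    hasher.final(&digest);
    return digest;
}

pub fn main() !void {
    const id = gitBlobId("hello world\n");
    const hex = std.fmt.bytesToHex(id, .lower);
    try std.io.getStdOut().writer().print("{s}\n", .{hex[0..]});
}
```

This prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, the same ID that `echo 'hello world' | git hash-object --stdin` reports. Any change to a binary changes its blob ID, and that change ripples up through every tree and commit above it.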



Published

April 12th, 2024