For fun, and practice with bash scripting, I thought I’d see what it would look like to make a script to convert subversion repos to git. Mine does a fairly good job of converting the files in a trunk in thirty or so lines of code:

#!/bin/bash

if  [ ! -d .git ]; then echo "no .git folder - do 'git init'"; exit 10; fi
if  [ ! -d .svn ]; then echo "no .svn folder - checkout the trunk of some subversion repo"; exit 10; fi
[ ! -d svn_to_git_commits ] && mkdir svn_to_git_commits
echo -e ".svn\nsvn_to_git_commits\nsvn_to_git_revision.txt\nsvn_to_git_revisions.txt" > .gitignore
git add .gitignore > /dev/null

svn log | grep '^r[0-9]* ' | cut -d' ' -f 1 | cut -d'r' -f 2 | sort -n > svn_to_git_revisions.txt
prefix=$(svn info | grep "^Relative URL:" | sed 's/Relative URL: ^//' | sed 's#/trunk##')

# Install the interrupt handler once, before the loop, rather than on every iteration.
trap "echo Exited!; exit 1;" SIGINT SIGTERM
while ((i++)); read -r rev; do

    svn up --force -r $rev | sed '/^At revision/d' | sed '/^Updating /d' | sed '/^[AUD]  /d' | sed '/^ U/d' | sed '/^Updated to/d'
    svn log -v -r $rev > svn_to_git_revision.txt

    revisionLine=$(cat svn_to_git_revision.txt | grep '^r[0-9]* ')
    author=$(echo $revisionLine | cut -d'|' -f2 | sed 's/(no author)/none/' | cut -d' ' -f2 | sed "s/^$/none/")
    date=$(echo $revisionLine | cut -d'|' -f3 | cut -d'(' -f1)
    # No quote escaping needed: the message is passed to git as one quoted argument.
    messageText=$(awk '/^$/ {do_print=1} do_print==1 {print} NF==3 {do_print=0}' svn_to_git_revision.txt | sed '/------/d')

    cat svn_to_git_revision.txt | sed "s/^ *//" | sed 's/(.*)$//' | sed "s/ *$//" | grep "${prefix}/trunk/" | sed "s#${prefix}/trunk/##" | sponge svn_to_git_revision.txt
    grep "^[AMR]" svn_to_git_revision.txt | cut -d' ' -f 2-99 | xargs -I {} git add "{}"
    grep "^D" svn_to_git_revision.txt | cut -d' ' -f 2-99 | xargs -I {} git rm -q --ignore-unmatch -r "{}"
    git commit --author "${author} <${author}@unsure>" --date "${date}" -m "Svn Rev: ${rev}.${messageText}" > svn_to_git_commits/"${rev}".txt

    echo "Svn revision ${rev} on $(echo $date | cut -d' ' -f 1,2)."    
    if [[ $(( i % 4000 )) == 0 ]]; then time -p sh -c 'git repack; git gc'; fi 
done < svn_to_git_revisions.txt
time -p sh -c 'git repack; git gc'
echo "ALL DONE WITH A GIT REPO SIZE OF $(du -h -d 0 .git | cut -f1)."

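The above uses awk, cut, sed, grep, xargs, and sponge from moreutils. Note: ack is not called in the script as listed; where it is used, it is packaged as ack-grep on some Linux distributions.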
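The author and date extraction from the "svn log" header line is easy to try in isolation. Here is a minimal sketch, assuming the usual header layout of revision, author, date and line count separated by "|". The function names and the sample line are made up for illustration and are not part of the script above. Unlike the script, which keeps only the first word of the author field, this version keeps the whole field.

```bash
#!/bin/bash
# Hypothetical helpers for illustration; not part of the original script.

# Print the author field of an "svn log" header line, or "none" if it is empty.
svn_header_author() {
    local author
    author=$(printf '%s\n' "$1" | cut -d'|' -f2 | sed 's/(no author)/none/; s/^ *//; s/ *$//')
    printf '%s\n' "${author:-none}"
}

# Print the timestamp of the header line without the trailing "(Sat, 14 Feb 2015)" part.
svn_header_date() {
    printf '%s\n' "$1" | cut -d'|' -f3 | cut -d'(' -f1 | sed 's/^ *//; s/ *$//'
}

# Made-up sample line in the shape that "svn log" prints.
sample='r42 | alice | 2015-02-14 10:11:12 +0000 (Sat, 14 Feb 2015) | 1 line'
svn_header_author "$sample"
svn_header_date "$sample"
```

Trimming the surrounding spaces here means the values can be handed to git commit as they are.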

Timings: 12 mins to convert a repository that was ultimately only 4.4MB in size (the .git folder’s disk usage), but over a fairly slow connection.

Compare that to just over 2.75 mins for the same repo with git svn clone, which is over four times faster. The git-svn way probably preserves more metadata on the commits, but the actual files for the final revision are identical for both versions. My script is just for trunks, and would need some tweaks to cover commits happening to branches. It already covers commits merging into trunk from branches.

I don’t think there is anything that can be done to the script that would boost the speed by more than a small percentage. I even tried GNU parallel instead of xargs, but it blew up because git does not quietly wait for locks to be released during its operations. Besides, 8 of the 12 mins are spent just doing “svn up” one revision at a time.
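One way to sidestep the lock contention from the parallel experiment, without running git concurrently at all, is to hand every path for a revision to a single git process. The sketch below is untested against a real conversion, and run_once_with_lines is a hypothetical helper name. It collects the lines from stdin and runs the given command once with all of them as arguments.

```bash
#!/bin/bash
# Hypothetical helper for illustration; not part of the original script.

# Run the given command once, with every non-empty stdin line as one argument.
run_once_with_lines() {
    local -a items=()
    local line
    while IFS= read -r line; do
        [ -n "$line" ] && items+=("$line")
    done
    # Nothing to do for an empty list.
    [ "${#items[@]}" -eq 0 ] && return 0
    "$@" "${items[@]}"
}

# Demo with printf standing in for git, so this is safe to run anywhere.
printf 'a file.txt\nsrc/main.c\n' | run_once_with_lines printf 'would add: %s\n'
```

In the loop this would look like `grep "^[AMR]" svn_to_git_revision.txt | cut -d' ' -f 2- | run_once_with_lines git add --`. A revision touching a huge number of paths could exceed the argument length limit, and since most of the time goes on “svn up” anyway, any saving is likely to be small.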



Published

February 14th, 2015