Converting Subversion Repositories to Git...and Back Again

Posted on 2009-11-10 by Curt Sampson :: comments enabled

Introduction

I’ve been contemplating moving some of my projects stored in Subversion to Git, as much because I think Gitosis is incredibly well done as anything else. But, because I tend to worry about major changes such as this, I want to be able to go back to Subversion for any particular projects where Git isn’t working out. This might happen, for example, because a client has decided that he really hates Git.

This document describes how to do this.[1] Using this method, you can move back and forth between Git and Subversion as often as you like, and even use both systems at the same time to update checkouts. The key restriction is that commits must go to only one of the two systems at any one time: while it’s possible for clients to update from either the Subversion repository or the Git master repository and receive all of the latest changes, committers must use only one of the following two methods:

  1. via Subversion protocol (whether the client be svn or git svn) to the Subversion repository; or

  2. via Git to the master Git repository.

There’s a bit of work involved on the administrator’s part when you want to switch from one to the other. If you interleave commits without doing this work, you’ll end up with a mess that’s going to be painful to clean up.[2]

Authors File Configuration

You’ll probably want to set up an authors file, as described in the git-svn manpage or in this blog entry about Subversion to Git conversion. To make my life easy, I set up a standard authors file and I use it for all of my conversions. During conversion, when git svn fetch fails due to an author name not being found, you can just update the file and start another fetch; it will resume where it left off.
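As a starting point for the authors file, the following sketch builds a skeleton from the output of svn log --quiet. The function name authors_skeleton, the file name .svn-authors and the example.com e-mail addresses are placeholders invented for illustration; the real names and addresses still have to be filled in by hand.

```shell
# Hypothetical helper: read `svn log --quiet` output on stdin and print one
# authors-file line ("login = Name <email>") per distinct committer.
# The name and e-mail are placeholders derived from the login; edit them.
authors_skeleton() {
    # Revision lines look like: r3 | alice | 2009-11-01 10:00:00 +0900 (...)
    awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 }' | sort -u |
    while IFS= read -r login; do
        printf '%s = %s <%s@example.com>\n' "$login" "$login" "$login"
    done
}

# Example use (not run here; needs access to the Subversion repository):
#   svn log --quiet svn+ssh://repo.example.com/home/repo/client/project \
#       | authors_skeleton > "$HOME/.svn-authors"
```

Each output line has the login = Full Name <email> form that git svn expects, so the result can be edited in place and then reused for every conversion.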

One note: in versions of Git up to at least 1.6.5.1, there’s a bug in the use of the svn.authorsfile configuration property (at least when set in $HOME/.gitconfig): it’s honoured by git svn fetch, but not git svn clone, even though a clone is just an init followed by a fetch. So if you use this, you’ll need to do a separate init and fetch.

Terminology and Layout

I’m assuming that your initial configuration is like mine: one or more Subversion repositories each with several projects in it, each project having a trunk and possibly several branches and tags. I convert each project in my Subversion repository to a separate Git repository, to match my typical “unit of checkout,” as it were. Remember, with Git you can clone and have a working copy only of a full repository, and (at least with Gitosis and most Git tools) you cannot manage access control in less than repository-size chunks. I do import all of my Subversion branches for a particular project into the same Git repository as the trunk. While I need independent control of users’ read and write access for different projects, I’ve never had to give a user separate access rights to different branches within one project.

The tags, however, are a special case, because Subversion’s “tags” are actually branches used under the convention that you never commit to them. These can be converted to either branches or tags in Git. I’m not going to discuss this further here, though, as I don’t have any tags in the projects I’m converting. If someone else cares to do a write-up on this, send me an e-mail and I’ll add a link to it here. If you do have tags, and need to maintain them, I’d suggest you just convert them as Git branches to maintain them using the Subversion convention, and add Git tags later, if necessary.

So what we’re going to do when we convert a project is create a new Git repository which I will refer to as the “master repository”. Commits moving from Subversion to Git during the conversion process will be committed to the Git master repository, and, should you decide to convert back the other way, only commits in the Git master repository will be brought back into the Subversion repository. This Git master repository will be a bare repository compatible with the standard Gitosis layout.

Exporting from Subversion to Git

To copy your commits from the Subversion repository to the Git master repository, we first use the git svn init command.

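If you have no branches or tags, you can simply provide the Subversion URL to your project as the command-line argument to init, or provide a --trunk=url option. (If you’re using the standard Subversion layout of trunk, branches and tags directories, add the --stdlayout option, or -s for short, so that the branches and tags are imported as such.) The simple form looks like this: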

git svn init \
    svn+ssh://repo.example.com/home/repo/client/project \
    gitproject

Note that the URL you use must work from the location where you’re going to place the final Git master repository. Thus, even when doing a conversion on the same machine that has the Subversion repository, I use an svn+ssh: URL for this.

I use a non-standard layout where my trunk, branches and tags are all kept under the project directory, so that the paths in the Subversion repository look like this:

/client/project/trunk
/client/project/branch/foo
/client/project/branch/bar
/client/project/tag/baz

If you do this sort of thing, you’ll need to use the --branches and --tags options to init. There’s a bug here, too. The documentation claims that these options (and --trunk) take full URLs, but they don’t unless you’re using just --trunk alone. So if you use these options, you want to specify the common prefix of the path as the command-line argument, and the subdirectories under those with options:

git svn init \
    --trunk=trunk --branches=branch --tags=tag \
    svn+ssh://repo.example.com/home/repo/client/project \
    gitproject

If you’ve not done so already, at this point you want to set up the authors file as described above and either set your svn.authorsfile git configuration option or use the --authors-file option on the following fetch command.

You can now just cd into the gitproject directory and do a git svn fetch to do the work of copying the commits. This can take many minutes or even tens of minutes, depending on the size of your project, the speed of your disk and network connection, and so on.
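Put together, the init and fetch steps for the non-standard layout above might look like the following sketch. The function name svn_to_git and its argument order are hypothetical; the git svn options are the ones already discussed. It deliberately does a separate init and fetch, and passes the authors file explicitly, to sidestep the svn.authorsfile problem with clone.

```shell
# Hypothetical helper: convert one Subversion project that uses the
# trunk/branch/tag layout shown above.
#   $1 = Subversion URL of the project (the common path prefix)
#   $2 = directory for the new Git repository
#   $3 = absolute path to the authors file
svn_to_git() {
    url=$1; dir=$2; authors=$3
    git svn init --trunk=trunk --branches=branch --tags=tag "$url" "$dir" || return 1
    (
        cd "$dir" || exit 1
        # Show what init recorded, to check the layout before the long fetch.
        git config --get-regexp '^svn-remote\.' || exit 1
        # If this stops on an unknown author, add the author to the file and
        # run the same fetch again; it resumes where it left off.
        git svn fetch --authors-file="$authors"
    )
}

# Example use (not run here):
#   svn_to_git svn+ssh://repo.example.com/home/repo/client/project \
#       gitproject "$HOME/.svn-authors"
```

The fetch runs in a subshell so that the caller's working directory is left unchanged.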

Preparing the Master Repo

At this point you’ve got a repository that includes a working copy in your gitproject directory. This needs to be converted to a bare repository that you will use as your master repository. As I write this, this isn’t difficult: just move the gitproject/.git subdirectory to the appropriate place.[3]

But this isn’t guaranteed to work in the long run because the way Git stores repository information may change. Unfortunately, at this time there’s no “proper” method we can use to do this. The Git FAQ suggests that you use git clone, but that won’t work in this situation because the clone will be lacking the information that git svn uses to push Git commits back into the Subversion repository.

I’ll try to update this post if Git changes such that just moving the gitproject/.git directory to a new name and location doesn’t work. But if you’re reading this long after it’s been written, be wary, and check for yourself that the renaming technique is still valid.

At any rate, for the moment we can just use mv to make the change. Move that .git directory to the location where you want your Git master repository, under an appropriate name, and you’re set.

If you’re using Gitosis, the bare repo would go under the repositories directory, e.g.:

mv gitproject/.git /home/git/repositories/project.git
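A slightly more defensive version of that move is sketched below. The function name make_master_repo is hypothetical, and the check for a svn subdirectory assumes that git svn keeps its bookkeeping under .git/svn, which is the information a plain git clone would lose.

```shell
# Hypothetical helper: move the converted repository into place as the
# bare master repository.
#   $1 = directory created by `git svn init` (contains .git)
#   $2 = final location of the master repository (must not exist yet)
make_master_repo() {
    src=$1; dest=$2
    # Assumption: git svn stores its metadata in .git/svn.
    [ -d "$src/.git/svn" ] || { echo "no git-svn metadata in $src/.git" >&2; return 1; }
    # Refuse to overwrite; mv would otherwise nest .git inside an existing directory.
    [ ! -e "$dest" ] || { echo "$dest already exists" >&2; return 1; }
    mv "$src/.git" "$dest"
}

# Example use (not run here):
#   make_master_repo gitproject /home/git/repositories/project.git
```

The existence check matters because mv into an existing directory would silently produce project.git/.git rather than the layout Gitosis expects.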

Before you do this, or at least before you give access to this new Git master repository to others, you may want to:

  1. Change the permissions on your Subversion repository to be read-only for all users by adding or modifying the section for that project in the conf/authz file in the repository directory;

  2. Do another fetch to catch any last commits that snuck in before you did this. Remember, it’s vital that you never have commits in both repositories that are not in the other at the same time.
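For the first of these steps, a read-only rule can be appended to conf/authz along the lines of the sketch below. This assumes the repository is already configured to use that authz file, that section names are plain paths such as [/client/project] (a server hosting several repositories from one authz file uses a different form), and that no more specific section grants write access underneath. The function name svn_make_readonly is hypothetical.

```shell
# Hypothetical helper: append a read-only rule for one project to an
# existing Subversion authz file.
#   $1 = path to the authz file (e.g. /home/repo/conf/authz)
#   $2 = repository path of the project (e.g. /client/project)
svn_make_readonly() {
    authz=$1; path=$2
    # If the section already exists, it must be modified by hand instead.
    if grep -qxF "[$path]" "$authz"; then
        echo "section [$path] already exists in $authz; edit it by hand" >&2
        return 1
    fi
    # "* = r" grants every user read access only.
    printf '\n[%s]\n* = r\n' "$path" >> "$authz"
}

# Example use (not run here):
#   svn_make_readonly /home/repo/conf/authz /client/project
```

After changing the permissions, run git svn fetch once more in the conversion directory before moving .git into place, as described in the second step.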

Dealing With Branches

At this point your developers now have the ability to clone the master repository, and, once cloned, switch to any of the branches imported from Subversion using the standard Git syntax:

git fetch git@repo.example.com:project.git remotes/branchname:branchname

However, there’s one hitch: how do your users find out what the branch names are? They can’t see them in the output of git branch -a in their cloned repository, because the remote tracking branches in the master repository weren’t copied. (The remote branches they see in that list are the local branches in the master repository.)

If they have direct (i.e., via a filesystem) access to the master repository, the solution is simple; cd to that directory and run git branch -r. Note that the -a output may be confusing, especially if you’re not using coloured output, because though the Subversion tracking branches are remote branches, they do not have the usual remote/ prefix.

However, those who have access only via the http or git protocols (including Gitosis users) cannot do this. The best solution I can think of in this case is to have someone who does have such access run git branch -r and commit the output of it as a file in the repository, perhaps called /Branches.[4]
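The person with filesystem access could automate that with something like the following sketch. The function name record_branches and the commit message are hypothetical; the file name Branches follows the suggestion above.

```shell
# Hypothetical helper: record the Subversion-derived branch names in a
# file called Branches and commit it.
#   $1 = path to the bare master repository
#   $2 = path to a clone of it in which to make the commit
record_branches() {
    master=$1; checkout=$2
    # `git branch -r` indents each name; strip the leading whitespace.
    git --git-dir="$master" branch -r | sed 's/^[[:space:]]*//' \
        > "$checkout/Branches" || return 1
    (
        cd "$checkout" || exit 1
        git add Branches &&
        git commit -m 'Update list of Subversion branches' Branches
    )
}

# Example use (not run here), followed by a push to the master repository:
#   record_branches /home/git/repositories/project.git ~/work/project
#   (cd ~/work/project && git push)
```

The commit step fails harmlessly when the list has not changed, since there is then nothing to commit.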

Updating the Subversion Repository

If you’ve reached this point, you’ve now switched to Git. Congratulations!

However, even if you never switch back, it may be useful to keep your Subversion repository up to date with the new commits coming in to your Git master repository. This is simple enough to do with a git svn dcommit, but there’s a catch: you can’t use this from a bare repository. (dcommit requires a clean working copy for its operations.)

But creating a repository with a working copy from a bare repository is quite simple, at least at this time and given all of the caveats listed above in “Preparing the Master Repo.”

Create a new directory elsewhere on your filesystem, say, gitproject, and copy the bare repository into it as the .git subdirectory. A git reset --hard in that directory will make the working copy files reappear, and then a git svn dcommit will push the new commits back into the Subversion repository.

If you’re doing this on a regular basis in order to keep a read-only copy of the Subversion repository available, you’ll probably want to use rsync to quickly synchronize the bare repository with gitproject/.git before you do the dcommit. You may also be able to skip the reset --hard step; I’ve not checked to see what dcommit does in these circumstances.
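The whole round trip might be scripted roughly as follows. This is an untested sketch of the procedure just described, not a verified recipe: the function name push_to_svn is hypothetical, and the core.bare step only matters if that setting was turned on in the master repository.

```shell
# Hypothetical helper: push new commits from the Git master repository
# back into Subversion via a scratch working copy.
#   $1 = path to the bare master repository
#   $2 = scratch directory for the working copy (e.g. gitproject)
push_to_svn() {
    master=$1; work=$2
    mkdir -p "$work/.git" || return 1
    # The trailing slashes make rsync copy the contents of the master
    # repository into .git rather than creating a subdirectory there.
    rsync -a "$master/" "$work/.git/" || return 1
    (
        cd "$work" || exit 1
        # Harmless if already false; needed if core.bare was set on the master.
        git config core.bare false || exit 1
        # Make the working copy files match HEAD so that dcommit sees a clean tree.
        git reset --hard || exit 1
        git svn dcommit
    )
}

# Example use (not run here):
#   push_to_svn /home/git/repositories/project.git gitproject
```

The reset is kept in because it is unconfirmed whether dcommit works without it.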

Going Back (and Forth)

If you decide you want to revert a project back to using Subversion as the master repository, set the permissions for your Git master repository to read-only, perform the procedure above, and set the permissions for your Subversion repository read-write.

Moving yet again from Subversion to Git, or maintaining a read-only copy of the repository in Git after you’ve switched back, is merely a repetition of this whole procedure. You’ll need to throw out your current Git master repository and do a new export, but this will give you all of the same Git commit IDs for the Subversion commits, and I believe that existing Git users will be able to continue to update their current checkouts, albeit possibly with some use of git rebase. However, I’ve not confirmed all of the details of this, and after close to 2000 words I’m getting a bit tired, so whether your Git users will have to re-clone or not I’ll leave for another day.


  1. Stephen J. Turnbull very kindly helped me out with much of the research for this article.

  2. It appears, actually, not to be too difficult to recover from the situation where you have both commits in the Subversion repository not present in the Git master repository, and vice versa, but though I have an idea of how to do this, I’ve not actually tried it.

  3. In theory, you’re also supposed to git --git-dir=repo.git config core.bare true, but I’ve not found this to be necessary in practice.

  4. This is just what we used to do back in the days of CVS, where on large repositories it could be quite expensive to find all of the branch names. Plus ça change….


Comments

By MALAISE Pascal on 2010-02-26 16:49:03 +0100:

Congratulations on your interesting and useful article.

Maybe I can propose a solution to your question: «However, there’s one hitch: how do your users find out what the branch names are?» It is to re-create the branches in the master repository. The following bash script does it.

# Regenerate branches and tags from the git-svn remote tracking branches
for branch in $(git branch -r) ; do
  if [ "${branch#tags/}" != "${branch}" ] ; then
    # Name starts with tags/: this is a tag
    git tag "${branch#tags/}" "remotes/$branch"
  elif [ "$branch" != "trunk" ] ; then
    # Neither a tag nor trunk => make a branch
    git branch "$branch" "remotes/$branch"
  fi
done

In my case I want to import once and forever a SVN repository in a GIT one, then work with git. Because it is a one-way import I prefer converting the SVN tags as GIT tags, but the script can easily be simplified just to skip trunk and “propagate” everything else as branches.

As a consequence, users can do:

git branch -r
git tag
git checkout -b <branch> origin/<branch>