re: for the man with many repos

November 13th, 2011

As it often goes, re is a tool that grew out of a bunch of shell scripts. I kept adding to the scripts for a long time, but eventually they grew beyond the point of being manageable.

The tool addresses three different issues:

  • Cloning/pulling multiple repos in one step.
  • Keeping repo clones in sync across machines.
  • Better handling of local tracking branches.

Listing repos

Let's start with a basic situation. I've cloned some of my repos on github:

$ ls -F
galleryforge/  italian-course/  re/  spiderfetch/

I run re list to scan the current path recursively and discover all the repos that exist:

$ re list                                                                                
[galleryforge:git]
    origin.url = git@github.com:numerodix/galleryforge.git
[italian-course:git]
    origin.url = git@github.com:numerodix/italian-course.git
[re:git]
    origin.url = git@github.com:numerodix/re.git
[spiderfetch:git]
    origin.url = git@github.com:numerodix/spiderfetch.git
> Run with -u to update .reconfig

The output you see there is the contents of a configuration file called .reconfig. By default re list doesn't write (or overwrite) the file, it just shows you the result of the detection; pass -u to actually update it.
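
The detection step itself is easy to sketch. This is not re's actual code, just a minimal illustration of the idea in Python, assuming a git-only world:

```python
import os

def find_git_repos(root):
    """Walk root recursively and return the relative paths of all
    directories that contain a .git subdirectory."""
    repos = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            repos.append(os.path.relpath(dirpath, root))
            dirnames.remove(".git")  # no need to descend into .git itself
    return sorted(repos)
```

Each discovered path then gets a block in .reconfig, with the remote urls read out of the repo's own git config.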

The file format is similar to .git/config. Every block is a repo, and :git is a tag saying "this is a git repo". (By design re is vcs-agnostic, but in practice I only ever use git, and git is the only backend right now. The abstractions probably smell a lot like git in any case.)

Every line inside a block represents a remote (in git terminology). By default there is only one. If you add a remote to the repo and re-run re list, it will detect it. But it will assume that origin is the canonical remote (more on why this matters later).
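
Since the format is INI-like, it can even be read with a stock parser. Here is a sketch (not re's actual parsing code) using Python 3's configparser:

```python
import configparser

def parse_reconfig(text):
    """Parse the INI-like .reconfig format into
    {repo_name: (vcs_tag, {remote_name: url})}."""
    # Restrict delimiters to '=' so the ':' in ssh-style urls
    # (git@github.com:user/repo.git) can't be mistaken for one.
    cp = configparser.ConfigParser(delimiters=("=",))
    cp.read_string(text)
    repos = {}
    for section in cp.sections():
        name, _, vcs = section.rpartition(":")   # "re:git" -> ("re", "git")
        remotes = {}
        for key, value in cp.items(section):
            remote, _, attr = key.partition(".")  # "origin.url" -> ("origin", "url")
            if attr == "url":
                remotes[remote] = value
        repos[name] = (vcs, remotes)
    return repos
```

The first remote in a block is the canonical one, which is why the ordering of lines in the file matters.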

Pulling repos

Now let's say I want to pull all those repos to sync them with github. I use (you guessed it) re pull:

$ re pull                                                                                
> Fetching galleryforge
> Fetching italian-course                                                                
> Fetching re                                                                            
> Fetching spiderfetch                                                                   
> Merging galleryforge                                                                   
> Merging italian-course                                                                 
> Merging re                                                                             
> Merging spiderfetch                                                                    
-> Setting up local tracking branch ruby-legacy                                          
-> Setting up local tracking branch sqlite-try                                           
-> Setting up local tracking branch db-subclass                                          
-> Setting up local tracking branch next

As you can see, fetching and merging happen in separate steps. Fetching is where all the network traffic happens, while merging is purely local, which is why I think it's nice to separate them. (But there are more reasons to avoid git pull.)
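
In terms of plain git, the loop amounts to: fetch everything first, then merge everything. A rough sketch of that structure (not re's actual code; the run parameter and the choice of --ff-only are mine, and exist mainly to make the sketch testable without network access):

```python
import subprocess

def pull_all(repo_paths, run=subprocess.check_call):
    # Phase 1: fetch every repo -- this is all the network traffic.
    for path in repo_paths:
        run(["git", "fetch"], cwd=path)
    # Phase 2: merge each branch with its upstream -- purely local.
    for path in repo_paths:
        run(["git", "merge", "--ff-only", "@{upstream}"], cwd=path)
```

If the network dies halfway through, you've at least fetched some repos and merged nothing, rather than leaving some repos half-pulled.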

What it also does is set up local tracking branches against the canonical remote. The canonical remote is the one listed first in .reconfig. So it doesn't matter what it's called, but it's a good idea to make it origin, because that's what re list will assume when you use it to update .reconfig after you add/remove repos.

It handles local tracking branches only against one remote, because if both origin and sourceforge have a branch called stable then it's not clear which one of those the local branch stable is supposed to track. I find this convention quite handy, but your mileage may vary.

If I later remove the branch ruby-legacy from github and run re pull, it's going to detect that I have a local tracking branch that is pointing at something that doesn't exist anymore:

$ re pull spiderfetch
> Fetching spiderfetch
> Merging spiderfetch                                                                    
-> Stale local tracking branch ruby-legacy, remove? [yN]
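
Both the setup and the stale check fall out of comparing two sets of branch names: the branches on the canonical remote versus the local tracking branches. A sketch of that comparison (hypothetical function name, not re's API):

```python
def compare_branches(remote_branches, local_tracking):
    """Return (to_create, stale): remote branches with no local
    tracking branch yet, and local tracking branches whose remote
    branch has disappeared."""
    remote = set(remote_branches)
    local = set(local_tracking)
    return sorted(remote - local), sorted(local - remote)
```

For each name in to_create, re sets up a tracking branch (the equivalent of git branch --track name origin/name); for each name in stale, it prompts before deleting anything.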

Scaling beyond a single machine

Now, re helps you manage multiple repos, but it also helps you keep your repos synced across machines. .reconfig is a kind of spec for what you want your repo-hosting directory to contain, so you can just ship it to a different machine, run re pull, and it will clone all the repos over there, set up local tracking branches, and so on.
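
The clone-or-pull decision on a new machine is just a check per entry in the spec. A minimal sketch (hypothetical names, not re's internals):

```python
import os

def plan_sync(repos, root):
    """Given (name, url) pairs from .reconfig, decide which repos
    need an initial clone and which just need a pull."""
    plan = []
    for name, url in repos:
        if os.path.isdir(os.path.join(root, name, ".git")):
            plan.append(("pull", name))
        else:
            plan.append(("clone", url, name))
    return plan
```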

In fact, why not keep .reconfig itself in a repo, which again you can push to a central location and from which you can pull onto all your machines:

$ re list                                                                                
[.:git]
    origin.url = user@host:~/repohost.git
[galleryforge:git]
    origin.url = git@github.com:numerodix/galleryforge.git
[italian-course:git]
    origin.url = git@github.com:numerodix/italian-course.git
[re:git]
    origin.url = git@github.com:numerodix/re.git
[spiderfetch:git]
    origin.url = git@github.com:numerodix/spiderfetch.git
> Run with -u to update .reconfig

It does not manage .gitignore, so you have to do that yourself.

Advanced uses

Those are the basics of re, but the thing to realize is that it doesn't limit you to the situation we've seen in the examples so far, with a single directory containing all the repos. You can have repos at any depth, and .reconfigs at different levels too, and then a single re pull -r will recursively pull absolutely everything in one step.
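
Recursive mode then boils down to finding every .reconfig in the tree and processing each one in turn. A sketch of the traversal (illustrative only; the real tool's traversal may differ):

```python
import os

def find_reconfigs(root):
    """Return the relative paths of all directories under root
    that contain a .reconfig file, at any depth."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".reconfig" in filenames:
            hits.append(os.path.relpath(dirpath, root))
    return sorted(hits)
```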

Get it from github:


7 Responses to "re: for the man with many repos"

  1. shell says:

    Looks really nice. But running re list gives the following errors:

    Traceback (most recent call last):
      File "/usr/bin/re", line 132, in <module>
        program.invoke(bundle, recurse=options.recurse)
      File "/usr/bin/re", line 41, in invoke
        cmd(*args, **kwargs)
      File "/usr/bin/re", line 46, in cmd_list
        repo_manager = RepoManager()
      File "/home/user/git/re/model/__init__.py", line 24, in __init__
        self.repos = collections.OrderedDict()
    AttributeError: 'module' object has no attribute 'OrderedDict'

  2. numerodix says:

    I believe OrderedDict was added in python 2.7, so I suspect you are using an earlier version.

  3. Dieter_be says:

    How does this compare to the (more known?) mr tool?
    http://kitenet.net/~joey/code/mr/

  4. numerodix says:

    Good question, I've never heard of that before.

  5. shell says:

    No, i'm using dev-lang/python 2.7.1-r3

  6. numerodix says:

    That's very odd, why would it not be found?
    http://docs.python.org/library/collections.html#collections.OrderedDict

  7. numerodix says:

    I've added a fallback for this now, try it out.