undvd, now in perl!

October 5th, 2008

So it turns out you can do a whole lot with bash. More than I knew. But when you get to the point where you start hitting the limitations of your language*, it gets frustrating. The biggest problem with bash is that it doesn't have real functions. You can wrap a bunch of code and call it with arguments, but it can't return a value. I've tried to come up with a hack to emulate functions returning values, but in the end there just aren't enough pieces in the box to build it from.

To date, undvd has been using various tricks to get around this. Let the function echo the value and capture it in the caller. But then what if you have a failing condition? Well, you can echo the value to stdout and echo the error to stderr, so it doesn't get captured as the result of the function. And then kill $$ to force an exit (you can't just exit, because that is equivalent to a return from the function).
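In bash terms, the trick looks roughly like this (a minimal sketch; the function name and the stat-based size check are made up for illustration, this isn't actual undvd code):

function title_size {
    local file="$1"
    # failing condition: complain on stderr, then kill the main shell,
    # because exit would only leave the $( ) subshell
    [ -e "$file" ] || { echo "no such file: $file" >&2; kill $$; }
    # the "return value" is whatever we echo to stdout
    echo $(( $(stat -c%s "$file") / 1024 ))
}

# the caller captures stdout; stderr passes through to the terminal
size=$(title_size movie.avi)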

That kind of works, but eventually you break down when you have to return more than one string that may contain whitespace. Sure, you could quote them, let the caller find both strings based on where the quotes are, then chop off the quotes and voila. But all this just for a function call? It's too much, and it's unacceptable from a maintenance standpoint.

Bash's overall weak support for other features of a typical programming language makes it a challenge to write structured programs. undvd-0.6 is therefore pretty much a dead end from a development standpoint. It works well enough, but it's hard to get anything more out of bash. In order to keep evolving, undvd needs a new language.

Another substantial problem with bash is that you're executing commands in the shell; in other words, you build execution strings. There is a lot of potential for quoting bugs when you're dealing with filenames that have spaces and quotes in them. And not just when feeding them as arguments to executables, but on every "function call" just the same. I've spent a lot of time trying to safeguard against this, but all it takes is one instance where a string isn't quoted correctly, and you have a fatal parsing error.

So it's time to think about porting to another language. A language that is close to the shell. A language that lets you run a subprocess by passing in the arguments as a list, not a string. A language that has basic programming constructs, like functions. That has good string handling. That can do simple floating point arithmetic. That is as widely available as possible. A language like... perl.

It sounds absurd, doesn't it? Porting to perl in the name of maintainability. But when you're in bash and most of what you're doing is string manipulation and calls to other executables, it's the right choice. And I bet you have perl on your box.
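Here's the quoting problem from before, and what perl does about it (the mencoder call is just an illustration, not actual undvd code):

# string form: the shell parses this, so a filename with spaces
# or quotes in it can break the command, or worse
system("mencoder $input -o $output");

# list form: the arguments go to the program as-is, no shell,
# no quoting, nothing to get wrong
system("mencoder", $input, "-o", $output);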

Not that it hasn't been fun. Bash was the right place to start, and I've learned a lot about bash along the way. I've also learned that you have to do obscene things like echo strings to bc to do simple floating point math.
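For instance, this is what it takes to compute an aspect ratio in bash, versus just writing $ratio = 720 / 576; in perl:

# bash can't do floating point, so pipe a string to bc and capture the output
ratio=$(echo "scale=2; 720/576" | bc)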

The port

It's a straight port. undvd 0.7 runs on perl, but it was written to reproduce 0.6 exactly. The code is completely new, obviously, but the functionality is the same.

As a result of running in perl, all the string and numerical processing logic has been internalized, and all the calls to awk, sed, bc and so on are gone. This makes it run faster; the difference is especially noticeable in scandvd. It's not a big impact, since most of the work is done in mencoder, and that is still the same. Nevertheless, it's a welcome side effect. It also makes me happier, since undvd is less dependent on all these outside tools.

In terms of size the code is about the same. The perl code is actually 5% bigger.

What this means for you

  • 0.6 and 0.7 are functionally equivalent.
  • If you find a bug in 0.6, it's probably also in 0.7.
  • If you find a bug in 0.7, try 0.6.
  • Please report bugs.

* I'm using the term "language" loosely here. I'm talking about the language, the implementation, and the execution environment (ie. standard libraries, or in bash's case the gnu userland). Often we just pile all of this under "language", because it's easier to talk about it that way.

showing equations is not teaching

October 3rd, 2008

I'm going to describe something that you know very well, and that you do all the time. I'll describe it algebraically, so that we can keep it somewhat rigorous, like good teaching prescribes. Once I'm done, you'll know exactly what I'm talking about.

  1. c <= C
  2. w = (0 <= c < V)
  3. 0 < d < v
  4. p = {p1, p2, ..., pn}
  5. t = px

Got it? It has a colloquial name: doing laundry. Here's the same thing in words.

  1. grab a subset of the clothes in the laundry basket/hamper
  2. contents of washing machine equal to said clothes, but greater than zero, less than the washing machine's volume
  3. contents of detergent compartment greater than zero, less than its volume
  4. machine has a set of programs
  5. duration of wash determined by chosen program

Here's the thing. If you understand laundry, and you knew that's what the equations were supposed to describe, you could probably figure out what's what. At the very least, you could come up with your own set of equations, and they might be similar enough to infer the original meaning.

But what if you had never heard of laundry, and all you got were these equations? Could you figure it out? No. You're just not that clever.

Now put yourself in the shoes of someone who's teaching laundry. You know laundry inside out, you can derive the equations at will. Laundry is the most obvious and trivial subject as far as you're concerned. Students come to your class, today's topic is laundry. You spend a couple sentences describing laundry. You explain it in words that your students don't understand. Then you present the equations. Then you go to lunch feeling good about yourself, passing on the knowledge and all that.

As it happens, not all the students latched onto the theory of laundry. Some are turning up, asking dumb questions. What is wrong with these people? How can you fail to understand laundry? You'd have to be dense. Geez, the quality of our freshmen really is plummeting. There's no way my generation was so thick.

dynamic or lexical, that is the scope

October 2nd, 2008

Apologies for the awful pun on a 16th century action movie.

Do you know how in the movies, when someone has to testify, they first place his hand on a Bible and make him recite the I swear to tell the truth, the whole truth, and nothing but the truth, so help me God litany? Presumably, the god they're talking about is the god in the book; that's why the book is there (I bet polytheists find this very helpful). I guess they think it's harder for people to lie after taking a pledge while handling a Bible. (Do we have any statistics on whether that works?)

Anyway, in a dynamic scope, there is a witness called Python. He will make his pledge based on the book that they happen to shove under his hand that day. One day it could be the Bible, a week later it could be The Gospel of the Flying Spaghetti Monster. So the pledge will somehow be relative to the god in that particular book. Uneasy about one god, very comfortable with another one.

In a lexical scope, there is a witness called Perl. He is very emotional about his first time as a witness. And even though they give him a new book every time, he just can't seem to notice. He makes his pledge based on the very first book they slipped him.

And now for a short digression into the world of programming languages. You have two scopes: one is the inner scope, the other is the outer scope. There is a variable used in the inner scope, but bound in the outer scope. How do you evaluate this variable? There are two answers to this question.

Under dynamic scoping the variable gets the value that it has in the outer scope at the time of evaluation. Under lexical scoping the variable gets the value that it has in the outer scope at the time of declaration.

That didn't explain anything, did it? I know, read on.

Who cares?

This is an important question, and people rarely seem to ask it. Functions care. Named functions and unnamed functions like lambdas, blocks, closures (different languages have different names for the same thing). Anything that has a scope and can be passed around, so that's only functions.

So all the blah about lexical scoping really just boils down to one little detail that has to do with how functions are evaluated. Hardly seems worth the effort, does it?

Dynamic, baby

Dynamic scoping is the more intuitive one. You grab the value of the variable that it has when you need to use it. That's what's dynamic about it, today this value, tomorrow another.

Consider this Python program. It prints the same string in three different colors. output is the function responsible for the actual printing. output has an inner helper function colorize, which performs the markup with shell escapes. Now, since colorize is defined in the scope of output, we can just reuse those bindings. I pass the string explicitly, but I don't bother passing the color index variable. (A variable gets interpolated wherever there is a %s.)

def output(color_id, s):

    def colorize(s):
        return "\033[1;3%sm%s\033[0m" % (color_id, s)

    print colorize(s)


for e in range(1, 4):
    output(e, "sudo make me a sandwich")

Lexical ftw

Lexiwhat? If you recall from last time, "lexical" is a pretentious way of saying "where it was written".

What this implies is that the outer binding is evaluated only the first time. After that, no matter what scope the function finds itself being evaluated in, the variable with the outer binding doesn't change value.

Consider this Perl code, which is exactly the same as before.

sub output {
    my ($color_id, $s) = @_;

    sub colorize {
        my ($s) = @_;
        return "\033[1;3${color_id}m${s}\033[0m";
    }

    print colorize("$s\n");
}


for (my $e = 1; $e < 4; $e++) {
    output($e, "sudo make me a sandwich");
}

How do you think it evaluates?

Oh no, it's broken! Why? Because the first time colorize is evaluated, the value of ${color_id} is recorded and stored for all eternity. The term lexical in this example isn't helpful at all, because the function is *always* evaluated where it was declared; it's not passed to some other place where the value of ${color_id} could have been decided by someone other than output. 'pedia says lexical scoping is also called static scoping, which makes more sense here.

Interestingly, in the language of tweaks that is Perl, you can replace my with local on line 2 and you've got yourself dynamic scoping! :-o The code will now run as expected.
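That one-word change looks like this (just a sketch of the modified line; without use strict, local turns $color_id into a dynamically scoped package global, so colorize reads whatever value it has at call time):

sub output {
    local ($color_id, $s) = @_;    # was: my ($color_id, $s) = @_;
    # the rest of the sub stays exactly the same
}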

Which is better?

I don't know. I don't have any conclusions yet. I got into the habit of writing inner functions in Python without passing in all the arguments, it's useful sometimes when you have a lot of variables in scope. And then I got in trouble for doing the same thing in Perl.

In languages without assignment, the designers will obviously pick lexical, because it reinforces the rule of referential transparency: a variable, once bound, always keeps the same value.

You need lexical scoping to have closures. A function being defined has to be able to capture and store the values of its unbound variables. Otherwise you could pass it to some other scope that doesn't have bindings for variables with those names, and then what?

But you know what? Python has closures anyway. Here, colorize is defined inside output, but it's not called there. It's passed back to the loop, and called there. But that scope doesn't have a binding for color_id! And yet it still works.

def output(color_id):

    def colorize(s):
        return "\033[1;3%sm%s\033[0m" % (color_id, s)

    return colorize


for e in range(1, 4):
    f = output(e)
    print f("sudo make me a sandwich")

If you try the same thing with Perl and local in place of my, and set $color_id to $e, it works too.

So at least for Python and Perl, you can't reasonably say "dynamic scoping" or "lexical scoping". They do a bit of both. So why is that? Are the concepts dynamic and lexical just too simplistic to use in the "real world"?

git by example - upgrade wordpress like a ninja

September 21st, 2008

I addressed the issue of wordpress upgrades once before. That was a hacky, home-grown solution. For a while now I've been using git instead, which is the organized way of doing it. This method is not specific to wordpress; it works with any piece of code where you want to keep current with updates, and yet you have some local modifications of your own.

To recap the problem briefly: you installed wordpress on your server. Then you made some changes to the code; maybe you changed the fonts in the theme, for instance. (In practice, you will have a lot more modifications if you've installed any plugins or uploaded files.) And now the wordpress people are saying there is an upgrade available, so you want to upgrade, but you want to keep your changes.

If you are handling this manually, you now have to track down all the changes you made, do the upgrade, and then go over the list and see if they all still apply, and if so re-apply them. git just says: you're using a computer, you git, I'll do it for you. In fact, with git you can keep track of what changes you have made and have access to them at any time. And that's exactly what you want.

1. Starting up (the first time)

The first thing you should find out is which version of wordpress you're running. In this demo I'm running 2.6. So what I'm going to do is create a git repository and start with the wordpress-2.6 codebase.

# download and extract the currently installed version
wget http://wordpress.org/wordpress-2.6.tar.gz
tar xzvf wordpress-2.6.tar.gz
cd wordpress

# initiate git repository
git-init

# add all the wordpress files
git-add .

# check status of repository
git-status

# commit these files
git-commit -m'check in initial 2.6.0 upstream'

# see a graphical picture of your repository
gitk --all

This is the typical way of initializing a repository: you run an init command to get an empty repo (you'll notice a .git/ directory was created). Then you add some files and check the status. git will tell you that you've added lots of files, which is correct. So you make a commit. Now you have one commit in the repo. You'll want to use the gui program gitk to visualize the repo; I think you'll find it's extremely useful. Here's what your repo looks like now.

gitk is saying that you have one commit, it's showing the commit message, and it's telling you that you're on the master branch. This may seem odd, seeing as we didn't create any branches, but master is the standard branch that every repository gets on init.

The plan is to keep the upstream wordpress code separate from your local changes, so you'll only be using master to add new wordpress releases. For your own stuff, let's create a new branch called mine (the names of branches don't mean anything to git, you can call them anything you want).

# create a branch where I'll keep my own changes
git-branch mine

# switch to mine branch
git-checkout mine

# see how the repository has changed
gitk --all

When we now look at gitk, the repository hasn't changed dramatically (after all, we haven't made any new commits). But we now see that the single commit belongs to both branches, master and mine. What's more, mine is displayed in boldface, which means this is the branch we are on right now.

What this means is that we have two branches, but they currently have the exact same history.

2. Making changes (on every edit)

So now we have the repository all set up and we're ready to make some edits to the code. Make sure you do this on the mine branch.

If you're already running wordpress-2.6 with local modifications, now is the time to import your modified codebase. Just copy your wordpress/ directory to the same location. This will obviously overwrite all the original files with yours, and it will add all the files that you have added (plugins, uploads etc). Don't worry though, this is perfectly safe. git will figure out what's what.

Importing your codebase into git only needs to be done the first time, after that you'll just be making edits to the code.

# switch to mine branch
git-checkout mine

# copy my own tree into the git repository mine branch
#cp -ar mine/wordpress .. 

# make changes to the code
#vim wp-content/themes/default/style.css

# check status of repository
git-status

When you check the status you'll see that git has figured out which files have changed between the original wordpress version and your local one. git also shows the files that are in your version, but not in the original wordpress distribution as "untracked files", ie. files that are lying around that you haven't yet asked git to keep track of.

So let's add these files; from now on, every time something happens to them, git will tell you. And then commit these changes. You'll want to write a commit message that describes exactly the changes you made. That way, later on you can look at the repo history and the messages will tell you something useful.

# add all new files and changed files
git-add .

# check in my changes on mine branch
git-commit -m'check in my mods'

# see how the repository has changed
gitk --all

When you look at the repo history with gitk, you'll see a change. There is a new commit on the mine branch. Furthermore, mine and master no longer coincide. You can tell that mine originates from (is based on) master, because the two dots are connected with a line.

What's interesting here is that this commit history is exactly what we wanted. If we go back to master, we have the upstream version of wordpress untouched. Then we move to mine, and we get our local changes applied to upstream. Every time we make a change and commit, we'll add another commit to mine, stacking all of these changes on top of master.

You can also use git-log master..mine to see the commit history, and git-diff master..mine to see the actual file edits between those two branches.
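For reference, that looks like this:

# see the commit history that is in mine but not in master
git-log master..mine

# see the actual file edits between those two branches
git-diff master..mine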

3. Upgrading wordpress (on every upgrade)

Now suppose you want to upgrade to wordpress-2.6.2. You have two branches, mine for local changes, and master for upstream releases. So let's change to master and extract the files from upstream. Again you're overwriting the tree, but by now you know that git will sort it out. ;)

# switch to the master branch
git-checkout master

# download and extract new wordpress version
cd ..
wget http://wordpress.org/wordpress-2.6.2.tar.gz
tar xzvf wordpress-2.6.2.tar.gz
cd wordpress

# check status
git-status

Checking the status at this point is fairly important, because git has now figured out exactly what has changed in wordpress between 2.6 and 2.6.2, and here you get to see it. You should probably look through this list quite carefully and think about how it affects your local modifications. If a file is marked as changed and you want to see the actual changes you can use git-diff <filename>.

Now you add the changes and make a new commit on the master branch.

# add all new files and changed files
git-add .

# commit new version
git-commit -m'check in 2.6.2 upstream'

# see how the repository has changed
gitk --all

When you now look at the repo history there's been an interesting development. As expected, the master branch has moved on one commit, but since this is a different commit than the one mine has, the branches have diverged. They have a common history, to be sure, but they are no longer on the same path.

Here you've hit the classical problem of a user who wants to modify code for his own needs. The code is moving in two different directions, one is upstream, the other is your own.

Now cheer up, git knows how to deal with this situation. It's called "rebasing". First we switch back to the mine branch. And now we use git-rebase, which takes all the commits in mine and stacks them on top of master again (ie. we base our commits on master).

# check out mine branch
git-checkout mine

# stack my changes on top of master branch
git-rebase master

# see how the repository has changed
gitk --all

Keep in mind that rebasing can fail. Suppose you made a change on line 4, and the wordpress upgrade also made a change on line 4. How is git supposed to know which of these to use? In such a case you'll get a "conflict". This means you have to edit the file yourself (git will show you where in the file the conflict is) and decide which change to apply. Once you've done that, git-add the file and then git-rebase --continue to keep going with the rebase.
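In commands, resolving a conflict goes something like this (the file name is just an example):

# after fixing up the conflicted file by hand
git-add wp-content/themes/default/style.css

# keep going with the rebase
git-rebase --continue

# or, to give up and return to the state before the rebase
git-rebase --abort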

Although conflicts happen, they are rare. All of your changes that don't affect the changes in the upgrade will be applied automatically to wordpress-2.6.2, as if you were doing it yourself. You'll only hit a conflict in cases where, if you were doing this manually, it wouldn't be obvious how to apply your modification either.

Once you're done rebasing, take another look at the history. As you can see, all is well again; we've returned to the state that we had at the end of section 2. Once again, your changes are based on upstream. This is what a successful upgrade looks like, and you didn't have to do it manually.

Tips

Don't be afraid to screw up

You will, lots of times. The way git works, every working directory is a full copy of the repository. So if you're worried that you might screw something up, just make a copy of it before you start (you can do this at any stage in the process), and then you can revert to that if something goes wrong. git itself has a lot of ways to undo mistakes, and once you learn more about it you'll start using those methods instead.

Upgrade offline

If you are using git to upgrade wordpress on your web server, make a copy of the repo before you start, then do the upgrade on that copy. When you're done, replace the live directory with the upgraded one. You don't want your users to access the directory while you're doing the upgrade, both because it will look broken to them, and because errors can occur if you try to write to the database in this inconsistent state.
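A sketch of that workflow (the directory names are just examples):

# work on a copy of the live directory
cp -a wordpress wordpress.upgrading
cd wordpress.upgrading

# ... perform the upgrade as in section 3 ...

# swap the upgraded copy into place
cd ..
mv wordpress wordpress.old
mv wordpress.upgrading wordpress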

Keep your commits small and topical

You will probably be spending most of your time in stage 2, making edits. It's good practice to make a new commit for every topical change. So if your goal is to "make all links blue", then you should make all the changes related to that goal, and then commit. By working this way, you can review your repo history and see what you tried to accomplish and what you changed for each little goal.

Revision control is about working habits

You've only seen a small, albeit useful, slice of git in this tutorial. git is a big and complicated program, but as with many other things, it already pays off if you know just a little about it; it allows you to be more efficient. So don't worry about not knowing the rest, it will come one step at a time. And above all, git is all about the way you work, which means you won't completely change your working habits overnight; it will have to be gradual.

This tutorial alone should show you that it's entirely possible to keep local changes and still upgrade frequently without a lot of effort or risk. I used to dread upgrades, thinking it would be a lot of work and my code would break. I don't anymore.

Dear Nokia

September 20th, 2008

I'm confused.

You're making these internet tablets with a keyboard, built-in wlan and bluetooth. It looks like a pretty complete mini-desktop device. The KDE people are really excited about running KDE on it, which is wonderful.

There's just one big question mark here. Why do I need a little computer that gives me internet access? I don't know about you, but where I live there are computers everywhere I turn: at home, at school, at work. And if I really needed a smaller one I would get the Acer Aspire One, which is much more powerful and useful than your tablets (and in the same price range!).

Because, you see, if I'm not at home or school or work, I don't have an internet connection. So your "portable internet device" just becomes a portable without connectivity. No different from my laptop.

I wonder... is there anything that would make this "portable" more useful? Perhaps some kind of universal communications network that doesn't require a nearby wireless access point? Like say, the phone network? I hear you're flirting with the idea of building phones, yes?

So why not build the phone into the "internet tablet"? That would actually give it something my laptop doesn't have; it'd give me a reason to buy it. I mean, you've already put everything else a modern phone has on the tablet, so how hard could it be to add a phone?

I'll tell you what, I'm in the market for one at the moment. I've never bought a Nokia product in my life, so this is your big chance. Do we have a deal?