undvd now in portage

December 17th, 2007

I was pleasantly surprised today to discover that undvd has made its way into portage. :cool: It now has three releases on sourceforge, with packages for gentoo (ebuild), ubuntu and fedora. But I haven't made any requests for it to be included, so apparently the powers that be decided it's not completely useless. :party:

It's actually a bit of a dilemma what to do with these small projects. A big project just looks neglected without a website, complete documentation, packages for several distributions, up-to-date entries on various directory sites, etc. I did the full-scale follow-up with galleryforge, and that now speaks for itself; it presents itself as a proper project. But for one thing, it was my first time hosting with one of these open source providers, so I wanted to explore the options quite deeply. And secondly, you don't really know what kind of a future a project will have when you're starting it, but galleryforge isn't used much, not even by me. So it was fine to do then, but it's some effort to put in for every small piece of code like this. (Not so much doing it then, but the burden of maintenance is the key issue.)

undvd is even smaller, and isn't expected to grow much (it's more or less feature complete by now). It started out as a blog entry and a link on my code page. Then I posted it on opendesktop, and it got seen by a handful of people who happened to be looking at the frontpage when it was added. There was also a regression bug posted there, which spurred me to start tracking it with git to avoid these things in the future. Then I decided I wanted to package it up, because that makes it organized and easy to use. But then I couldn't figure out how to write the url (in the ebuild) for the tarball hosted on opendesktop, because their file hosting is a little sucky. So I registered on sourceforge. The only thing I'm using there right now is the file release function. There's no website, and I don't really see the call for one. Meanwhile, I've been trying to keep the user guide (which is shipped with the code) clear and up-to-date, to make that the one source of information. I think that's enough for now.

GPL vs BSD, a matter of sustainability

December 15th, 2007

If you haven't been living under a rock the past decade (I suppose Stonehenge qualifies) you may have walked in on some incarnation of the famous GPL vs BSD flamewar. It's up there with the most famous flamewars (now *there's* a research question for a budding sociology student!) of our beloved Internet society.

Both licensing models have been around for a very long time. I don't know which predates which, but it really doesn't matter. The spirit behind both licenses is very similar: free software is good. But they realize this idea in different ways.

In the GPL license you have the four freedoms: to run the software, to have the source code, to distribute the software, to distribute your modifications to the software. What this implies is that when you obtain the software, you have the *obligation* to ensure that these four things hold true for the next person you give it to. After all, someone had to go to the trouble of preserving these rights for *you*, so you have to do the same for the next guy.

The BSD license is different, because it gives *you* the right to distribute the software, but it does not oblige you to make sure that the next guy has any such right. Well, that's not really a problem, the next guy can ignore you and get the software from the same source that you did (if that source is still available). But if you change it and you give it to him, you can forbid him from passing it on.

So who is right? Well, the BSD camp is. The BSD is no doubt a freer license, it gives you the right to decide what rights to bundle with the software. That is much closer to the absolute meaning of "freedom" than the GPL. Alas, it's not "completely" free, because you can't remove the name of the software's author and replace it with "Leonardo da Vinci".

What the GPL terms "freedom" is actually fairly subversive, because it *forces* you to do certain things. Most people who are forced to do something call that a "restriction" rather than a "freedom". It's true that you have certain freedoms when you get the software, but if you want to pass it on you have restrictions, so they could just as well call it the four freedoms and the four restrictions.

Therefore, if we take the philosophical ideal of freedom to heart, even though both of these licenses promote free software, neither of them represents freedom, and the GPL is far less free than the BSD.

Harmless restrictions

Suppose you're a parent and you give your kid a candy bar and say: "This is for you and your brother. You can have half of it, and when he comes home, give him the other half." Do you think that is going to happen just as you instructed? How confident are you?

Well, your intentions were good. You tried to ensure fairness. But we humans are scheming devils, aren't we? So our philosophy is a bit of an idealization, we just don't live up to it.

Is there some way we can find a measure of freedom that is good enough? The fact is that we live with a lot of implicit restrictions without worrying too much about them. If you tell your kid "you're free to wear anything you want, eat anything you want, be anywhere you want, and do anything you want, except you can't burn the house down", most kids would find that a very satisfying degree of freedom, despite the restriction. They would probably say: "Well, I wasn't going to do that anyway, all my toys would go up in smoke."

So what can we do about sustainability?

Freedom in its pure form is a wonderful thing, but it's not inherently sustainable. You can take something, compare it against freedom, and tell whether it's free, but you can't use freedom to enforce freedom. That would be absurd.

The GPL model is sustainable. It offers freedom, but with the pragmatic twist that there needs to be some kind of force to keep the freedom in place. In that sense it could even be said to be more free, because the *accumulated* freedom over all people involved is higher than when one person has all the freedom and everyone else has none.

GPL freedom is isomorphic. If OpenOffice needs a way to open jpeg files, and the gimp already has code for this, OpenOffice can just take it. Then two years later, if OpenOffice has made jpeg reading much faster, the gimp can take the modified code from OpenOffice and use it. Both parties have the same degree of freedom, and no freedom is lost along the way; the process is "lossless".

BSD freedom, on the other hand, is "lossy". If I get BSD code I have a lot of freedom, but the next guy doesn't. It's fairly well known that there is BSD code in Windows. And obviously, whatever Microsoft did with that code, they have no obligation to release their changes. So the code *was* free at one point, but it didn't *remain* free. Furthermore, even if they didn't change it one bit, if the original author is no longer around, Microsoft is still sitting on BSD code that is free for *them*, but it's no longer free for anyone else.

So what can we conclude from all this? Both license models make software free, but only GPL software is sustainably free. The BSD gives the individual greater freedom, the GPL gives everyone more freedom. Choose which one you value more.

For a more in-depth discussion see this essay, not only for itself, but also the many many references it contains to other relevant texts.

UPDATE: Alexandre Baron has written a French translation.

undvd: looking ahead. and back.

December 11th, 2007

When you decide to code an application in a problem space that already has a rich bouquet of solutions, the typical questions come to mind immediately. What is this going to accomplish? Why not just use an existing application? Why reinvent the wheel? Perfectly valid concerns, and yet every application has some reason to be, or it wouldn't exist in the first place. Sometimes it's a question of timing - better solutions did not exist at inception. Sometimes it's about quality: that one over there works but doesn't do it well enough. And sometimes it's something different altogether: that one does too much and I don't understand it.

There are people out there who are proper movie buffs. People who can quote lines out of a movie and identify passages in extreme detail. I've known a bunch of people like that, but I've never been like that myself. To me watching movies was never a really important thing, it was kind of a second tier interest. But knowing people like this has its advantages, you can borrow their movies. :cool: I remember how we used to lug around VHS tapes at school.

So my lack of outright passion about movies is really the reason why I never took after the dvd ripping crowd. When DVDs were fairly new, people were so excited about divx. My friends at school used to discuss bitrates and filesizes. I didn't follow any of that, I wasn't really interested. I didn't have a DVD drive, nor did I think paying 38 for a movie was such a good deal.

But admittedly, ripping the occasional movie is useful. DVDs in general are just a hassle and I don't like to put up with them. Back in those days (a few years later, when I had bought my first dvd drive for my first custom-built computer) I tried it a few times. I didn't mirror the excitement of my friends. It took a really long time to rip a movie, and the result wasn't that good. I don't know what I was doing wrong. We used the Gordian Knot suite and followed the guides on Doom9. But while the program gave me every possibility to tweak the settings, I didn't know what they meant, I didn't *want* to know what they meant, I just wanted to rip it and move on.

And that is precisely the point where undvd comes in. I've tried a bunch of tools for ripping DVDs. dvd::rip might be the most merited linux tool. There's also acidrip and a bunch of others I don't even remember anymore. All these tools had the same two flaws:

  • They present a lot of options that I don't know anything about, and I don't care about. And presumably, since my results were never very good, these are crucial for good results.
  • It takes anything from 15 to 45 minutes to actually set up all the options, inputs, outputs, filters, cropping, the whole shebang. When something fails, like the disc can't be read, the tool sometimes crashes. If it does, all your set-up work is lost. But even if you have a working setup from before and you want to rip another movie, you still have to double check all these settings, because ripping takes a long time, and if you set something wrong by accident you may only discover it far along the way.

Therefore, I think it's fair to say that we have the Photoshop of DVD ripping in place. Personally I want to use Paint for this activity, because I have no deep interest in all the details.

Making it work

So there is something complicated you have to do again and again, and every time you worry that you'll get it wrong, and even if you get it right, next time you probably won't remember what you did. Faced with that, what do you do? In the middle ages monks would copy books by hand, at extraordinary effort. Not only would they transcribe the text, they would also copy and enhance the illustrations so that the copy, while not being the original, would be no less worthy. This was their calling, transcribing was an art form. Then came Gutenberg. Now granted, building a printing press took a lot more effort than transcribing a book. But once you had done that, copying books was a no brainer.

To me ripping a DVD is not an art form, it's a chore. And if I can figure out how to do it once and never have to worry about it again, then it's well worth the effort. So I set out to figure it out, I looked up the details and read the fine print. Eventually I had a fairly good idea of how to do it, and the results were coming along.

undvd became the wrapper around that nastiness I didn't want to look at or remember. And for my money it worked well. It also encompasses certain design decisions that were important to me.

Why no gui?

The Photoshop rippers all come with a gui. That's hardly a surprise; the plethora of settings they allow you to fiddle with would look like mplayer's list of options if they weren't gui apps. It's practically impossible *not* to have a gui for those guys.

But DVD ripping is not inherently a gui centric activity. You need the gui to write your missile launch plan, but once you start ripping the gui doesn't serve any purpose anymore. It just sits there in memory, and you could just as well remove it while ripping and then bring it back once it's over. So 95% of the time the gui isn't being used.

Apart from the simple fact that coding a gui is a lot more work, I consider a gui a hindrance in this case. There is nothing a gui would do for undvd that would improve how the program is being used.

Why not interactive?

There are some non-gui rippers/encoders out there, like h264enc and ripdvd (and probably a bunch of others). Common to them is that they run interactively. They ask you a series of questions to determine everything they need to know about ripping the DVD before they can start doing the work.

Unfortunately, interactive mode suffers from the same problems that a gui interface does. You have to answer all the questions whether you know the answers or not. And more importantly, it's hard to reproduce the result, because you don't remember what you answered the last time.

And crucially, an interactive program won't let you just run it and go do something else, you have to stick around to do your part before the app will do its part. With a ripper this isn't such a big deal, because all the interaction happens at the beginning, but it's still something I like to do without.

Why bash?

The standard way of building a ripper is to wrap an interface around the scary mencoder command line interface. Whether this is a gui or not has no impact on how the ripper interacts with mencoder. There is no programmatic interface to mencoder, so you're stuck running it from a shell one way or the other.

Taking this into account, a bash script is pretty much the easiest way to handle shell commands. (If you're a perl nut, perhaps that would suit you better, but I'm not.) I've tried running commands from python (and using it to capture the exit code, stdout, stderr etc), and it's far easier just to use plain bash.
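As a sketch of what that looks like, here's the kind of plumbing bash gives you almost for free (the function name is made up for illustration; echo stands in for mencoder or any other external command):

```shell
# Run a command, capture its combined output and exit code, and
# propagate failure -- the basic pattern for wrapping mencoder & co.
run_tool() {
    local output status
    output=$("$@" 2>&1)
    status=$?
    if [ $status -ne 0 ]; then
        echo "command failed ($status): $*" >&2
        return $status
    fi
    echo "$output"
}

run_tool echo "encoding done"   # prints: encoding done
```

Doing the equivalent in python means juggling subprocess objects and pipes; in bash it's a few lines.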

As a self imposed restriction, it will also keep the program from becoming too big. Bash is not exactly the greatest abstraction language and takes quite a bit of discipline to keep it from getting totally convoluted, which is why I would not use it for anything bigger than a simple application.

Feature creep

Unfortunately, every program that evolves will face the issue of new features. Personally I think undvd does what it does well enough, I'm satisfied. But any one-size-fits-all strategy is bound to be somewhat constrained. The question is where to stop.

Two things came up that weren't intended for undvd:

Scaling

To start off with the first issue, undvd makes the assumption that the output video will be smaller than the original. This is fairly sensible, given that most rips are done this way, and considering that 6x compression does demand a lot from your encoder unless you compromise a bit on dimensions. Besides, even if you watch a movie full screen, the software/hardware scaling does a good enough job of stretching the image without clobbering it. Having said that, undvd's decision to scale to 2/3 of the original size is arbitrary, and I accept that it's not a well-justified constraint.

So scaling will be added.
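For the curious, the 2/3 scaling is simple arithmetic. A sketch of how such a calculation might look (this is not undvd's actual code, and the rounding down to a multiple of 16, which codecs generally prefer, is my own addition):

```shell
# Scale a frame dimension to 2/3, rounded down to a multiple of 16
# (most codecs work best with dimensions divisible by 16).
scale_dim() {
    local dim=$1
    echo $(( (dim * 2 / 3) / 16 * 16 ))
}

scale_dim 720   # a PAL DVD width  -> 480
scale_dim 576   # a PAL DVD height -> 384
```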

Bitrate and 2-pass encoding

The bitrate issue, on the other hand, is hairy. Very hairy. I kept thinking about a meaningful way to set the bitrate. The obvious thing is to make it an integer parameter, like say 1000kbps. But what is that? It's just a number, a meaningless number. The "right" bitrate differs from movie to movie, from codec to codec, perhaps even from encoder to encoder. We are back in Photoshop land.

So I follow the convention of setting an output file size instead. If you say you want this title ripped to a file of 700mb, the bitrate is calculated automatically. This method is flawed, however, because the space we can allot to the video depends on how much space the audio requires, and there's no way to know this exactly. (Currently the mean audio bitrate is presumed to be 160kbps.) So the output size tends to come within 10% of the user-specified size.
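To make the arithmetic concrete, here's a sketch of the size-to-bitrate calculation in bash (not undvd's actual code; the helper name is made up, and the 160kbps audio figure is the assumption mentioned above):

```shell
# Given a target file size (mb), the title length (seconds) and an
# assumed audio bitrate (kbps), compute the video bitrate that fills
# the remaining space. 1mb = 8192 kbit.
compute_video_bitrate() {
    local target_mb=$1 length_s=$2 audio_kbps=${3:-160}
    local total_kbit=$((target_mb * 8192))
    local total_kbps=$((total_kbit / length_s))
    echo $((total_kbps - audio_kbps))
}

compute_video_bitrate 700 7200   # 700mb, two-hour movie -> 636 kbps
```

Since the audio figure is only an estimate, the real output lands near the target size rather than exactly on it, which is why the 10% margin above exists.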

The thing is, if you start messing with the bitrate you should also consider 2-pass encoding, because you can mess up your video quite badly that way. undvd tries to do "the right thing" by picking 2-pass encoding if the calculated bitrate is below the default. But you can always override it to 1-pass or 2-pass either way.
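That decision rule can be sketched like this (the threshold value here is an invented number purely for illustration, not undvd's actual default):

```shell
# Pick the number of encoding passes: below the default bitrate,
# 2-pass is worth the extra time to preserve quality.
choose_passes() {
    local bitrate=$1 default_bitrate=${2:-900}
    if [ "$bitrate" -lt "$default_bitrate" ]; then
        echo 2
    else
        echo 1
    fi
}

choose_passes 636    # squeezed bitrate    -> 2
choose_passes 1200   # comfortable bitrate -> 1
```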

In any event, if you don't use these new options then undvd does exactly what it always has.

And now for some screenshots

what makes RoboCop such a great movie

December 9th, 2007

Some works are just.. the only ones of their kind. RoboCop is such a work. The first of the series dates back to 1987; the last, to 1993. Altogether the three movies form a body of work that stands out from the plethora of supposedly similar works, action movies, cheap-story-big-boxoffice productions. It is simply.. art.

On the face of it, RoboCop is a lame story about the decay of society as a pretext for violence. The world portrayed in the story is dystopian, there are raving bands of hoodlums, there is organized crime, social order is all but defeated. And it's a cold world, filled with cruelty, void of compassion, a world of desperation. And above all, a world where corporations have taken over.

In this world, the dead police officer Alex Murphy becomes the cyborg RoboCop. A robot with a half-human mind, preserving some of his memories and emotions. But programmed by the powerful Omni Consumer Products (OCP) corporation to obey the directives he has been given. Essentially a killing machine, but a somewhat restrained one. Imagine, in terms of decision making, a slow-thinking, emotionally numb human.

Irony and comedy

But appearances can be deceiving. Sure enough, there is enough gunfire to go around, but the writers saw no reason not to have a little fun with the story. Picture your average worker stuck in a dead end job who has to pretend he's taking it very seriously, but when the boss isn't standing over him, he finds ways to amuse himself.

In order to drive home the point of a violent world in need of an enforcer like RoboCop, we have Media Break, the news program that reports nothing but violence. Since the writers already had this means of narration, they decided to put in some commercials as well, just like on real tv. These ads are wonderfully ironic and humorous. One is for a board game called Nukem, showing your average family sitting around a table playing wargames, where the culmination, of course, is the nuclear strike that wipes out everything.

Another ad goes like this:

*attractive woman by the pool in a robe* They say 20 seconds in the California sunshine is too much these days. Ever since we lost the Ozone layer. *takes off robe, now wearing a bikini* But that was before Sunblock 5000. Just apply a pint to your body *starts smearing herself with an opaque blue substance* and you're good for hours. See you by the pool. *by now completely covered in the blue stuff*

But there are other humorous angles to it. Take how RoboCop, who is by no means the brightest intelligence, suddenly develops deep sensitivity, so that when the little girl says she misses her parents, and he scans his data to learn that they were killed in the riots, he decides not to tell her. For a robotically enhanced killing machine he's also quite polite (compared to the humans around him), never curses at people; the worst he ever calls anyone is scum. In a rather unfortunate reprogramming mishap he even becomes politically correct, condemning wrongdoing by lecturing delinquents rather than arresting them. Then, in a risky act of self-sacrifice, he electrocutes himself to restore his own judgment.

Or take how, in order to wrap up the story (whose premise it is that the OCP corporation controls the world) in some fairly definitive way, the writers decided that an unannounced, amateur broadcast of 2 minutes over the corporation's network would cause OCP's stock to drop to zero within 5 minutes. Problem solved. Then, just as we remember from The Karate Kid, Part II, the powerful owner of the Japanese company which absorbed OCP in a hostile takeover came out to meet the people who had valiantly defended themselves against their corporate aggressors, decided he had made a mistake, and paid tribute to them.

Truth

For all the parents out there on the fence about whether they should let their kid go see RoboCop, there is plenty of educational value in these movies.

Lesson #1: Crappy tv commercials never go away

That commercial with the bald guy and the hookers, who says "I'll buy that for a dollar", appears several times in the first movie, and reappears in the last one.

Lesson #2: The success of the corporation is the suffering of humans

The big, evil corporation OCP is plotting to tear down all of Detroit to replace it with some sort of high tech metropolis called Delta City. But before that can happen, goes the story, crime must be brought under control. They even deploy their own special "security force" to speed up the process. And they own the police, so they can tell them what to do. The only incentive for any corporation is more profit, which means ever decreasing freedom for the average person, and ever-increasing infighting inside the corporation itself. Another movie that explains this in great detail is The Corporation, so have that one ready when the kids start asking about it.

Lesson #3: Media is run by corporations

Media Break is driven by an agenda to be a scare mongering institution, a propaganda device. All they ever report is violence domestically and violence abroad. This is exactly like modern day media institutions which have lost all credibility and only pander to corporate interests. CNN, anyone? But at least Media Break has more of a conscience, as one of their reporters walks out in the middle of a newscast because she can't stand to dish out the misinformation.

Lesson #4: A system can only be secure when it's physically secure

Over the course of the three movies various people get their hands on RoboCop and reprogram him. In the third movie the little girl even plugs into the ED-209 robot using her laptop and makes it friendly to the rebels. This is the truth about computer security that everyone knows. If your system is physically compromised, you can't trust it. Sending a robot out into the world gives everyone access to it and they can plug into it just like you do at the lab.

evolving towards git

December 4th, 2007

I remember the first time I heard about version management. It was a bit like discovering that there is water. The idea that something so basic could actually not exist seemed altogether astounding afterwards. This was at a time when I hadn't done a lot of coding yet, so I hadn't suffered the pains of sharing code *that* much, but it was already relatable enough to understand the implications of version management. Emailing those zip files of code had been plenty annoying.

This happened just before we embarked upon a 6-month journey in college to write an inventory management application in java. There were four of us. We had done java before, but this was the first "big" application that also employed our newly acquired knowledge about databases. God only knows why, but they actually ran (and probably still do) an Oracle server at school for these projects. I don't know how much an academic license runs on that, but it's in the thousands of dollars no doubt. As the initiative taker of my group, I persuaded (not that it took much effort) the guys to forget about the Oracle server (which didn't come with any developer tools, just the awful command line interface) and use PostgreSQL (we rejected MySQL just to be on the safe side, as it lacked support for a bunch of things like views and foreign keys). I also introduced another deviation from the standard practice: CVS. I would be running both off of my tried and tested Pentium home server. :proud: We were the only group that used version management, which baffled me (why was it not offered as a service at the department?). I heard that the year after that they started offering CVS for projects. Incidentally, when you deviate from the path, you do expect to get some academic credit for the extra effort, obviously. ;)

So that was 6 months with CVS, and it worked quite well. We didn't do anything complicated with it, didn't know about tags or branches, just ran a linear development on it and that was it. And it was helpful (but not efficient) for tracking binary files (Word documents, bleh). But it had some annoying quirks. Not that we were all that concerned with tracking history back then, it was just about getting it finished and then obviously it would never be used, as school projects go. But the lack of support for renaming and moving files in CVS was silly.

It's a bit funny, actually. CVS has been a standard for the longest time, and people have put up with its problems, until recently when a version management boom began. Why is that? I have a hunch that the relative calm in version management was kept intact by a predominantly corporate culture (and the corporates are obviously super conservative). But then once a certain number of people had gotten themselves into open source projects the need for better tools put this in motion. One of the first "new generation" systems was Subversion, the replacement for CVS. Subversion was adopted slowly, but quite steadily. Currently it's probably the "standard" for version management. I registered galleryforge with Berlios precisely because they offered svn services. Sourceforge also has it now, and people around the net have started talking in terms of svn a lot. My current school also uses it.

But with the momentum of version management systems in play, a lot of other ideas have surfaced besides those captured in Subversion. Arch and Darcs are two that employ the distributed paradigm. I don't think they have had much success though, and perhaps looking back they will have played the role of stepping stones rather than end products. A new (a third, if you will) generation of systems has appeared quite recently. Monotone is a newer system with its own network synchronization protocol. Mercurial seems to have enough support to replace Subversion/CVS down the line (not that it couldn't happen today, but people hate giving up something they're used to, so these things drag out). And there is Bazaar, which seems to discard the goal of being the best system and just aim to be easy and pleasant for small projects.

And somewhere in that mix we find Git, except that it has a much higher potential for success than other new systems, because it was launched into the biggest (or noisiest, at least) open source project and made to carry the heaviest loads right from the beginning, which is a good way to convince people that your product is robust. As such, it seems to me that git has had the quickest adoption of any system. It was launched in 2005 and it has already become one of the household names around the interwebs. So I thought git was worth looking at. At this rate I may very well be using it on a project someday.

I was also encouraged by regular people saying it worked for them. A version management system for kernel developers is nice, but that doesn't necessarily make it right for everyone else. But here were people with one-man projects liking it, good sign. In fact, one of the things I used to read about git was "I like the simplicity".

What I discovered was that my dictionary was out of date, because it didn't carry that particular meaning of the word "simple". Git is not simple, it takes getting used to. It's one of those things you try and you don't understand, then you come back and try again and you make a little more progress, and so on. I decided to open a git repo for undvd, both to track the changes and to get a feel for git.
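To be fair, opening a repo and making the first commit is about the least demanding thing you can do with git. A minimal sketch (the directory and file names are just for illustration, and the -c flags supply an identity so the commit works in a fresh environment):

```shell
# Start tracking a small project with git: init, add, commit.
mkdir undvd-demo && cd undvd-demo
git init -q
printf '#!/bin/bash\necho "ripping..."\n' > undvd.sh
git add undvd.sh
git -c user.name="demo" -c user.email="demo@example.com" commit -q -m "initial import"
git log --oneline   # shows the single commit
```

It's past this point, when branches, merges and the index enter the picture, that the "simplicity" takes getting used to.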

The special thing about git is that it has well developed capabilities for interoperating with other systems. In other words, git doesn't assume it will wipe out everything else, it accepts the reality that some people will continue using whatever it is they're using (even CVS) and git can deal with that. Obviously, predicting the future is tricky, but I get the feeling that Subversion has hit that comfort spot where most people are satisfied with what it's doing and they really would need a lot of persuasion to learn something new. In other words, I expect Subversion to be around for quite a while. And therefore, git's svn tools may come in very handy. I haven't really looked at that part yet, but it's on my todo list.

So what is the future? I would say Subversion, Git, and maybe Mercurial. Version management systems are like programming languages - the more of them you use the easier it is to learn a new one. But switching is hard. When I'm thinking of checking something in version management I still begin typing svn st.. before I realize I'm using git.