Archive for the ‘technology’ Category

what is it about coding?

January 12th, 2008

This week marks the release of KDE 4 with a lot of noise. But what strikes me more than the actual code being released is all that I've read about kde4 ever since I started reading Planet KDE quite regularly last year. I've read the words of many people who are above all very excited about whatever it may be they are currently working on, big or small. There is a palpable widespread enthusiasm in that community (at least among the people who like to blog).

The question is why. What are the kde people so excited about? A qt widget that auto adjusts on resize? Uh-huh. Why is Brian Carper so determined to learn Lisp when it's not his job and no one is pushing him to it? Jarosław Rzeszótko is a Polish kid who tried to map out the entire realm of programming so that he could spend the next however many years learning... everything. Why does he care so much? The programming subreddit is consistently dominated by esoteric programming languages and wild ideas, which suggests that there's a lot of hobbyism and experimentation going on, not just plain "working for the man". Why bother with all this tinkering?

When we see a thrilling musical performance, there comes that realization of just how much work and how much mental energy it must have taken to produce it. If you start today and you're very talented, you might be able to give a thrilling performance in 10 years. People devote their lives to achieving this. But hey, it's music, it's art, it's incredible. Listening to music performed this way makes you feel something you can't reproduce any other way. It activates feelings deep inside you and brings them to the surface, making you experience hurt, relief, harmony, ... just by listening to sound.

So what about coding then? Why do people care about coding? It's not art. It doesn't make the user have all these wonderful experiences they get from music, it's really no more exciting for them than... filling in forms. There aren't any art exhibitions for software. And if ever using software channels deep feelings, it's usually not in a good way.

But there is something about programming that very deeply appeals to us programmers. It is difficult to explain, and I have never found it explained by anyone... fully. How do you explain something you don't completely understand?

Well, until today. I just received my order from Amazon, "The Mythical Man Month", Fred Brooks's set of essays on software engineering. In the first essay, The Tar Pit, Brooks writes:

The Joys of the Craft

Why is programming fun? What delights may its practitioner expect as his reward?

First is the sheer joy of making things. As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design. I think this delight must be an image of God's delight in making things, a delight shown in the distinctness and newness of each leaf and each snowflake.

Second is the pleasure of making things that are useful to other people. Deep within, we want others to use our work and to find it helpful. In this respect the programming system is not essentially different from the child's first clay pencil holder "for Daddy's office".

Third is the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning. The programmed computer has all the fascination of the pinball machine or the jukebox mechanism, carried to the ultimate.

Fourth is the joy of always learning, which springs from the nonrepeating nature of the task. In one way or another the problem is ever new, and its solver learns something: sometimes practical, sometimes theoretical, and sometimes both.

Finally, there is the delight of working in such a tractable medium. The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures. (As we shall see later, this very tractability has its own problems.)

Yet the program construct, unlike the poet's words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

Programming then is fun because it gratifies creative longings built deep within us and delights sensibilities we have in common with all men.

In trying to formulate an answer to the question myself, I have only ever been able to clearly state the first of these facets: building things. The fluffiest and most intangible quality is the one I see in the emphasized paragraph. In this respect I understand people who don't get programming. It is such a strange thing in many ways.

But it is also this that makes the endeavor forever interesting. Imagine a computer game that has such incredible depth that you can spend your whole life playing it and whichever dimension you pick you can never see the end of it. No matter how much you zoom in the picture, you can never see the pixels. No matter how refined your battle strategy is, you can never figure out the computer opponent, because as your strategy gains an ever greater granularity, so does his. And no matter how many levels you complete, there are always more left.

From a child psychology point of view, I may have been a surefire pick for programming (if someone were confident enough to foresee the PC revolution). I had building blocks from the beginning, and I loved Lego blocks more than anything else. I didn't really care about the structures they were meant for, I just built stuff. The only limitation to Lego blocks is actually the blocks themselves. There are only so many blocks, and there are only so many types of blocks. With rectangular blocks, you can't build something round. And with plastic blocks you can't build ships, because the material is too heavy. And because the blocks are a certain size you can't build something that is both small and complex. And because the blocks are all the same material you can't do anything magical like make some parts lighter or run a current through a wire, because you need different materials for that.

I think of programming languages as blocks. They are what makes our programs "slightly removed from pure thought-stuff". They are what makes our abstract craft into something real. But they also define and limit how we can build our castles in the air. Every programming language is a different straitjacket. A different set of blocks.

It is an interesting dilemma. We need languages to be able to express anything. A language, to be something, simply must be concrete. And with a concrete language we can build concrete things, but it is also the very same thing that limits what we can express. Without our language, our castles in the air have no restrictions. They are pure imagination. But they can also never be.

Apple: least creative company name ever?

December 21st, 2007

It's no secret that finding a good name for a company is no easy task. We're not exactly blown away by Microsoft, IBM, Compaq. They have no imagination.

There are some names that aren't too bad. Names that aren't too clear about what they mean can be decent, because they sorta make you wonder. Adobe sounds kind of exotic and interesting. Amazon has just the right amount of familiarity and positive association. FedEx is extremely obvious, but somehow snappy all the same, it has just the right amount of consonants. In contrast, UPS is worthless. Intel is also pretty good, you don't know where it comes from, but it's a positive sounding word. Oracle is a name with potential, but the company profile is just so overwhelmingly boring enterprise that they could just as well be called BigYawn. One of the best names is Nero. It has an interesting historical association, plus it's relevant to the function of the software.

The Web2.0 names are all pretty dumb. They don't have that boring cubicle feel, instead they sound like cartoon strips. And most of them mean nothing, but in such a way that you know they mean nothing, there's nothing to figure out. Which makes it dull. Like Google. Even if that meant something, would anyone care? No.

But if there's one company that never gave it more than 2 minutes of thought, it's the World Marketing Champion, Apple.

Apple — For the favourite fruit of co-founder Steve Jobs and/or for the time he worked at an apple orchard, and to distance itself from the cold, unapproachable, complicated imagery created by other computer companies at the time — which had names such as IBM, DEC, Cincom and Tesseract

What a useless name that is. It's dumb at face value, and it's not a pun that would keep your brain busy for 3 seconds. It's just an apple. Even Nail would have been a better name, because then you don't know if it's the body part or the construction tool.

For a company whose primary concern is image you would think they'd decide to change it at some point. Or can't they come up with anything? Then again they're not exactly known to break the bank for product names either. Apple Computer, Macintosh, Mac, eMac, iMac, MacPro, PowerBook (wow, everyone, a new adjective), MacBook, MacBook Pro, Mac mini, iPod (the most creative one so far). Then there is the software line where again they have spared no expense. iChat, iPhoto, iMovie, iWeb, iDVD, iLife. Yes, all products for creative people, you can really feel the ideas percolating.

And the latest? The iPhone. Boy, that must have been tough to invent.

undvd now in portage

December 17th, 2007

I was pleasantly surprised today to discover that undvd has made its way into portage. :cool: It does now have three releases on sourceforge with packages for gentoo (ebuild), ubuntu and fedora. But I haven't made any requests for it to be included, so apparently the powers that be decided it's not completely useless. :party:

It's actually a bit of a dilemma what to do with these small projects. A big project just looks neglected without a website, complete documentation, packages for several distributions, up-to-date entries on various directory sites etc. I did the full scale follow-up with galleryforge, and that now speaks for itself, it presents itself as a proper project. But for one thing it was my first time hosting with one of these open source providers, so I wanted to explore the options quite deeply. And secondly, you don't really know what kind of a future a project will have when you're starting it, but galleryforge isn't used much, not even by me. So it was fine to do then, but it's some effort to put in for every small piece of code like this. (Not so much doing it then, but the burden of maintenance is the key issue.)

undvd is even smaller, and isn't expected to grow much (it's more or less feature complete by now). It started out as a blog entry and a link on my code page. Then I posted it on opendesktop, and it got seen by a handful of people who happened to be looking at the frontpage when it was added. There was also a regression bug posted there, which spurred me to start tracking it with git to avoid these things in the future. Then I decided I wanted to package it up, because that makes it organized and easy to use. But then I couldn't figure out how to write the url (in the ebuild) for the tarball hosted on opendesktop, because their file hosting is a little sucky. So I registered on sourceforge. The only thing I'm using there right now is the file release function. There's no website, and I don't really see the call for one. Meanwhile, I've been trying to keep the user guide (which is shipped with the code) clear and up-to-date, to make that the one source of information. I think that's enough for now.

undvd: looking ahead. and back.

December 11th, 2007

When you decide to code an application in a problem space that already has a rich bouquet of solutions the typical questions come to mind immediately. What is this going to accomplish? Why not just use an existing application? Why reinvent the wheel? Perfectly valid concerns, and yet every application has some reason to be, or it wouldn't exist in the first place. Sometimes it's a question of timing - better solutions did not exist at inception. Sometimes it's about quality: that one over there works but doesn't do it well enough. And sometimes it's something different altogether. That one does too much and I don't understand it.

There are people out there who are proper movie buffs. People who can quote lines out of a movie and identify passages to extreme detail. I've known a bunch of people like that, but I've never been like that myself. To me watching movies was never a really important thing, it was kind of a second tier interest. But knowing people like this has its advantages, you can borrow their movies. :cool: I remember how we used to lug around VHS tapes at school.

So my lack of outright passion about movies is really the reason why I never took after the dvd ripping crowd. When DVDs were fairly new, people were so excited about divx. My friends at school used to discuss bitrates and filesizes. I didn't follow any of that, I wasn't really interested. I didn't have a DVD drive, nor did I think paying 38 for a movie was such a good deal.

But admittedly, ripping the occasional movie is useful. DVDs in general are just a hassle and I don't like to put up with them. Back in those days (a few years later, when I had bought my first dvd drive for my first custom built computer) I tried it a few times. I didn't mirror the excitement of my friends. It took a really long time to rip a movie, and the result wasn't that good. I don't know what I was doing wrong. We used the Gordian Knot suite and followed the guides on Doom9. But while the program gave me every possibility to tweak the settings, I didn't know what they meant, I didn't *want* to know what they meant, I just wanted to rip it and move on.

And that is precisely the point where undvd comes in. I've tried a bunch of tools for ripping DVDs. dvd::rip might be the most merited linux tool. There's also acidrip and a bunch of others I don't even remember anymore. All these tools had the same two flaws:

  • They present a lot of options that I don't know anything about, and I don't care about. And presumably, since my results were never very good, these are crucial for good results.
  • It takes anything from 15 to 45 minutes to actually set up all the options, inputs, outputs, filters, cropping, the whole shebang. When something fails, like the disc can't be read, the tool sometimes crashes. If it does, all your set-up work is lost. But even if you have a working setup from before and you want to rip another movie, you still have to double check all these settings, because ripping takes a long time, and if you set something wrong by accident you may only discover it far along the way.

Therefore, I think it's fair to say that we have the Photoshop of DVD ripping in place. Personally I want to use Paint for this activity, because I have no deep interest in all the details.

Making it work

So there is something complicated you have to do again and again; every time you worry that you'll get it wrong, and even if you get it right, the next time you probably won't remember what you did. Faced with that, what do you do? In the middle ages monks would copy books by hand, at extraordinary effort. Not only would they transcribe the text, they would also copy and enhance the illustrations so that the copy, while not being the original, would be no less worthy. This was their calling, transcribing was an art form. Then came Gutenberg. Now granted, building a printing press took a lot more effort than transcribing a book. But once you had done that, copying books was a no brainer.

To me ripping a DVD is not an art form, it's a chore. And if I can figure out how to do it once and never have to worry about it again, then it's well worth the effort. So I set out to figure it out, I looked up the details and read the fine print. Eventually I had a fairly good idea of how to do it, and the results were coming along.

undvd became the wrapper around that nastiness I didn't want to look at or remember. And for my money it worked well. It also encompasses certain design decisions that were important to me.

Why no gui?

The Photoshop rippers all come with a gui interface. That's hardly a surprise, the plethora of settings they allow you to fiddle with would look like mplayer's list of options if they weren't gui apps. It's practically impossible *not* to have a gui for those guys.

But DVD ripping is not inherently a gui centric activity. You need the gui to write your missile launch plan, but once you start ripping the gui doesn't serve any purpose anymore. It just sits there in memory, and you could just as well remove it while ripping and then bring it back once it's over. So 95% of the time the gui isn't being used.

Apart from the simple fact that coding a gui is a lot more work, I consider a gui a hindrance in this case. There is nothing a gui would do for undvd that would improve how the program is being used.

Why not interactive?

There are some non-gui rippers/encoders out there, like h264enc and ripdvd (and probably a bunch of others). Common to them is that they run interactively. They ask you a series of questions to determine everything they need to know about ripping the DVD before they can start doing the work.

Unfortunately, interactive mode suffers from the same problems that a gui interface does. You have to answer all the questions whether or not you know the answers. And more importantly, it's hard to reproduce the result, because you don't remember what you answered the last time.

And crucially, an interactive program won't let you just run it and go do something else, you have to stick around to do your part before the app will do its part. With a ripper this isn't such a big deal, because all the interaction happens at the beginning, but it's still something I like to do without.

Why bash?

The standard way of building a ripper is to wrap an interface around the scary mencoder command line interface. Whether this is a gui or not has no impact on how the ripper interacts with mencoder. There is no programmatic interface to mencoder, so you're stuck running it from a shell one way or the other.

Taking this into account, a bash script is pretty much the easiest way to handle shell commands. (If you're a perl nut, perhaps that would suit you better, but I'm not.) I've tried running commands from python (and using it to capture the exit code, stdout, stderr etc), and it's far easier just to use plain bash.
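To make the point concrete, here is a rough sketch of the kind of shell wrapping I mean: running a command, logging its output, and keeping the exit code around. This is illustrative only (the names and the log path are made up, not undvd's actual code):

```shell
#!/bin/bash
# Illustrative sketch: wrap an external command (e.g. mencoder),
# capture its output to a log, and preserve the exit status.

run_encoder() {
    local logfile=$1; shift
    # run the command, sending stdout and stderr to the log
    "$@" > "$logfile" 2>&1
    local status=$?
    if [ $status -ne 0 ]; then
        echo "command failed (exit $status), see $logfile" >&2
    fi
    return $status
}

# stand-in for a real mencoder invocation
run_encoder /tmp/rip.log echo "pretend this is mencoder"
```

Doing the equivalent in python means reaching for subprocess, pipes, and return-code plumbing; in bash it's just there.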

As a self imposed restriction, it will also keep the program from becoming too big. Bash is not exactly the greatest abstraction language and takes quite a bit of discipline to keep it from getting totally convoluted, which is why I would not use it for anything bigger than a simple application.

Feature creep

Unfortunately, every program that evolves will face the issue of new features. Personally I think undvd does what it does well enough, I'm satisfied. But any one-size-fits-all strategy is bound to be somewhat constrained. The question is where to stop?

Two things came up that weren't intended for undvd:

Scaling

To start off with the first issue, undvd makes the assumption that the output video will be smaller than the original. This is fairly sensible, given that most rips are done this way, and considering that 6x compression does demand a lot from your encoder unless you compromise a bit on dimensions. Crucially, even if you watch a movie full screen the software/hardware scaling does a good enough job of stretching the image without clobbering it. Having said that, undvd's decision to scale to 2/3 of the original size is arbitrary and I accept that it's not a well justified constraint.

So scaling will be added.

Bitrate and 2-pass encoding

The bitrate issue, on the other hand, is hairy. Very hairy. I kept thinking about a meaningful way to set the bitrate. The obvious thing is to make it an integer parameter, like say 1000kbps. But what is that? It's just a number, a meaningless number. The "right" bitrate differs from movie to movie, from codec to codec, perhaps even from encoder to encoder. We are back in Photoshop land.

So I follow the convention of setting a file output size instead. If you say you want this title ripped to a file of 700mb, the bitrate is calculated automatically. This method is flawed, however, because the size we can accommodate for the video depends on how much space the audio requires, and there's no way to know this exactly. (Currently the mean audio bitrate is presumed to be 160kbps.) So the output size tends to come within 10% of the user-specified size.
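The calculation described above can be sketched like this. Function and variable names are illustrative, not undvd's actual code; the 160kbps audio figure is the presumption mentioned in the text:

```shell
#!/bin/bash
# Hypothetical sketch of deriving a video bitrate from a target file size.

AUDIO_BITRATE=160   # kbps, presumed mean audio bitrate

# compute_video_bitrate <target_size_mb> <length_seconds>
compute_video_bitrate() {
    local size_mb=$1 length=$2
    # total kbits that fit in the target file: MB * 1024 KB * 8 bits
    local total_kbits=$(( size_mb * 1024 * 8 ))
    # what's left for video after the audio track takes its share
    echo $(( total_kbits / length - AUDIO_BITRATE ))
}

# a 700mb rip of a two-hour (7200 second) movie
compute_video_bitrate 700 7200    # roughly 636 kbps for the video
```

Since the real audio bitrate varies per title, the result is an approximation, which is why the output size only comes within about 10% of the target.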

The thing is, if you start messing with the bitrate you should also consider 2-pass encoding, because you can mess up your video quite badly that way. undvd tries to do "the right thing" by picking 2-pass encoding if the calculated bitrate is below the default. But you can always override to 1-pass or 2-pass whatever the case is.
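The pass-selection rule amounts to a simple threshold check, something like the following sketch (the names and the default value here are stand-ins, not undvd's actual code):

```shell
#!/bin/bash
# Hypothetical sketch of choosing 1-pass vs 2-pass encoding.

DEFAULT_BITRATE=900   # kbps, stand-in for the tool's default bitrate

# choose_passes <calculated_bitrate_kbps>
choose_passes() {
    if [ "$1" -lt "$DEFAULT_BITRATE" ]; then
        echo 2    # low bitrate: use 2-pass to protect quality
    else
        echo 1
    fi
}

choose_passes 636     # prints 2
choose_passes 1200    # prints 1
```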

In any event, if you don't use these new options then undvd does exactly what it always has.

And now for some screenshots

evolving towards git

December 4th, 2007

I remember the first time I heard about version management. It was a bit like discovering that there is water. The idea that something so basic could actually not exist seemed altogether astounding afterwards. This was at a time when I hadn't done a lot of coding yet, so I hadn't suffered the pains of sharing code *that* much, but it was already relatable enough to understand the implications of version management. Emailing those zip files of code had been plenty annoying.

This happened just before we embarked upon a 6-month journey in college to write an inventory management application in java. There were four of us. We had done java before, but this was the first "big" application that also employed our newly acquired knowledge about databases. God only knows why, but they actually ran (and probably still do) an Oracle server at school for these projects. I don't know how much an academic license runs on that, but it's in the thousands of dollars no doubt. As the initiative taker of my group, I persuaded (not that it took much effort) the guys to forget about the Oracle server (which didn't come with any developer tools, just the awful command line interface) and use PostgreSQL (we rejected MySQL just to be on the safe side, as it lacked support for a bunch of things like views and foreign keys). I also introduced another deviation from the standard practice: CVS. I would be running both off of my tried and tested Pentium home server. :proud: We were the only group that used version management, which baffled me (why was it not offered as a service at the department?). I heard that the year after that they started offering CVS for projects. Incidentally, when you deviate off the path, you do expect to get some academic credit for the extra effort, obviously. ;)

So that was 6 months with CVS, and it worked quite well. We didn't do anything complicated with it, didn't know about tags or branches, just ran a linear development on it and that was it. And it was helpful (but not efficient) for tracking binary files (Word documents, bleh). But it had some annoying quirks. Not that we were all that concerned with tracking history back then, it was just about getting it finished and then obviously it would never be used, as school projects go. But the lack of renaming and moving in CVS was silly.

It's a bit funny, actually. CVS has been a standard for the longest time, and people have put up with its problems, until recently when a version management boom began. Why is that? I have a hunch that the relative calm in version management was kept intact by a predominantly corporate culture (and the corporates are obviously super conservative). But then once a certain number of people had gotten themselves into open source projects the need for better tools put this in motion. One of the first "new generation" systems was Subversion, the replacement for CVS. Subversion was adopted slowly, but quite steadily. Currently it's probably the "standard" for version management. I registered galleryforge with Berlios precisely because they offered svn services. Sourceforge also has it now, and people around the net have started talking in terms of svn a lot. My current school also uses it.

But with the momentum of version management systems in play, a lot of other ideas have surfaced besides those captured in Subversion. Arch and Darcs are two that employ the distributed paradigm. I don't think they have had much success though, and perhaps looking back they will have played the role of stepping stones rather than end products. A new (a third, if you will) generation of systems has appeared quite recently. Monotone is a newer system with its own network synchronization protocol. Mercurial seems to have enough support to replace Subversion/CVS down the line (not that it couldn't happen today, but people hate giving up something they're used to, so these things drag out). And there is Bazaar, which seems to discard the goal of being the best system and just aim to be easy and pleasant for small projects.

And somewhere in that mix we find Git, except that it has a much higher potential for success than other new systems, because it was launched into the biggest (or noisiest, at least) open source project and made to carry the heaviest loads right from the beginning, which is a good way to convince people that your product is robust. As such, it seems to me that git has had the quickest adoption of any system. It was launched in 2005 and it has already become one of the household names around the interwebs. So I thought git was worth looking at. At this rate I may very well be using it on a project someday.

I was also encouraged by regular people saying it worked for them. A version management system for kernel developers is nice, but that doesn't necessarily make it right for everyone else. But here were people with one-man projects liking it, good sign. In fact, one of the things I used to read about git was "I like the simplicity".

What I discovered was that my dictionary was out of date, because it didn't carry that particular meaning of the word "simple". Git is not simple, it takes getting used to. It's one of those things you try and you don't understand, then you come back and try again and you make a little more progress and so on. I decided to open a git repo for undvd, both because I need to track the changes and so I can use git and get a feel for it.

The special thing about git is that it has well developed capabilities for interoperating with other systems. In other words, git doesn't assume it will wipe out everything else, it accepts the reality that some people will continue using whatever it is they're using (even CVS) and git can deal with that. Obviously, predicting the future is tricky, but I get the feeling that Subversion has hit that comfort spot where most people are satisfied with what it's doing and they really would need a lot of persuasion to learn something new. In other words, I expect Subversion to be around for quite a while. And therefore, git's svn tools may come in very handy. I haven't really looked at that part yet, but it's on my todo list.

So what is the future? I would say Subversion, Git, and maybe Mercurial. Version management systems are like programming languages - the more of them you use the easier it is to learn a new one. But switching is hard. When I'm thinking of checking something in version management I still begin typing svn st.. before I realize I'm using git.