Archive for the ‘technology’ Category

clocking jruby1.1

April 21st, 2008

Did you hear the exciting news? JRuby 1.1 is out! For real, you can call your grandma with the great news. :party: Wow, that was quick.

Okay, so the big new thing in JRuby is a bytecode compiler. As you may know, up to 1.0 it was just a Ruby interpreter in Java. Now you can actually compile Ruby modules to Java classes and no one will know the difference, very devious. :cool: Sounds like Robin Hood in a way, doesn't it?

The JRuby guys are claiming that this puts JRuby on par with "regular Ruby" on performance, if not better. Hmm. Just to be on the safe side, what size shoes do you wear? Oh ouch, those are going to be tricky to fit in your mouth. :/ And Freud will say you're stuck in the oral stage. Too much? Okay.

So here is my completely unvetted, dirty, real world test. No laboratory conditions here, you're in the ghetto. First we need something *to* test. I don't have a great deal of Ruby code at my disposal, but this should do the trick. How does scanning the raw filesystem for urls sound? The old harvest script actually does a half decent job of turning up a bunch of findings.
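I haven't reproduced harvest.rb here, so this is only a minimal sketch of the idea; the regex and names are my assumptions, not the actual script:

```ruby
# Hypothetical sketch of a harvest-style URL scanner (the real harvest.rb
# isn't shown, so the regex and function names are assumptions).
URL_RE = %r{https?://[A-Za-z0-9\-._~:/?\#\[\]@!$&'()*+,;=%]+}

# Return every URL-looking substring in a chunk of text.
def harvest_urls(text)
  text.scan(URL_RE)
end

# Driver sketch, reading raw bytes from stdin (scrub drops invalid byte
# sequences so the regex engine doesn't choke on binary data):
#   STDIN.each_line { |line| harvest_urls(line.scrub).each { |url| puts url } }
# invoked as: sudo cat /dev/sda5 | jruby harvest.rb --url
```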

Now introducing the contenders. First up, his name is JRuby, you know him from occasional mentions on obscure blogs and the programming reddit past the top 500 entries. He promises to free all Java slaves by giving away free Rubies to everyone!

Aaand the incumbent, the famous... Ruby! You know him, your parents know him, every family would adopt him as their own child if they could. He's the destroyer of kingdoms and the creator of empires, he's bigger than Moses himself!

Our two drivers will be racing across a hostile territory. Your track is a 25gb ext3 live file system. During this time, I can promise you that only Firefox is likely to be writing new urls to disk, but I could be lying eheheh. Due to the unpredictable nature of this rally track, regulations allow only one racer at a time, but you will be clocked.

First up is the new kid on the block Jay....Ruby. The Ruby code will not be compiled before execution; we'll let the just-in-time compiler do its thing.

$ time ( sudo cat /dev/sda5 | bin/jruby harvest.rb --url > /tmp/fsurls.jruby )
real 39m26.547s
user 37m19.072s
sys 1m28.406s

Not too shabby for a first run, but since this is a brand new venue, we have no frame of reference yet. Let's see how Ruby will do here.

$ time ( sudo cat /dev/sda5 | harvest.rb --url > /tmp/fsurls.ruby )
real 78m42.186s
user 62m12.537s
sys 2m18.721s

Well, look at that! The new kid is pretty slick, isn't he? Sure is giving the old man a run for his money. Let's see how they answered the questions.

$ lh
-rw-r--r-- 1 alex alex 86M 2008-04-21 18:29 fsurls.jruby
-rw-r--r-- 1 alex alex 8.6G 2008-04-21 20:58 fsurls.ruby

Yowza! No less than a hundred times more matches with Ruby. What is going on here? Did Jay just race to the finish line, dropping the vast majority of his parcels? Or did father Ruby see double and triple and quadruple, ending up with lots and lots of duplicates? Well, we don't really *know* how many urls exist in those 25gb of data, but it seems a little bit suspect that there would be in excess of 8gb of them.

One way or the other, it's pretty clear that the regular expression semantics are not entirely identical. In fact, you might be sweating a little right now if your code uses them heavily.

UPDATE: Squashing duplicates in both files actually produces two files of very similar size (13mb), with a disparity in unique entries of only a very reasonable 4% (considering the file system was being written to in the process). The question still remains: how did Ruby produce 8gb of output?
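The squashing step isn't shown above; here's a sketch of how you might count unique entries in Ruby (`sort -u` on the shell would do the same job, and the file paths are the ones from the transcript):

```ruby
require 'set'

# Count the unique lines in a harvest output file.
# Equivalent to `sort -u file | wc -l` on the shell.
def unique_lines(path)
  lines = Set.new
  File.foreach(path) { |line| lines << line.chomp }
  lines
end

# e.g. compare:
#   unique_lines('/tmp/fsurls.jruby').size
#   unique_lines('/tmp/fsurls.ruby').size
```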

what the heck is a closure?

April 20th, 2008

That's a question that's been bugging me for months now. It's so vexing to try to find something out and not get it. All the more so when you look it up in a couple of different places and the answers don't seem to have much to do with each other. Obviously, once you have the big picture, all those answers intersect in a meaningful place, but while you're still hunting for it, that's not helpful at all.

I put this question to a wizard and the answer was (not an exact quote):

A function whose free variables have been bound.

Don't you love to get a definition in terms of other terms you're not particularly comfortable with? Just like a math textbook. This answer confused me, because I couldn't think of a function I had seen where that wasn't the case, so I thought I must be missing something. The Python answer is very simple:

A nested function.

It's sad, but one good answer is enough. When you can't get that, sometimes you end up stacking up several unclear answers and hoping you can piece it all together. And that can very well fail.

I read a definition today that finally made it clear to me. It's not the simplest and far from the most intuitive description. In fact, it too reads like a math textbook. But it's simply what I needed to hear in words that would speak to me.

A lexical closure, often referred to just as a closure, is a function that can refer to and alter the values of bindings established by binding forms that textually include the function definition.

I read it about 3 times, forwards and backwards, carefully making sure that as I was lining up all the pieces in my mind, they were all in agreement with each other. And once I verified that, and double checked it, I felt so relieved. Finally!

I can't follow the Common Lisp example that follows on that page, but scroll down and you find a piece of code that is much simpler.

(define (foo x)
	(define (bar y)
		(+ x y))
	bar)

((foo 1) 5) => 6
((foo 2) 5) => 7

What's going on here? First there is a function being defined. Its name is foo and it takes a parameter x. Now, once we enter the body of this function foo, straight away we have another function definition - a nested function. This inner function is called bar and takes a parameter y. Then comes the body of the function bar, which says "add variables x and y". And then? Follow the indentation (or the parentheses). We have now exited the function definition of bar and we're back in the body of foo, which says "the value bar", so that's the return value of foo: the function bar.

In this example, bar is the closure. Just for a second, look back at how bar is defined in isolation, don't look at the other code. It adds two variables: y, which is the formal parameter to bar, and x. How does x receive its value? It doesn't. Not inside of bar! But if you look at foo in its entirety, you see that x is the formal parameter to foo. Aha! So the value of x, which is set inside of foo, carries through to the inner function bar.

Can we square this code with the answers quoted earlier? Let's try.

A function whose free variables have been bound. - A function, in this case bar. Free variables, in this case x. Bound, in this case defined as the formal parameter x to the function foo.

A nested function. - The function bar.

A lexical closure, often referred to just as a closure, is a function that can refer to and alter the values of bindings established by binding forms that textually include the function definition. - A function, in this case bar. That can refer to and alter, in this case bar refers to the variable x. values of bindings, in this case the value of the bound variable x. established by binding forms, in this case the body of the function foo. that textually include the function definition, in this case foo includes the function definition of bar.

So yes, they all make sense. If you understand what it's all about. :/
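One bit of that last definition that the Scheme example doesn't exercise is "alter": bar only reads x, it never changes it. A tiny hypothetical counter in Ruby shows a closure mutating an outer binding:

```ruby
# A closure can not only refer to but also *alter* a binding established
# by the enclosing function. Hypothetical example, not from the post.
def make_counter
  count = 0              # binding established inside make_counter
  lambda { count += 1 }  # the closure mutates count on each call
end

counter = make_counter
counter.call  # => 1
counter.call  # => 2
```

Each call to make_counter creates a fresh count, so two counters never interfere with each other.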

Let's return to the code example. We now call the function foo with argument 1. As we enter foo, x is bound to 1. We now define the function bar and return it, because that is the return value of foo. So now we have the function bar, which takes one argument. We give it the argument 5. As we enter bar, y is bound to 5. And x? Is it an undefined argument, since it's not defined inside bar? No, it's bound *from before*, from when foo was called. So now we add x and y.

In the second call, we call foo with a different argument, thus x inside of bar receives a different value, and once the call to bar is made, this is reflected in the return value.

Well, that was easy. And to think I had to wait so long to clarify such a simple idiom. So what is all the noise about anyway? Think of it as a way to split up the assignment of variables. Suppose you don't want to assign x and y at the same time, because y is a "more dynamic" variable whose value will be determined later. Meanwhile, x is a variable you can assign early, because you know it's not going to need to be changed.

So each time you call foo, you get a version of bar that has a value of x already set. In fact, from this point on, for as long as you use this version of bar, you can think of x as a constant that has the value that it was assigned when foo was called. You can now give this version of bar to someone and they can use it by passing in any value for y that they want. But x is already determined and can't be changed.
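For comparison, the same foo/bar idea can be written in Ruby, where the closure is an explicit lambda:

```ruby
# The Scheme example translated to Ruby: foo returns a lambda that
# closes over foo's parameter x.
def foo(x)
  lambda { |y| x + y }
end

add_one = foo(1)  # a "version of bar" with x fixed to 1
add_two = foo(2)  # another version, with x fixed to 2
add_one.call(5)   # => 6
add_two.call(5)   # => 7
```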

how we love meaningless ‘facts’ about security

March 31st, 2008

Consumers like simple answers. In fact, they insist on them. When I was shopping around for an espresso maker, I knew that I knew nothing about the subject. I also didn't care to learn about it just so I could pick out the right machine; it was unlikely to be a worthwhile investment. So when I went to the store and started glancing over all the different machines, and the salesman came up to me, all I really wanted to know was: which one is the best? This is the way consumers think. You can give them the whole run down of specifications and they will still want you to tell them which one is best. The guy will dance around the issue a little, "well it depends on what you want etc", but eventually he will converge with your viewpoint, because he knows what you want to hear. You want him to tell you which one to pick. It doesn't even matter if he tells you the truth, you just want an excuse so that you don't have to think about it. If it later turns out that he was lying, well, then I'll have to bite the bullet and do my own research next time.

This attitude demonstrates that we want simple answers to complex questions. Just get the answer and have your peace of mind already, it doesn't matter how accurate it really is.

As technologists, one of our favorite issues is security. People get passionate about security, they have long discussions about it and they're so keen on the latest in security development - in the magazines, on the blogs, everywhere. But it's really just entertainment. They don't actually understand the issues or even want to learn about them, they just want to have the simple answer.

Ironically, security is particularly badly suited to such a black-and-white perception of reality, as it is one of the most complicated aspects of our technologies. Nevertheless, you will often see stories like this, about cracking 3 laptops running OS X, Vista and Linux respectively. Apparently, the Mac was popped first. Now, the reporter of this story will not declare that this test makes Linux the safest platform. Such a conclusion would be completely unfounded. But not saying it is actually not very far from saying it, because if that wasn't the point of this exercise, then what was?

The fact is that the issue of security is much more complicated than most people want to deal with. They just want a smiling salesman to pat them on the back for making the right choice. Consumer IT security is the car sales of the industry.

The other thing is that security is a very delicate issue in and of itself. It isn't about the general quality of a system; finding the one weakness in an otherwise perfect system can be enough to compromise the whole thing. This makes it distinct from many other facets of computer systems, where 95% is a great score (on, say, ease of use). Security is about 100% coverage, non-negotiable. The way to achieve that is to run as few applications as possible, allow as little incoming communication as possible, and keep a close watch on everything on a day-to-day basis. Which is exactly what desktop users want out of their systems, right?

Ultimately, the stakes aren't high enough to have secure desktops. There is actually software out there that literally does not break, does not crash, never misbehaves at all. The first place to look for something like that would be NASA, where software bugs have enormous financial consequences. Companies also have much better security than you and I do. Companies are conservative, they will stick with a system for 10 years if it runs reliably, no matter how ugly or annoying it may be. But then again they are liable for losing/leaking/corrupting lots of important data other than their own, so they like to be careful about it. We don't have that burden. The worst we can do is lose our own data, which typically means copying it back from a usb drive or something, no big deal.

People get riled up about desktop security when there is no desktop security. All you have is pockets of time where no exploits are found, but then the next one comes along. Servers are a lot more secure, because they obey strict guidelines on when to upgrade software and on what conditions. Tinkering is set to an absolute minimum. Servers also have strict policies on what types of access they allow, and to whom. This is why servers get compromised far less than desktops.

The report mentioned above indicates that the Mac was hacked immediately, which probably means there's a glaring exploit out there right now. Then the Vista box survived 3 more days before it was brought down. Run that test next month and the results are likely to be completely different. Treat these tests statistically and all you get is a bunch of exploits moving around, such that at any given moment there are a number of exploits available on every platform. If hacking a Linux box is 10 times harder than a Windows machine, that doesn't really do much for us, as the past decade has shown that compromising Windows machines is a no-brainer for a motivated person. That means instead of 2 days you'll survive for 2 weeks; little comfort. Now if it were 100 or 1000 times more difficult, that could actually make you feel better, but the difference is unlikely to be that big.

The desktop is practically the most insecure platform in use today. You can run anything you want on it and change your entire software stack every day if you want to. Who's gonna stop you? From a security point of view, this is completely untenable. You cannot give people the freedom to run whatever they want while at the same time enforcing a strict security policy; those two are mutually exclusive. If you want better security, you'll have to put up with more pain, and you'll have less freedom. The quoted article says that the Vista box was compromised through a bug in the flash plugin. Now can you really blame Microsoft for bad code in Adobe's godawful flash plugin? Flash is the perpetrator here, not Vista. Okay, so they could be stricter about what they allow and what not. But that would either partially break the flash functionality (or other software) or render it entirely incompatible. Would you prefer that? Of course not.

The truth of the matter is that technology advances at a fast pace, and we love to be part of that. We will happily run beta code as long as it's easy to install (Firefox) and doesn't burn us too much. But you can't combine the incessant thirst for the newest software with any kind of reasonable security model. It's easy enough to tell the server: only these 5 applications are allowed to run here. But you can't do that on a desktop, because the user might want to do anything and everything. There is no security, because there isn't nearly enough security auditing happening. qmail hasn't had a security hole in over a decade, but then it's hardly received any updates in years either. Would you be satisfied running Firefox 0.8 today? I doubt it. But that sort of long-term and meticulous verification is what it takes to examine a piece of software in detail and make absolutely sure that it doesn't have any issues.

Granted, there are very different attitudes toward security in various places. Unix was designed as a multi user system, and therefore security was one of the guiding principles. No user should be able to mess with another user's data and no user should be able to bring down the system. Microsoft never designed their stuff for security (apparently it didn't seem to matter) and suffered a nasty backlash when they found out, to their astonishment, that people did care about the issue. In recent years, they have tried to take the matter seriously, but it seems like they're still perplexed by it.

Now that the media seems poised to regularly stack up Windows, OS X and Linux on "security", this topic isn't going away. There will be many more "head to head" tests, which based on their criteria (sometimes more sound, sometimes less) will indicate a so-called winner, in a contest that ultimately doesn't measure anything useful. If the test were to pit the kernels of each operating system against each other, that could be insightful. But who apart from kernel developers cares about the kernel? Kernel security bugs are a class of bugs to be taken seriously. But far more security holes are uncovered every day in common applications that we really want to use. Applications that have nothing like the scrutiny of a kernel.

Our systems are insecure (some more, some less, but ultimately they all have weak points) because we care a lot more about new software than about security. A system that gets cracked in a week is not a secure system. One that can withstand years of attacks, now that's more like it. Our desktops are of the former category.

gdm sloppiness

March 28th, 2008

Today's example sponsored by gdm. Say that you have a certain session (gnome, kde, fluxbox, whatever) and you're experimenting with another one which isn't working quite smoothly yet. Then you'll be stuck going back and forth a few times. And you'll probably see this dialog:

gdm1.png

The Ubuntu gdm theme is nice and clean and it's easy to figure out how to change the session. This dialog does the job without much ado. But then you find this:

gdm2.png

After you've changed the session, assumed that the change succeeded, stopped thinking about it, and moved on to start the session by logging in, you get this idiotic dialog.

This is horrifying in several ways. First of all, the gdm login screen is completely clean of any dialogs, so there is no hint given that you should expect a popup. Secondly, once you've set everything using the secondary controls at the bottom of the screen, you just want to login and be on your way. When I'm in that mode, I've basically learnt to hit Enter as many times as it takes to get me through, so I'm very likely to accidentally accept the dialog since I don't know it's coming.

And finally, the question of whether to make the session the default one is completely disconnected from the menu for changing the session, which shows a complete lack of consistency. Here I'm done doing something, and later on I have to answer an unexpected question about something I already finished.

Not to mention that the "unsafe" choice is selected by default: I might accidentally change my default session just by hitting Enter twice after putting in my password.

Worst of all, even when I know that the popup is coming, I absolutely do not want to have to answer it again and again just because someone couldn't figure out a better place to put that option. Make it a checkbox on the previous dialog, that's what everyone else does, why must you be so special?

I'll be nice and I'll just call this sloppiness.

EDIT: Bug filed.

UPDATE: Bug fixed in gdm 2.21.

buh-rilliant!!!

March 27th, 2008

Remember how Ubuntu came out of nowhere and just like that made everyone else feel like they're lagging behind? That's really what makes Ubuntu stand out: there is a real understanding of user needs in their leadership. The past couple of years they have empowered so many people who were interested in Linux but just didn't know how to get started or fix common annoyances (like the lack of media codecs, say). And that policy hasn't gone unnoticed; I certainly feel like Fedora, for example, is doing a much better job of embracing a wide audience than before Ubuntu ever came to light.

The latest in Ubuntu reiterates their ability to empower their users. If you're an Ubuntu user you probably know about something called 'Personal Package Archives' (ppa). It is currently the designated method of installing kde4, until it goes mainline. Well guess what, Launchpad now offers a ppa to every user! How's them apples.

This means you now get your own little apt repository you can use, and offer your packages through the same mechanism as any officially supported package, without resorting to .debs and custom "here's how you install it" instructions. Fabulous! :star:
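Consuming someone's PPA is then just another apt source. A hypothetical sources.list line might look like this (the exact URL layout and the release name 'hardy' are my assumptions; check the PPA page for the real entry):

```
deb http://ppa.launchpad.net/numerodix/ubuntu hardy main
```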

Here's my shiny new PPA:
https://launchpad.net/~numerodix/+archive

For the time being I'll only be keeping undvd packages there.

Unfortunately, debian packaging is something of a cult and not the easiest thing to get involved with. They are fanatical about following guidelines to a T, and therefore wrapping up a .deb takes considerably more time than writing an .ebuild or building an .rpm. I appreciate the care that goes into it, but I wish they would find a more efficient mechanism for it. The debian/ directory should be more of an abstraction; having to go and hand-edit the files in there is silly.