tahple or twople?

August 21st, 2008

The word tuple is used quite a lot in computing. That's what database people call a row in a table. It's also what several programming languages call a structure where the fields are ordered but not named.

It seems to be one of those words that is hard to translate, so other languages often use the English word. And yet there is some confusion about pronunciation. Some say tahple, some say twople. As far as I know there is no dispute about the spelling, it's tuple. So where do you get twople from that?

I think having a lot of exceptions to the obvious pronunciation is bad for a language. There are words that are fancy or interesting enough to perhaps deserve it, but tuple isn't one of them. So I'm going to keep saying tahple.

Beautiful code

August 16th, 2008

I don't remember who mentioned this book or where they did it. I seem to remember it being mentioned by several people. But for one reason or another I decided to order it, and I've eventually made my way to it.

"Beautiful code" is a compilation of 30-something case studies, each chapter written by a different contributing author, describing code or systems they found beautiful. I suppose it is subjective how wide your definition of "beautiful code" is, but some authors describe architectures rather than code, which isn't quite what I'd expect. To me "code" is generally something that happens at the statement/function level, otherwise you call it "design" or "architecture".

The case studies are extremely diverse, you have everything from kernel code to high level systems. As I'm not a kernel hacker I have to say I didn't understand much of the chapter on Linux drivers, but then I get the feeling I'll never grok c types without a mentor or something, the Hungarian notation style variable naming tells me little about their meaning. There's a FreeBSD chapter on filesystem layering, and that's fairly straightforward, then there's a Solaris chapter on thread handling which is interesting, but the code unfortunately is less instructive to me than is the prose (the author's fascination with sewage is also mildly disturbing).

You'll find the code examples in a variety of languages, some familiar (c++, haskell, java, python, ruby), some not directly familiar but partly or mostly understandable (c, c#, javascript, perl, scheme), and some foreign (elisp, fortran, matlab, visual basic). There are two chapters showing implementations of python datastructures (in c) that I found quite interesting, one from the standard library (dictionaries), the other from NumPy (n-dimensional arrays).

It turns out this book is more interesting than I expected. Some of the chapters I'm just not in a position to understand, but many of them are well written and interesting to delve into. I successfully killed 4-5 hours of time in flight and at the airport with it, which is better mileage than I get out of books on tape. What I really like about it is that it's a book for hackers in the trade -- it's a book that shows you stuff, not one that tries to teach you. Which means you get right to the point without the obligation to introduce and prepare you for what you're about to read. It's a lot more like reading a blog.

So then there's the question, is the code that these supposed masters of the trade write more beautiful than yours and mine? Well, not necessarily. In some of the examples presented it's the design that's supposed to make it beautiful, not the code itself. And try as you might to imagine how an expert will wield untold levels of voodoo to problems you and I would love to solve better, most of the time they don't. I guess there isn't all that much hidden magic out there.

Norwegian is the best language, yo

August 14th, 2008

Quick, what's the most important quality a foreign language can have? If you said "easy to use" you'd be right. All other concerns are trumped, because other values of a language can never be appreciated unless you can learn it first. And apparently Norwegian ranks first on ease of learning for speakers of English (fun to know :party: ). The ranking is of course highly unofficial, but what the heck. :cap:

Exhibit A:

Scandinavian verbs have some of the easiest conjugation you can find in Europe. Present tense is made by adding an -r to the verb, regardless of who's doing it. That gives us:

ha - to have

jeg har - I have
du har - you have
han har - he has
vi har - we have

Such simplicity is brilliant (and unheard of). :star:

The full rationale is here. A few selected gems follow.

Norwegians understand 88% of the spoken swedish language
understand 73% of the spoken danish language

Swedes understand 48% of the spoken norwegian language
understand 23% of the spoken danish language

Danes understand 69% of the spoken norwegian language
understand 43% of the spoken swedish language

Norwegians understand 89% of the written swedish language
understand 93% of the written danish language

Swedes understand 86% of the written norwegian language
understand 69% of the written danish language

Danes understand 89% of the written norwegian language
understand 69% of the written swedish language.

Hah, suckers! More succinctly:

"Norwegian is Danish spoken in Swedish"

Norwegian + phonology - vocabulary = swedish

Norwegian - phonology + vocabulary = danish

long passwords are evil

August 12th, 2008

I'm writing this partly in response to Jürgen's post a week or so back about passwords. Of course, he's not the only one to advocate long passwords, a lot of people are doing that these days in the name of security. Today's sad reality is that if your password is not "test" or "password" you are more secure than most people.

I do think, however, that any idea for improvement should stand to be evaluated on usability. After all, my first loyalty is to the user in me. Failing to do that produces wide adoption of bad ideas like captchas that are directly hostile to users. (Incidentally, that's why so many people who build systems for others build them badly. The implication of using it every day never takes a foothold.)

Short passwords have too little entropy, therefore they are easy to break.  Granted. So the response is "use long passwords", or better yet "not passwords, pass phrases". Such as oh bugger, my cat has cancer. With or without the spaces and punctuation it makes a perfectly acceptable password in terms of length. But tell me now who is willing to actually type these monstrosities?
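To put the entropy claim in rough numbers, here is a minimal sketch. It assumes every character is picked uniformly at random, which a human-chosen phrase is not, so treat the result as an upper bound. The function name and the alphabet sizes are mine, for illustration only.

```shell
#!/bin/bash
# Upper bound on password entropy: length * log2(alphabet size).
# Assumption: each character is chosen uniformly at random. A phrase made of
# dictionary words has fewer bits than this formula suggests.
entropy_bits() {
	local length=$1 alphabet=$2
	awk -v n="$length" -v a="$alphabet" 'BEGIN { printf "%.1f\n", n * log(a) / log(2) }'
}

echo "4 lowercase letters:          $(entropy_bits 4 26) bits"
echo "8 letters and digits:         $(entropy_bits 8 62) bits"
echo "28 lowercase letters + space: $(entropy_bits 28 27) bits"
```

Length dominates the formula, which is why the advice is always "make it longer" rather than "use weirder characters".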

The evil of password typing is reduced by our methods to avoid typing them all the time. Use public keys with ssh, never type the password again. Save passwords in the browser, avoid typing those. It's a fabulous usability gimmick.

But short passwords, bad for security, are great for another closely related purpose: being able to actually type them in. If you have a short password you don't need much practice to be able to type it. It's a sort of sweet spot between usability and security: more secure than nothing, not too painful to type if you have to. My password input success rate might be something like 98%. I rarely fail to log in. But with pass phrases of 29 characters like the one above, how confident would you be? You don't see what you're typing either, just echo characters at best. I expect the likelihood of typing it correctly falls dramatically, maybe to as low as 75-80% for the average user at an average point on the learning curve for typing it (does not apply to hackers with stellar typing skills yadayadayada). If you're doing something once, 80% is pretty good odds. But if you're doing it every day, it's no longer odds, it's a statistical average. Imagine if those were your parking odds. If one time in five you failed to maneuver through the opening of your garage, I don't think you'd be happy.

I tested myself on cancer cat just now, 6/10. On a sentence I've never typed before. And that's while seeing the characters on the screen.

And then there's the chance that you'll forget it, or remember it wrong, switch a character in your mind, use the wrong case. It's hard to estimate how likely that is, but with long passwords it seems rather likely. Inputting passwords is not an approximation, it has to be exact. And it's not just one of those phrases you have to remember *exactly*, you need one for every distinct password you keep.

Security is a social problem, not a technical one. If you force people to use long passwords they struggle to input (for christ's sake, they *already* use post-its on the monitor), we will just embrace ways of avoiding passwords all the more. Passwordless ssh is great, but if I'm using every trick in the book to avoid typing my long password, I haven't had enough practice typing it when I actually have to type it.

That is, if I even remember it correctly. And I somehow doubt sysadmins will give you more tries to type a long password than they currently give you, 3 tries or whatever it is. And then you're locked out.
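To put rough numbers on the three-tries scenario, here is a minimal sketch. It assumes every attempt is independent and has the same success rate, which is a simplification of mine. The rates are the guesses from this post (98%, 80%, and my own 6/10), and lockout_probability is just an illustrative name.

```shell
#!/bin/bash
# Chance of failing every one of N attempts in a row, i.e. getting locked out.
# Assumption: attempts are independent with a fixed per-attempt success rate.
lockout_probability() {
	local success_rate=$1 tries=$2
	awk -v p="$success_rate" -v n="$tries" 'BEGIN { printf "%.6f\n", (1 - p) ^ n }'
}

for rate in 0.98 0.80 0.60; do
	echo "success rate $rate -> lockout chance $(lockout_probability "$rate" 3)"
done
```

The per-login lockout chance looks small, but like the parking odds it is paid on every single login.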

It's the perfect anti-security. The bad guys have a shot at my account (but they have to be pretty clever), but I myself am locked out.

easy peasy full system backup

August 10th, 2008

You know how when someone accidentally deletes their files or their hard drive crashes or some other apocalyptic event occurs, the first thing people ask is "where is your backup"? Of course, we've all seen it (*ahem* been there :/ ). It's a bit unintuitive, because backups have no equivalent in the real world. If you drive your car into a lake, there's no way to get it back. But making backups is the single best way to prevent losing your stuff. So do it!

Don't back up "my files"

But don't just back up "my documents, my pictures, my whatever". If your computer crashes and you have a backup of "my files", then sure, it's not a total loss. It's better than nothing. But it's not what you actually need. You need the whole thing.

This "my files" nonsense is born out of the fact that the delightful company that produced your operating system doesn't want you to be able to make a backup of it. Because if you did, you could make trivial copies of the operating system, and they don't like that idea. Have you ever asked yourself why in 30 years, through all manner of viruses, blue screens of death and hardware crashes Microsoft has never given you or sold you a full system backup program? It's not because they never thought of it. (Or because no one asked for it).

Making full backups is easy

If you've ever installed Gentoo manually (i.e. not with one of the automated installers)... yes, the demographic for this one is not immense. But then that's why we're here, to spread the happy message! :happy: Anyway, if you have, then you know how immensely easy (this is astonishing especially if you have a Windows background) it is to make a full system backup. In the course of a Gentoo install (and yes, I'm about to reveal the big secret here...*drumroll*), you boot from the livecd, you mount your root partition, you download a tar of a minimal Gentoo filesystem that has your basic /bin/ls and so on, and then you just.... untar it. That's it. No magic, no voodoo, no secret foobared locations on the filesystem that can't be written to, just extract the archive and you're done!

To put it bluntly, this is all you have to do:

tar zcvf backup.tar.gz /

And to restore the backup:

tar zxpvf backup.tar.gz -C /mnt/root_partition
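If you'd rather see the round trip work before pointing it at /, here is a small sketch that runs the same two commands on a scratch directory. The temp paths and file names are throwaway examples of mine, not part of any real setup.

```shell
#!/bin/bash
# Round-trip a scratch directory through tar: create, restore, compare.
work=$(mktemp -d)
mkdir -p "$work/source/etc" "$work/restore"
echo "hello" > "$work/source/etc/motd"
chmod 600 "$work/source/etc/motd"

# Create: z = gzip, c = create, f = archive file.
# -C makes the archived paths relative to source/ instead of absolute.
tar zcf "$work/backup.tar.gz" -C "$work/source" .

# Restore: x = extract, p = preserve permissions, -C = directory to extract into.
tar zxpf "$work/backup.tar.gz" -C "$work/restore"

# Compare the restored tree against the original.
if diff -r "$work/source" "$work/restore" > /dev/null; then
	roundtrip_result="identical"
else
	roundtrip_result="different"
fi
echo "restored tree is $roundtrip_result"
rm -rf "$work"
```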

Put that in perspective to the Windows world where a whole industry has sprung up to solve problems that Microsoft deliberately introduced with their *ahem* novel engineering. Idiotic programs like Norton Ghost that you have to get your hands on just to do the same simple thing that you can do with tar on a decent operating system.

Making it more convenient

Granted, you could just use the above tar command, but you may want something a little more convenient. For starters, you may want to skip some files on your file system. The method I use is inspired by a script posted on the gentoo forums a long time ago. I used that script for years without really understanding it, but a while back I decided to rewrite it to suit me better.

Besides just tarring the files it also writes a log file that you can grep to see if some particular file of interest is in the backup, it timestamps the backup with the current date/time and it keeps track of how many backups you want to keep.
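The log idea is simple enough to sketch on its own. This assumes GNU tar, which prints the verbose listing on stdout when the archive goes to a file. The scratch paths and the in_backup helper are made up for the example.

```shell
#!/bin/bash
# Keep tar's verbose listing as a log, then grep the log for a file.
work=$(mktemp -d)
mkdir -p "$work/root/etc"
echo "nameserver 192.0.2.1" > "$work/root/etc/resolv.conf"

# v lists every archived path (on stdout with GNU tar); tee saves it as a log.
tar zcvf "$work/demo.tgz" -C "$work/root" . | tee "$work/demo.tgz.log" > /dev/null

# Hypothetical helper: in_backup LOGFILE PATTERN, succeeds if the log mentions PATTERN.
in_backup() {
	grep -q -- "$2" "$1"
}

if in_backup "$work/demo.tgz.log" "etc/resolv.conf"; then
	found="yes"
else
	found="no"
fi
echo "etc/resolv.conf in backup: $found"
rm -rf "$work"
```

Grepping a text log is a lot faster than asking tar to list a multi-gigabyte compressed archive.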

Backups are made in a special backup_dir location. This directory is supposed to hold lists of files (recipes, if you like) you want to back up. For example, a simple recipe could be called full.lst:

/
--exclude=/backup/*.tgz*
--exclude=/proc/*
--exclude=/sys/*
--exclude=/tmp/*
--one-file-system

The syntax for the file is that of tar, and it's a list of things to back up. / means the full file system will be included. But certain directories are excluded: /backup because we don't want to include our old backup files in new backups, /proc and /sys because they are virtual file systems and don't contain "real" files, and /tmp because we don't care about it. Finally, we say --one-file-system, which prevents mounted disks, cds and things like that from being included.
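To see what a recipe turns into, you can do a dry run: feed it through xargs with echo in front of tar, so the command is printed rather than executed. This is only a sketch; the archive name /backup/full.tgz is an example of mine.

```shell
#!/bin/bash
# Dry run: print the tar command line that a recipe file expands to.
work=$(mktemp -d)
cat > "$work/full.lst" <<'EOF'
/
--exclude=/backup/*.tgz*
--exclude=/proc/*
--exclude=/sys/*
--exclude=/tmp/*
--one-file-system
EOF

# xargs appends each word of the recipe as an argument. No shell is involved,
# so the * patterns reach tar unexpanded. "echo" prints instead of running tar.
expanded=$(xargs echo tar zcf /backup/full.tgz < "$work/full.lst")
echo "$expanded"
rm -rf "$work"
```

Once the printed command looks right, dropping the echo gives you the real thing.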

And here is the script that makes this possible. Run it, it will produce a backup file that is compressed. Try to get it below 4.3gb and write it on a dvd+rw, now you have a backup system. :party:

#!/bin/bash
#
# Author: Martin Matusiak <numerodix@gmail.com>
# Licensed under the GNU Public License, version 3.

backup_dir=/backup
num_backups=1


verbose="$@"
lists=$backup_dir/*.lst
ext=tgz
date_params="%Y-%m-%d-%H%M"
nice_val="nice -n20"

# colors
wh="\e[1;37m"
pl="\e[m"
ye="\e[1;33m"
cy="\e[1;36m"
re="\e[1;31m"

if [[ "$verbose" && "$verbose" != "-v" ]]; then
	echo "Usage:  $0 [-v]"
	exit 1
fi

if [ ! -d $backup_dir ]; then
	echo -e "${re}Backup dir $backup_dir does not exist.${pl}"; exit 1
fi


for list in $(ls $lists); do
	name=$(basename $list .lst)
	file_root=$backup_dir/$name.$(date +$date_params)
	
	stdout="1> /dev/null"
	stderr="2> $file_root.$ext.err"
	if [ "$verbose" ]; then
		stdout=""
	fi

	# pipefail: report a failure from tar/xargs, not just the status of tee
	cmd="set -o pipefail; cat $list | $nice_val xargs tar zlcfv \
		$file_root.$ext $stderr | tee $file_root.$ext.log $stdout"

	trap 'echo -e "${re}Received exit signal${pl}"; exit 1' INT TERM

	echo " * Running \`$name\` job..."
	if [ "$verbose" ]; then echo -e ${ye}$cmd${pl}; fi
	echo -en $cy; bash -c "$cmd"
	# Save the status right away, before the next echo overwrites $?
	status_code=$?
	echo -en $pl

	if [ $status_code -gt 0 ]; then
		# Dump error log
		echo -en $re ; cat $file_root.$ext.err
		echo -en $pl ; echo "Backup exit code: $status_code"
	else
		# Kill error file
		rm $file_root.$ext.err
	fi

	# Evict old backups we don't want to keep
	num=$num_backups
	for evict in $(ls -t $backup_dir/$name.*.$ext); do
		if [ $num -le 0 ]; then 
			rm -f "$evict"
		else
			num=$(($num-1))
		fi
	done

	# Report number of files in backup
	echo -n "$(wc -l < $file_root.$ext.log) files"
	echo ", $(ls -l $file_root.$ext | awk '{ print $5 }') bytes"

done

Worse is better

I've been thinking about how to handle backups most effectively, and it occurs to me that backups are a case of "worse is better". The thing is you could make a really nice and easy application to make backups, but .tar.gz is still the optimal format to store them in. Wherever you are, you have tar and gzip available, and restoring backups usually happens under somewhat constricted conditions, sometimes without network access. So you want to avoid introducing dependencies, it's safer to make do with the tools that are there already.

So it may not be the most elegant system, but it's damn reliable.

Limitations (NEW)

Basically what I'm saying is that if you have no backup system then using tar is a pretty decent system. At the very least it has worked well for me the last 5 years. That isn't to say you shouldn't use a different method if you have different needs.

What about scaling? Well, I think this works quite well up to backups of say 4gb or so. My root partition is using 12gb of space at the moment. The purpose of this method is to back up your working system with all the configuration, applications and so on. Not to back up your mp3 collection, I would exclude that (not least because it's pointless to compress mp3 files and other formats that already are well compressed).

What about the bootloader? (NEW)

Some people have asked how this backup method deals with the bootloader. The answer is that it does back up the files that belong to the bootloader (in /boot). It does not back up the actual boot sector of your hard drive (which isn't represented as a file). So if, for example, you want to restore the backup on another computer (which I've done lots of times), you'll still need to use grub/lilo to update the boot sector.

UPDATE: Apologies to the indignant Windows users. I pretty much wrote this in rant mode, without doing any research and what I wrote about Windows is just from my own experience. I would have been more thorough if I had known this would make the frontpage of digg and draw so many hits.