
making the spam bots feel comfortable

June 1st, 2007

A lot of famous people have said lots of interesting things about success. Little did I know the success I was about to experience when I opened this blog in 2003. It is today what it's always been, an outlet mostly for various gripes, observations, recent events etc., of no interest to anyone. Why blog? Because I feel like it (bad guys in movies always say "because I can"). And I also thought that in time I would find it amusing to read back old entries, relive the past so to speak, which I actually don't really do.

But it turns out that this blog does have a wide appeal after all. Why is anybody's guess. In the last 6 months or so interest has intensified to the point where I get 1,000 comments a week. The absolute majority of these are friendly, well meaning, helpful spam bots who want to make sure I hear about the best deals that can be made. Whoever said machines aren't friendly?

So, as more and more spam bots have found my blog and spread the word to all their friends, it's become increasingly important to make sure I'm a good host to this populous demographic. My spam bot friends have a lot to say, and they won't stop at arguing points for topics that were covered here a long time ago. The WordPress community has helped me ensure that while I don't miss out on any spam comments, the human readers on here, who don't know my bot friends, and don't appreciate their intelligence and sense of humor, only see the human content.

WordPress ships with Akismet, and you want to turn it on right away. But to decrease the number of spam comments, you may want to consider the Bad Behavior plugin in addition. It blocks certain types of traffic outright, not just posts but also pageviews.

For the last two weeks I experimented to see whether it makes much of a difference. First I ran Bad Behavior for a week and counted how many spam comments I got and how many were blocked. Then I ran for a week without it and counted the spam comments.

May 17-25
832 spam comments

866 spam comments blocked by Bad Behavior (out of 1,196 requests blocked in total)

May 25-June 1
1,320 spam comments

In both cases, all spam was caught by Akismet, so nothing actually gets published. But the difference is in how much spam is submitted and ends up in Akismet's temporary 15-day archive.
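
For the curious, the difference can be worked out from the counts above. Here is a quick sketch in Ruby; the variable names are my own, and it assumes the two weeks saw comparable bot traffic, which a two-week test can't really confirm.

```ruby
# Counts taken from the two test weeks above.
with_bb_spam    = 832    # spam that reached Akismet with Bad Behavior on
with_bb_blocked = 866    # spam comments Bad Behavior stopped that same week
without_bb_spam = 1_320  # spam that reached Akismet with Bad Behavior off

# Share of the first week's spam attempts that never got as far as Akismet.
blocked_share = with_bb_blocked.to_f / (with_bb_spam + with_bb_blocked)

# Week-over-week drop in what landed in the Akismet queue.
reduction = (without_bb_spam - with_bb_spam).to_f / without_bb_spam

puts format("blocked before Akismet: %.1f%%", blocked_share * 100)
puts format("fewer comments in the queue: %.1f%%", reduction * 100)
```

That works out to roughly half of the spam attempts being stopped up front, and a bit over a third fewer comments ending up in the Akismet queue.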

Conclusion: Bad Behavior is worthwhile. :)

play audiobooks on crappy mp3 players

May 30th, 2007

Portable music players are to me a very welcome addition to our lives. A lot of time that was previously completely wasted can now be exploited. All the waiting is so much better: waiting for the bus, waiting at the post office, waiting to cross the street. Oh sure, we've had portable players for the last ~25 years, but the reason they've become so universal lately is how convenient it's become to use them. Practicality matters a great deal, and there's hardly a better example.

And yes, mp3 players are great... for music. Just like pens are great for writing letters. But that's not all you can do with them. Hey, after all it's an audio player, not a music-only player. And hey, music is great when you're on the road. But when you're out there a long time, no matter how much music you have, it does get a little boring. So how do you pick up that slack? Why not try an audiobook.

I actually don't play much music on my player anymore; anytime I'm outdoors I use it for spoken audio. Unfortunately, portable players generally suck for this. It's as if no one thought of it: gee, what if someone wanted to listen to something longer than 5 minutes?

Obviously, audiobooks tend to be longer than songs. Depending on how a book is divided up into tracks, it's sometimes very inconvenient to play. If you miss 30 seconds of a song, it doesn't really matter. But if you're listening to prose and you get interrupted, you want to seek back those 30 seconds to hear what you missed. For audiobooks, easy seeking in tracks is pretty important.

My old iRiver ifp series used to choke on tracks longer than about 30 minutes in length. It would play the track, and go past this limit, but the duration on the display was now out of sync with the audio. For these long tracks seeking was completely broken past this limit. Unless there's been a firmware upgrade in the last 6 months, this is still the case for everyone. Not only that, the seeking function was pretty much the weakest part of the interface, it was very impractical and very often I would accidentally skip to the next track instead of seeking forward (holding the button vs pressing). Incredibly annoying.

This seems to be the trend in general, seeking is a marginal thing, no one is making it easy to use on their portable player. Another inconvenience is that some players have displays so small that long artist/title/album names are a pain to check, it takes forever to scroll them.

So what to do? Well, you can hack around it. It's not an elegant solution, but it's a solution. Divide all these long tracks into short tracks, so your player won't choke on them, and so that you only go 5 minutes back if seeking isn't reliable.

tracksplit.rb will do just that. Run it in the directory where you have your longish tracks (it accepts mp3/ogg) and it will chop each one into pieces for you. It also renames them with a sequential number in front (so I assume you know what tracks you have), which preserves the alphabetical order the originals were in. The originals are deleted (after all, this is just a copy for your portable player, right?).

The actual heavy lifting is done with mp3splt, which cuts tracks into pieces without re-encoding.

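#!/usr/bin/env ruby
#
# Author: Martin Matusiak <numerodix@gmail.com>
# Licensed under the GNU General Public License, version 2.
#
# Note: this script uses mp3splt to split mp3/ogg files at a set length.


# set the default length in minutes for each track, eg. 5.0 = 5 minutes
# (the value is handed to mp3splt's -t option as-is)
$track_length = 5.0


if ARGV[0].nil?
	$duration = $track_length
	puts "Track length not given, using standard track length of #{$track_length}"
else
	$duration = ARGV[0].to_f
end

# plain sh redirect; "&>" is a bashism and the check misfires under other shells
if not system "which mp3splt > /dev/null 2>&1"
	puts "Error: mp3splt not found on system"
	exit 1
end

$pattern = "*.{mp3,ogg}"
# sort explicitly so the numbering follows alphabetical order
files = Dir[$pattern].sort

if files.empty?
	puts "No files matching \"#{$pattern}\" found, exiting"
	exit 0
end

# width of the numeric prefix = number of digits in the largest index
w = (files.length - 1).to_s.length
files.each_with_index do |file, i|
	newfile = "%0#{w}d_@n" % i
	# pass the arguments as a list, so odd characters in file names need no shell quoting
	cmd = ["mp3splt", "-t", $duration.to_s, "-o", newfile, file]
	puts cmd.join(" ")
	# only delete the original if mp3splt reported success
	if system(*cmd)
		File.delete(file)
	end
end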
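
If you'd rather see what would happen before any originals get deleted, the naming scheme can be tried out on its own. This is only a sketch: `split_command` is a made-up helper that is not part of tracksplit.rb, and the file names are invented. It just builds the mp3splt argument list (using the same `-t` and `-o` options as the script) and prints it without running anything.

```ruby
# Hypothetical helper (not part of tracksplit.rb): builds the mp3splt
# argument list for one file without running anything.
def split_command(file, index, total, minutes = 5.0)
  # digits needed to print the largest index, e.g. 2 when there are 12 files
  width  = [total - 1, 0].max.to_s.length
  # zero-padded prefix, e.g. index 7 -> "07" when there are 12 files
  prefix = format("%0#{width}d", index)
  ["mp3splt", "-t", minutes.to_s, "-o", "#{prefix}_@n", file]
end

# Invented file names, sorted the same way the script sorts them.
files = ["intro.mp3", "part one.mp3", "part two.ogg"]
files.sort.each_with_index do |file, i|
  puts split_command(file, i, files.length).join(" ")
end
```

Keeping the command as an array means it can later be handed to `system(*cmd)` as-is, so a file name with spaces or quotes in it never passes through a shell.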

P.S. This happens to be my first adventure with Ruby, so report breakage please ;)

remember Prince of Persia?

May 26th, 2007

Yees yes, that fantastic game we played in the early 90s. Remember it? Of course, who could forget. Getting that prince unscathed through all those tunnels, traps and past all his enemies was great fun.

Ah, the days when Ms-Dos was our operating system of erm... choice and we gave the keyboard a good workout.

Prince of Persia kept reappearing in new releases, but as far as I'm concerned once you kill the classic 2D game play feel, it's all downhill. The story lines for the later versions were also tediously complicated, nothing like the elegant simplicity of the classic. The original and the sequel, those are the two I played back then.

And now you can re-live the experience yourself. The Unofficial Prince of Persia Website has all the scoop, with plenty of extras. The oldest versions (1 & 2) are considered abandonware (which means no one is there to collect) and are up for download from the site. There are also cheats and walkthroughs if you get stuck (ah, how much easier it is to play these games nowadays when you don't have to figure it out yourself :D ).

Not only that, the walkthroughs for the sequel even have captures on google video, so you don't even have to play it yourself. :D

"But wait a minute", you say, "didn't you say Ms-Dos? How am I going to play these games? I've moved on from Dos by now." Funny you should ask. There is a Dos emulator called dosbox, which gives you a window into Dos, if you will. Inside there you can play any Dos game, and dosbox has a pretty long list of supported games.

Enjoy Prince of Persia!

the "print version" - a misnomer

May 24th, 2007

There was a time in the earlier web when you could open a web page, read it, and print it. Pages were mostly text, with some markup, and sometimes illustrations. With the popular use of frames, this convenience was lost immediately, as a content frame inside a frameset wasn't printable unless you did something voodoo-like, such as opening it in a new window. (And even then, the "fancier" sites had javascript to detect this and would restore the frameset for you, how helpful.)

Granted, today frames aren't popular, they went out of style with the Spice Girls. But, over the years, with the adoption of ever more useful sophisticated technologies, content on the web was rendered generally unprintable. Even so, we still wanted to print content, occasionally. In what is an instructive case of solving a problem by introducing a new problem and having to solve both, the answer was the print friendly version.

As it turned out, the print version came to be the readable version. With the web boom, and the consequent dot-com bust, ad revenue went from easy-to-get to omg-we're-going-under. To stay afloat, sites grew ever more ad infested. What used to be a matter of principle (okay, one banner at the top, but nothing in the content area) turned into scruples melting away to nothing. Even a few years ago, it was possible to open a website, ignore whatever was on the top, left and right, and read the content in the middle undisturbed. Not so now: lots of sites have banner/flash ads right in the text, as well as the horrendous ad words, words/phrases in the text highlighted with a popup window when you hover the mouse over them. Another popular gimmick among publishers is to divide up the text over several pages (under the pretext that the content is just too long, you don't want to load all of that at once, you'd rather load our ads), so you have to page your way through while reading and load new ads on every page. If you try to print this, you get snippets of the story with the same header and footer you don't care a whit about (and all the ads included).

All of this basically makes quite a few sites unusable, at least without Adblock and Flashblock. Take Extremetech, which integrates all these awful gimmicks. They sometimes have good in-depth content, but would any human being want to read this article? Not only is the site ad infested like the worst of them, it also divides the story up into 11 pages. Salvation? The print version. The print version is actually continuous, so you can read it without having to click your way through. And it has fewer ads in it. As it is, the print version is actually the only readable version many sites offer. It's not for print, it's for reading. It's become such a reflex that I always look for the print icon, and if it's not there I bid farewell.

In contrast, this makes blogs wonderful to read. Blogs with a nice design, few or no ads and good typography (and there are lots of these) are a delight, because they don't have all these things that prevent you from reading the content. Blogs are much more fun to read than papers and magazines.

open source and lacking communication channels

May 22nd, 2007

Open source projects understandably face a lot of challenges. One of those challenges is to communicate effectively with the user base. It is my claim that many of the common channels of communication are ineffective.

If you're a developer, put yourself in a user's place. Users have questions. Sometimes they have requests. Some projects go the extra mile to establish a channel for quick, uncommitted communication. But most projects don't do this, they rely on the classical communication channels.

Mailing lists are the way most projects exchange ideas between developers. They are archived, which makes them good for keeping track of what has been covered. But in order to post a single question to a mailing list, you first have to subscribe to it. Think about this for a minute. You're not a developer, you just want to ask one question, but in order to do that, you have to commit yourself to receive all the emails exchanged on the list. I have faced this problem countless times. Currently I'm on the cmake list, because I needed to ask a question about a month ago and I thought why not just remain there in case I have something else to ask in the near future.

This is completely pointless. Even if I just want to ask one question, I have to subscribe to the list, validate my email address, sometimes even await moderator approval (for some reason or another, joining a mailing list always seems to take half a day between subscribing and being subscribed), post the question, wait for the answer, receive all kinds of irrelevant emails, see if I'm satisfied or I ask a follow-up, get more irrelevant emails, then finally unsubscribe. In the past couple of years, I've been through the subscribe-unsubscribe loop at least 20 times, usually with no lasting interest in the project. Not that it's a big problem, my gmail account is basically unlimited storage, and with the filters I don't have to look at these emails at all (which I don't). But cmake's mail server still keeps sending them even though I don't want them. It's a waste.

Naturally, there are reasons why it works like this. Lists have to protect themselves from spammers and trolls, but it doesn't make the user role any more fun. And even though mail archives are nice for developers to keep track of their ideas, more often than not they suck for finding information. A lot of mail archives don't have a search function. Of course, that raises the question: why even have an archive then? In one or two cases I actually had to download all the monthly archives and grep them locally. And that's when the archive actually was downloadable.
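
For what it's worth, grepping a pile of downloaded archives doesn't take much. Below is a rough sketch in Ruby, assuming the monthly archives have been saved as plain text files in one directory. `grep_archives`, the file names and the sample contents are all made up for illustration.

```ruby
require "tmpdir"  # only needed for the demo at the bottom

# Hypothetical helper: scan every file in `dir` for `pattern` and return
# [file name, line number, line] for each hit, much like `grep -n`.
def grep_archives(dir, pattern)
  hits = []
  Dir.glob(File.join(dir, "*")).sort.each do |path|
    next unless File.file?(path)
    File.foreach(path).with_index(1) do |line, lineno|
      # scrub replaces invalid bytes, so odd encodings in old mail don't raise
      text = line.scrub
      hits << [File.basename(path), lineno, text.chomp] if text =~ pattern
    end
  end
  hits
end

# Demo with two made-up monthly archives.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "2007-04.txt"), "Subject: build fails\nhow do I set the install prefix?\n")
  File.write(File.join(dir, "2007-05.txt"), "Subject: hello\nnothing here\n")
  grep_archives(dir, /install prefix/i).each { |hit| puts hit.join(":") }
end
```

Each hit comes back with its file name and line number, so you at least know which month's archive to open.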

Another common channel is irc. Great for irc people, less amazing for the rest of us. Irc demands that you find yourself at the channel at the same time that the person who can answer your question is also there. If not, you can hang around all day, and pay attention to the channel, so that you don't miss it when the person is there and can help you. Some people probably have that kind of patience, but I don't.

Again, irc can be archived, but it's a very noisy medium. In between actual technical questions, there tends to be a lot of chatter. Nothing wrong with that, but it makes it less than ideal for effective communication.

An increasingly popular channel is the forum. Forums work really well, if they get enough traffic. Unlike irc, they aren't real time so it doesn't matter when you're there. And they are far more organized than mailing lists too. Again unlike mailing lists, the sign-up procedure tends to be speedy. And again unlike mailing lists, it's non-committed: you can post a question today, get the answer, and never visit the website again. Your account remains, but there's no stream of emails coming after you. And forums are searchable.

Forums have their own problems. If they aren't sufficiently active to get your questions answered, they are useless. And they require more moderation too.

A non-interactive, yet still highly effective channel is the wiki. Wikis are fantastic for ad hoc, unofficial information. And their informal, editing-oriented feel encourages updates. Superb for all kinds of tips and frequently asked questions.

But unless a wiki is unrestricted (and thus open to spam), it's not useful for informal interaction.

This isn't to say that existing, well established channels are obsolete. Gmane certainly gives you more out of mailing lists, with a threaded view and searching. But the fact remains that many of the tools in use today are not optimal, and in some cases, very ineffective.

I didn't mention bug trackers, because they are quite effective at what they set out to do, and because they aren't quite general purpose either.