Archive for the ‘technology’ Category

what I like about Firefox2

September 7th, 2006

Firefox 2 is currently in beta (beta 2, release candidate 2), so it will be some weeks before v2.0 goes gold for public consumption. I was a little hesitant about installing the beta, knowing that in the past, upgrading Firefox has brought bugs with it and shut off popular extensions.

What attracted me most were whispers that the memory leaks have been greatly reduced and performance improved. Firefox (originally Phoenix) started out as a nice stripped-down version of Mozilla. Then it gradually took on so many features that it became rather heavy, especially on slower machines. It is a good thing that performance is being addressed continually, even though I doubt Firefox will ever match Opera in that regard. I haven't run any benchmarks to verify it, but the new Firefox does seem a little faster; scrolling in particular is noticeably faster.

A welcome addition is a built-in session saver. I have been using the SessionSaver extension since the dawn of man, basically, and now it is finally a built-in feature. When you close Firefox (or it crashes), it will restore your tabs and the pages opened in them the next time it starts (a new option in the settings).

Tab handling has improved. Every tab has a close icon, but you can also close tabs by middle-clicking on them. At first I thought this was a bug, because I accidentally closed a window while filling in a form, but it's just a quick way of closing tabs. If you do close a tab accidentally, use History > Recently Closed Tabs to bring it back.

Another new feature is a built-in spell checker (supporting a range of languages) for all form input. It highlights typos as you type, with the familiar red underline. While I don't consider this a major breakthrough, I'm sure a lot of people will love it.

Finally, a bug fix. In Firefox 1.5, when you have the bookmarks drop-down menu open and scroll with the mouse wheel, it cycles between tabs. In Firefox 2, it does the logical thing and scrolls the drop-down menu.

Extension-wise, Adblock and Flashblock both work, but Dictionary lookup doesn't yet.

I learnt on the plane today that

September 3rd, 2006

[thanks to a full-page ad in Aftenposten] IBM stands for International Business Machines [Corporation]. Now I feel like a kid who's just been told there's no Santa Claus. It never actually occurred to me that IBM was an acronym; it just sounded like a cool name for the world's biggest computer company. Business machines? What the hell is that? It could be anything, from big lawn mowers to sawmills. It's probably the most abstract company name I've heard; it literally says nothing about what they intend to do (everything, apparently).

What's more, if you open Wikipedia looking for IBM without knowing what the acronym stands for, you are presented with these choices, none of which seems especially plausible:

buying stock

September 3rd, 2006

How is that for a metaphor? Investing funds (i.e. time) to obtain assets (i.e. knowledge) whose worth is determined by the free market. Or, if you cut the BS, simply deciding which languages to learn based on their current value.

ruby_yukihiro_matsumoto.png

The TIOBE Index is the place to be if you want respect and admiration. That is surely something Yukihiro Matsumoto (pictured) is presently enjoying, as his creation, the Ruby language, is about to crack the Top 10. And we trust Japanese engineering, don't we? If you buy Japanese electronics, you know it's gonna be good.

TIOBE tells us that if there is one thing all those colleges and universities teaching Java can't be accused of, it's being business oriented. Java is the number one 'enterprise language' at the moment, and after 3-4 years of Java, most graduates have learnt enough about it to look for jobs in companies that don't use it.
So, looking at TIOBE in terms of owning stock, this is my breakdown.

01] Java -- well grounded, hope I never have to use it
02] C -- familiar, only done a little hacking
03] Visual Basic -- no knowledge
04] C++ -- have learnt it, never used it in a project
05] PHP -- quite familiar, a couple of projects
06] Perl -- very basic
07] Python -- very comfortable, favorite language
08] C# -- meaning to start a project in it, but haven't yet
09] Delphi -- the high school years, don't remember much anymore
10] JavaScript -- have always avoided it, not a big fan of client-side scripting
...
13] Ruby -- see C#
16] Lisp/Scheme -- see C#, but lost interest in it after I started learning Haskell
27] Awk -- barely touched
32] Bash -- intermediate skills
46] Haskell -- basic skills, will be seeing plenty more of it in school soon

fixing missing post slugs in wordpress

August 31st, 2006

If you've ever moved from one house to another, you know that it's not just moving day that is a mess in your new house; the mess drags on for a while until you get things sorted out. Lots of little details escape your attention for days, even weeks. But eventually you track down every last one, and after a month or two you are 100% in order.

Now you're probably thinking: what the hell does that have to do with the title of this entry?!? Well, migrating data from one system to another is a lot like moving house. Moving from BLOG:CMS to WordPress has not been entirely trivial, and I still spot the odd bug even though it's been a couple of weeks. One thing I neglected to consider when migrating the blog was missing post slugs. You see, WordPress uses post slugs to make URLs more human-friendly. Instead of {blog_url}?p=34 to open post number 34, it lets you use URLs of the form {blog_url}/index.php/year/month/day/blog-entry-title (the part after the last slash is what WordPress calls a post slug). This is nice for people who link to a blog entry, because the latter URL makes a lot more sense to a human than the former (which is just the ID of a row in a database table).
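For illustration, here is a minimal PHP sketch of the two URL styles described above. The function permalinks() is hypothetical (it is not a WordPress API), example.com stands in for the real blog URL, and the sketch assumes the date part of the pretty URL comes from the post's publication date, given as a 'Y-m-d H:i:s' string like the post_date column of wp_posts. With an empty slug, the pretty URL stops at the date, so it points at a day rather than at one entry.

```php
<?php
// Hypothetical helper (not part of WordPress): builds both URL styles for one post.
// Assumes $postDate is a 'Y-m-d H:i:s' string, like wp_posts.post_date.
function permalinks(string $blogUrl, int $id, string $postDate, string $slug): array
{
    // '2006-08-31 12:00:00' -> '2006/08/31'
    $datePath = str_replace('-', '/', substr($postDate, 0, 10));

    return [
        'numeric' => $blogUrl . '?p=' . $id,                              // {blog_url}?p=34
        'pretty'  => $blogUrl . '/index.php/' . $datePath . '/' . $slug, // .../year/month/day/slug
    ];
}

$links = permalinks('http://example.com/blog', 34, '2006-08-31 12:00:00', 'fixing-missing-post-slugs-in-wordpress');
echo $links['numeric'], "\n"; // http://example.com/blog?p=34
echo $links['pretty'], "\n";  // http://example.com/blog/index.php/2006/08/31/fixing-missing-post-slugs-in-wordpress
```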

But BLOG:CMS does not use post slugs (or didn't), so I never had them. WordPress generates them automatically for new posts, but the old entries I imported into WordPress came without any. I noticed this when I migrated my blog entries, but I thought it was merely an inconsistency without any repercussions. Well, it turns out some links were broken because of it. So today I realized I would have to fix this annoying little bug and put in post slugs for the entries that don't already have them.

And for that purpose I wrote a little script. It's a quick and dirty fix: it forces all characters to lowercase, turns spaces into hyphens, and strips out everything except ASCII letters, digits and hyphens (so it will not work well with non-English post titles). But for my money it works well enough.

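<?php
// Backfill missing WordPress post slugs (post_name) for imported posts.
// Uses mysqli and preg_replace; the old mysql_* and ereg_* functions were removed in PHP 7.
// Fill in your own database credentials before running.
$dbhost = '';
$dbuser = '';
$dbpass = '';
$dbname = '';

// Published posts that have no slug yet.
$sql = "SELECT ID, post_title FROM `wp_posts`"
     . " WHERE post_status = 'publish' AND post_name = ''"
     . " ORDER BY ID ASC";

// Throw an exception on any database error instead of failing silently.
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
$db = new mysqli($dbhost, $dbuser, $dbpass, $dbname);

// The slug is bound as a string parameter. Wrapping it in backticks, as the
// first version of this script did, makes MySQL read it as a column name.
$update = $db->prepare('UPDATE `wp_posts` SET post_name = ? WHERE ID = ?');

$result = $db->query($sql);
while ($row = $result->fetch_assoc()) {
    $id = (int) $row['ID'];
    $title = $row['post_title'];

    $slug = strtolower(trim($title));
    $slug = str_replace(' ', '-', $slug);            // spaces become hyphens
    $slug = preg_replace('/[^a-z0-9-]/', '', $slug); // drop everything else, including non-ASCII
    $slug = preg_replace('/-+/', '-', $slug);        // collapse runs of hyphens
    $slug = trim($slug, '-');                        // no leading or trailing hyphens

    echo 'ID: ' . $id . '<br>'
       . 'post_title: ' . htmlspecialchars($title) . '<br>'
       . 'post_name: ' . htmlspecialchars($slug) . '<br>';

    if ($slug === '') {
        // e.g. a title made up entirely of non-ASCII characters
        echo 'Skipped: empty slug<br><br>';
        continue;
    }

    $update->bind_param('si', $slug, $id);
    $update->execute();
    echo 'Updated.<br><br>';
}

$result->free();
$update->close();
$db->close();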
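The slug rules from the script can also be pulled out into a standalone function, which makes it easy to check what a given title turns into without touching the database. This is a minimal sketch: make_slug() is a hypothetical name, and the final trim of leading and trailing hyphens is an extra safeguard. The last example shows the limitation mentioned above, since accented letters are simply dropped.

```php
<?php
// Hypothetical standalone version of the slug rules: lowercase, spaces to hyphens,
// keep only ASCII letters, digits and hyphens, collapse and trim hyphens.
function make_slug(string $title): string
{
    $slug = strtolower(trim($title));
    $slug = str_replace(' ', '-', $slug);
    $slug = preg_replace('/[^a-z0-9-]/', '', $slug); // no /u flag: multibyte characters are removed byte by byte
    $slug = preg_replace('/-+/', '-', $slug);
    return trim($slug, '-');
}

echo make_slug('Fixing Missing Post Slugs in WordPress'), "\n"; // fixing-missing-post-slugs-in-wordpress
echo make_slug('Project Newman :: An evaluation'), "\n";        // project-newman-an-evaluation
echo make_slug('Café'), "\n";                                    // caf
```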

Project Newman :: An evaluation

August 29th, 2006

The thing about a project like Newman is that it's basically impossible to make it work perfectly. It has a difficult job, because there are so many potential sources of error: servers may go offline, connections may fail, article formats may change, and so on. It is as good as impossible to guarantee that Newman will do the right thing, because at the end of the day we are trying to analyze text, and computers are not good at that. Just look at spam filters: they have been improved for years, yet everyone still gets spam. Much less than before, of course, so the filters are definitely useful. Newman, too, makes mistakes, but it still succeeds quite often.

Newman has been posting on Xtratime.org under the username Carsonne, a French female impersonator of Carson35, it would seem. :D Since July 30, Carsonne has averaged about 15 posts a day, a little over 350 posts in all, i.e. 350+ news stories posted. While I haven't kept score closely enough to present real statistics, I have kept a close eye on Carsonne, and I would estimate that upwards of 90% of the stories posted were correctly parsed, formatted and classified. In fact, I recall about 10-15 misposts among the ones I've seen (which I think is most of them). That is an error rate no human poster would have: at an estimated 95% success rate, Carsonne's error rate is at least an order of magnitude higher than a human's (I would claim that a human poster would have a >99.5% success rate at copy/pasting and classifying stories, i.e. fewer than 2 misposts in 350).
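The figures above can be checked with a little arithmetic, using the rough counts given there (about 350 posts, 10 to 15 misposts, an estimated 95% success rate for Carsonne, and a claimed human success rate of at least 99.5%):

```latex
\begin{aligned}
\text{bot success rate} &\approx 1 - \tfrac{15}{350} \approx 95.7\% \quad\text{to}\quad 1 - \tfrac{10}{350} \approx 97.1\% \\
\text{human misposts in 350} &\le 350 \times (1 - 0.995) = 1.75 < 2 \\
\frac{\text{bot error rate}}{\text{human error rate}} &\ge \frac{0.05}{0.005} = 10
\end{aligned}
```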

What about user input, then? Unfortunately, Newman does carry a certain configuration cost; not everything can be automated. In particular, it would be wonderful to automate finding channels, given how quickly the forum climate changes. Newman also requires that sources be configured (and, if need be, updated) for the parsing to work. Of course, once that is in place, Newman can post at will. Still, that is quite a limited set of abilities.

The screenshot below shows a typical run of Newman. Quite a few stories were fetched, some were selected for posting, and then they were posted. It also shows that Newman is fault tolerant: a parsing error was handled gracefully, as was a timeout from the forum web server.

newman_running.png
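Newman's own code isn't shown in this post, so what follows is only a generic PHP sketch of the fault-tolerance pattern described above, not Newman's actual implementation. All names are hypothetical: run_batch() takes the raw stories plus a parser and a poster supplied as callables, skips any story that fails to parse, retries a post that times out (up to two attempts by default), and keeps going either way.

```php
<?php
// Hypothetical exception types standing in for "story could not be parsed"
// and "forum web server timed out".
class StoryParseException extends Exception {}
class PostTimeoutException extends Exception {}

/**
 * Parses and posts each raw story. A parse failure skips that story;
 * a timeout is retried up to $maxAttempts times before the story is skipped.
 * Returns counts of posted and skipped stories.
 */
function run_batch(array $rawStories, callable $parse, callable $post, int $maxAttempts = 2): array
{
    $stats = ['posted' => 0, 'skipped' => 0];

    foreach ($rawStories as $raw) {
        try {
            $story = $parse($raw);
        } catch (StoryParseException $e) {
            echo "parse error, skipping: {$e->getMessage()}\n";
            $stats['skipped']++;
            continue;
        }

        $posted = false;
        for ($attempt = 1; $attempt <= $maxAttempts && !$posted; $attempt++) {
            try {
                $post($story);
                $posted = true;
            } catch (PostTimeoutException $e) {
                echo "post timed out (attempt $attempt of $maxAttempts)\n";
            }
        }
        $stats[$posted ? 'posted' : 'skipped']++;
    }

    return $stats;
}

// Usage with fakes: one story is garbled, and the forum times out on the first post.
$calls = 0;
$stats = run_batch(
    ['good story', 'garbled', 'another story'],
    function (string $raw): string {
        if ($raw === 'garbled') {
            throw new StoryParseException('unrecognised format');
        }
        return strtoupper($raw);
    },
    function (string $story) use (&$calls): void {
        $calls++;
        if ($calls === 1) {
            throw new PostTimeoutException('forum did not respond');
        }
    }
);
print_r($stats);
```

Here the first story goes through on its second attempt, the garbled story is skipped, and the run ends with two stories posted and one skipped.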

After 20+ days on the forum, Carsonne has been active long enough to stir up some reactions about "her" ;) posting of news. Carson's long tenure has paved the way for posters like this, so Carsonne is seen by most as just another compulsive news poster. "She" has taken some heat over posting news in the wrong place (wrong classification), but beyond that it has been no worse than Carson gets daily.

So what have we learnt?

As is often the case, Project Newman seems to have raised more questions than it has answered. Sure enough, it isn't too hard to automate posting on a forum, it isn't too hard to fetch stories from the web and parse them, and it certainly isn't hard to automate this beyond any human's ability to keep up. But it is hard to decide what text means, which story is relevant to which thread, whether a word in a sentence is a name, and so on.

The question is: how do you do these things reliably?

Thus endeth Project Newman. Download the code from the code page if you're interested.

This entry is part of the series Project Newman.