numerodix blog

how to spend a lovely weekend down south

August 28th, 2006

Sørlandet (literally "south of the country") is just the nicest place to be in the summer. Tourist flock from all of Europe to experience the Norwegian summer in this region. It's been years since I've spent summers there myself, but this weekend I had a chance to go back to Mandal, a lovely little town with the best beach in the country.

First, catch a flight down south to Stavanger (cause it's waaaay too far to drive). Then, get on the E39 direction Kristiansand. It's about 200km to Mandal, so set aside 3-4h for this, it's quite a scenic drive through the mountains and valleys, but you won't go fast cause the roads are narrow and there's traffic.

In Mandal, you will discover a small, charming touristy town with classic white wooden houses and a vacation kind of atmosphere. (Make sure you don't back into the car with the German plates as you're getting out of your parking spot.) Once in Mandal, you may want to go for a coffee (or indeed an ice cream) at the town's best located ice cream bar, right on the main street. Then, why not take a stroll through town and pick up some nice paella for dinner at the fish market.

Sjøsanden camping is right outside town. Bring your camping gear or hire a bungalow if you don't have any. Head for the greatest beach in Norway, which is literally at the foot of the camping (I tried pulling up the Google Earth imagery for it, but as luck would have it, the imagery stinks for that particular place).

Once you're done with the beach and dusk sets in, have some dinner and head to Kristiansand, a mere 40km away. Kristiansand is *the* city in the south of the country. Also a nice place to be in the summer, the old fort dominates the bay, but there are marinas and there's a beach too.

The next morning, go for a swim in the sea before breakfast. If the water feels chilly, then it probably means it's the hot time of year. On the drive back, stop off at the little town of Flekkefjord. It isn't so much of a tourist place, but it's more the kind of small town Norway that much of this country is. As you drive into the center, crossing the bridge over the fjord, on the left you'll see a place called Kaffebørsen. They have a wide selection of coffee and in particular, their mocca is excellent.

Posted in en, travel | 2 Comments »

Project Newman :: Further ideas not implemented

August 27th, 2006

Newman was meant to be a simple design that wouldn't take too long to build (it took me about 2 weeks of afternoons to write) and just focus on the issues that are simple to handle without making a complicated mess of it. There are all kinds of ways in which it could be improved and I'll mention some of the ideas that I decided to leave out.

Channel limits

Even in threads where it is deemed acceptable to post news articles, it isn't civil to post 20 articles a day in one thread. To overcome this problem, one could limit the number of stories that may be posted in one channel per day. This would require another cache, to keep track of how many stories have been posted in which channel already today.

This is something I thought I would do, but I haven't written it, because it doesn't seem to be a problem. Newman can post up to 5 stories in one thread (which is a bit much), but it will only do this in a couple of channels (the Real Madrid one tends to see a lot of news). So the amount of news posted in a channel reflects the amount of news about that club that day (and that seems quite fair).

A bandwidth monitor

Newman deals in receiving and sending data to web servers. This generates a fair bit of traffic. Web hosts tend to be somewhat sensitive about one client making lots of connections, because bandwidth isn't free. While Newman does not operate on a mass level and will not overload any server by itself, it may still be useful to know just how much bandwidth it generates.

I haven't looked deeply into this, so I'm not sure how to do it. Web servers send a Content-Length field in http headers, but this will only account for the traffic received. Perhaps a byte count of the uuencoded form input that Newman sends could be used to track outgoing traffic. But even this would not account for the low-level overhead in establishing socket connections.

User agent scrambling

Newman identifies itself as Firefox, so it looks like just another human client connecting to the web server. But the reporter retrieves a list of stories and then sequentially retrieves every story. This behavior is too systematic to be human (noone is interested in *every* story), and so a web admin who keeps track of logs, who sees that the user agent is always the same, will assume it's the same client making these connections.

To escape that kind of detection, we could easily scramble the user agent string and just make Newman report a different user agent for every connection. This would make it look like the connections are coming from a shared ip address, but from different clients (for instance different people in a school or company).

Client source scrambling

Closely related to the point above, a web admin that wants to block Newman can only do so by blocking the ip address the connections are coming from. This could be overcome by making Newman connect through anonymous proxies (or maybe use the Tor network?). And connect everytime through a different proxy - that would make blocking ip addresses a lot more difficult.

Running stories through online translator

This is a very silly idea, but sometimes people post stories from a non-English source, because the story talks about something that the English language sources haven't caught up with yet. Since Xtratime.org is an English speaking forum, most people don't understand these articles. So the person who posts it will sometimes post an automatically generated translation of the text, using the notoriously bad Altavista Babelfish.

And so it could be an extra feature for Newman to retrieve articles from gazzetta.it or Marca, translate them with Babelfish, and post them. This would further reduce the quality of the articles Newman is posting, but it certainly is something that human posters tend to do.

This entry is part of the series Project Newman.

Posted in en, newman | No Comments »

Project Newman :: The scheduler

August 26th, 2006

Now that we've covered the reporter, the editor and the publisher, we have a functional Newman that can actually post stories. I set up Newman to run in a cron job (ie. at set intervals) to run every three hours, but then it occurred to me that it isn't human behavior to post at 9am, then at 12am, then at 3pm and so on, it just doesn't look real. And if someone were to keep an eye on Newman, they might notice that it always posts at regular intervals, which looks odd. (The point here isn't so much to fool people into believing that Newman is real, it is just to make it so that it seems to exhibit a lot of human qualities.)

So I thought why not add a scheduler to decide when Newman should run. The scheduler runs as a daemon (ie. an application that runs 24/7 in the background, but only does actual work whenever it is called upon). So the scheduler is given a time interval (for instance: 3 hours), and then it generates a random number between 0 and 3 hours. That's when Newman is going to run. And then it goes to sleep until that time. So if I start the scheduler at 10am, give it an interval of three hours, it may decide that Newman should run at 11.45. So then it goes to sleep until 11.45 and then it runs Newman.

The advantage of this method is also that if the scheduler runs Newman and Newman crashes, it won't make the scheduler crash. So the scheduler will still keep running and will again run Newman at the next interval. I've also made sure that the scheduler waits for Newman to finish, so that if Newman is taking a lot time to complete and the next interval is in 5 minutes, Newman will not be started again until the current execution is finished.

This entry is part of the series Project Newman.

Posted in en, newman | No Comments »

Project Newman :: The publisher

August 25th, 2006

Compared to what we've talked about so far, the publisher is a pretty simple piece of the puzzle. It receives a list of stories, each one assigned to one or more channels, and simply posts them on the selected target, that is Xtratime.org. For this to work, we must first prepare an account on the forum for Newman. Having done that, the publisher will log in the user, open the thread where the story should be posted and simply post it, adding some vBcode formatting to the text. The image below shows what a typical news post looks like.

While Newman posts articles on Xtratime.org, which is a vBulletin forum, it could just as easily post them on any other website, it's just a matter of reading the html code sent to us from the server and submitting html forms with the correct data. I won't bore you with the details. Of course, there is always a chance that the server may drop the connection while the posting is in progress, or the connection may fail in the first place. In these cases, the publisher will report the error, but it will try to post the next story as if nothing happened.

In the event that vBulletin rejects the post for any reason, Newman tries to read this error and report it. It may be that two stories have been posted too close together (vBulletin has a limit for how often posts can occur by the same user), perhaps something else didn't go to plan. In any event, it handles these errors gracefully without crashing.

In early stages of development, I was testing Newman on a test forum I set up, which was protected with a password. Newman can use basic http authentication to access web sites in that way.

This entry is part of the series Project Newman.

Posted in en, newman | 1 Comments »

computer nostalgia (bringing format c: to linux)

August 23rd, 2006

As time goes by, there are certain things from the past that stick with us, aren't there? Things that won't quickly be forgotten. Just the other day I was thinking it's been a while since I've seen the good old format c: screen. I remember seeing that screen a lot back when I was a Windows user. All the way from Windows 3.1 to Windows XP, ever so often I would format and reinstall the system. And formatting was the simplest way to start with a clean slate (virus and spyware wise, in later years), it was much quicker than deleting all the files.

The format command also had this mythical quality about it. It was synonymous with destruction, with sabotage even. Whenever we joked about messing up someone's system, we would always joke about formatting c:. I don't recall ever actually doing that to someone for amusement, but it was certainly tempting at times (on school computers especially :D ).

But then I remember one time back in high school, years later, when a friend of mine threw a party for our class. Lots of people showed up that noone seemed to know, but his house was big enough to fit everyone in. A couple of days after the party, he was telling me that at around 1am, at a time when the party was well underway, he came into his room, found his computer was on and the format c: screen was staring him in the face, with the counter at 80%. He said he immediately cut the power. He then turned it back on, the system hadn't been wiped yet. What a relief.

So with this in mind, it occurred to me recently that it would be fun to recreate the mythical format c: screen, given that I never see it anymore. It took me a while to figure out how to print characters and then delete them in bash, but here is the code that recreates the actual format c: screen. It's shown in the screenshot above. The font isn't correct, unless you have your terminal running on the original Lucida Mono font that Ms DOS came with. But other than that, I've tried to recreate it to a T.

#!/bin/bash

if [ "$1" = "" ]; then
	echo "Required parameter missing -"
	exit 1
fi

drive=$(echo $1 | tr [:lower:] [:upper:])

sp="\0040"
bs="\0010"

spaces() {
	e=""
	for i in $(seq 1 $1); do
		e="${e}${sp}"
	done
	echo $e
}

el=$(spaces 50)

label1="\n\nWARNING: ALL DATA ON NON-REMOVABLE DISK
\nDRIVE $drive WILL BE LOST
\nProceed with Format (Y/N)?"
label2="\n\n
\nChecking existing disk format.
\nRecording current bad clusters"
proc1="Complete. $el
\nVerifying 1,023.71M"
proc2="Format complete. $el
\nWriting out file allocation table"
proc3="Complete. $el
\nCalculating free space (this may take several minutes)..."
proc4="Complete. $el
\n\nVolume label (11 characters, ENTER for none)?${sp}"
label3="\n
\n1,071,337,472 bytes total disk space
\n1,071,337,472 bytes available on disk
\n
\n$(spaces 8)4,096 bytes in each allocation unit.
\n$(spaces 6)261,556 allocation units available on disk.
\n\nVolume Serial Number is 1E36-1EF5\n\n\n"

type_delay=0.3
counter_delay_short=0.05
counter_delay_vshort=0.005
counter_delay_long=0.3
cmd_delay=1

pause() {
	sleep $cmd_delay
}

print() {
	for i in $(seq 0 ${#1}); do
		c=${1:$i:1}
		if [ "$c" = " " ]; then
			c=$sp
		fi
		echo -ne $c
		sleep $type_delay
	done
}

counter() {
	for i in $(seq 1 100); do 
		l="${sp}$i percent completed."
		echo -ne $l
		sleep $1

		for j in $(seq 0 ${#l}); do
			echo -en $bs
		done
	done
}


echo -en $label1
pause
print "y"

echo -e $label2
counter $counter_delay_short
echo -e $proc1
counter $counter_delay_long
echo -e $proc2
counter $counter_delay_short
echo -e $proc3
counter $counter_delay_vshort

echo -en $proc4
pause
print "l33t h4xx0r"

echo -en $label3

What it does is... absolutely nothing. Except simulating what happens when you type C:\>format c: [ENTER] in Ms DOS. To run it, download the file, chmod 755 format it, and copy it to a path that is in your $PATH, like /usr/local/bin with cp format /usr/local/bin. (you may have to use sudo here, /usr/local/bin is usually only writable by root). Now you have your very own format command on linux and you can run format c: whenever a bout of nostalgia hits you and you miss the old format command.

Best of all, it doesn't actually nuke your files, but you can still use it to scare the bejeezus out of people. ;) :devil: :D And since you just set its permissions to be executed by any user, any user can run it (perhaps with some persuasion? ;) :D ).

Posted in code, en | 2 Comments »

how to spend a lovely weekend down south

Project Newman :: Further ideas not implemented

Project Newman :: The scheduler

Project Newman :: The publisher

computer nostalgia (bringing format c: to linux)

Quick links

Google.me

Categories

Meta