Archive for the ‘technology’ Category

new posts popup

September 15th, 2007

This is a feature I've wanted to have for a long time, but until now I didn't know how to realize it. I wanted to have some kind of a notification area for new events on the blog, so that a returning visitor could immediately see what has changed since the last visit. And I definitely didn't want it on the sidebar, it had to be above the fold.

So the concept was in the back of my head for months, but I couldn't figure out how to make it look good. Then I came up with the idea of making it a popup window. Not a browser window, of course, just a layer that would show if there had been new events. Otherwise it wouldn't show up. Yes, that sounds like something. So with some digging and research, a bit of hacking and lots of debugging, here is the final result.

new_posts_overlay_ss.png

The window conveys quite a lot of information. It lists the last three posts to be published (or commented on). This way you have new posts and new comments in the same place. In the screenshot, the top entry is a post made recently. The bottom two are older posts that have received new comments.
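The selection rule, in other words, is "rank posts by their latest activity, whether that's the publish date or the newest comment". Here is a hypothetical ruby sketch of that rule (the Post structure and field names are invented for illustration; the plugin's actual code isn't shown in this post):

```ruby
# Hypothetical sketch of the selection rule: rank posts by their latest
# activity, which is either the publish date or the newest comment.
# (Post and its fields are invented for illustration.)
Post = Struct.new(:title, :published, :comments)  # comments: array of Times

def latest_activity(posts, n = 3)
  posts.sort_by { |p| [p.published, *p.comments].max }.reverse.first(n)
end
```

The plugin itself does this ranking inside a single MySQL query rather than in code, which is where the complexity comes from.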

In terms of appearance, I wanted the window to call attention to itself only while the user is actually using it, so on page load it is made partially transparent, onMouseOver it becomes more opaque, and onMouseOut it becomes more transparent again.

For a demo.. you have this blog. After 15 minutes of inactivity your session will expire and the window will go away. To bring it back, delete your cookies for this domain (or use a different browser) and it will reappear. The session is handled entirely with cookies, so for visitors who don't accept cookies, the window will always appear as if this were their first visit.
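The session handling boils down to something like this (a hypothetical ruby sketch purely for illustration; the names are invented and the plugin's actual code isn't shown here). The popup shows when there is no live session: a first visit, rejected cookies, or more than 15 minutes since the last request:

```ruby
# Hypothetical sketch of the session decision (names invented).
# last_seen is the timestamp from the visitor's cookie, or nil if
# there is no cookie (first visit, or cookies rejected).
SESSION_TIMEOUT = 15 * 60  # seconds

def show_popup?(last_seen, now = Time.now)
  last_seen.nil? || (now - last_seen) > SESSION_TIMEOUT
end
```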

Compatibility

The opacity property is new in CSS3 and isn't uniformly supported (yet). I've tested the plugin with the following browsers.

  • Firefox 1.0.1, 2.0.0.6
  • Opera 8.0, 9.23
  • Safari 3.0.3
  • IE 5.0, 6.0, 7.0
  • Konqueror 3.5.7 (opacity support is rumored to be on the way)
  • Netscape 6.0.1, 7.0, 8.0.2, 9.0b3

In addition, there's a rather pesky layout bug in IE <7.0 that causes the height of the window (which is floating above the other content) to be added to the top of the page. If you fix it, please send a patch. :)

Also, I tried very hard to make sure it only consumes one query, which unfortunately made it very complicated. If you rewrite it in simpler terms, send a patch. :)

Required MySQL version: 4.1+
How to use

Download, unzip, install, append the css to your styles. :cap:

UPDATE: Added Netscape.

UPDATE2: MySQL compatibility.

sshfs: easy to access remote ssh locations

September 13th, 2007

If you're a heavy ssh user, you already know about scp and rsync+ssh, but even that gets tedious when you're using the same remote location a lot.

A solution to this is KDE's fish:// kioslave, which lets you browse the remote path in konqueror much like you do any other. The drawback is that it's not an actual filesystem, so if you open a video, you'll have to wait while konqueror copies the whole file to a temporary local path before it will open. (The same goes for smb:// samba shares, and probably all kioslaves.)

I've been using fish:// a lot, but lately it's become very flaky on me, and I don't know why, because the terse error messages don't explain anything. It's also less convenient than nfs, which *is* a real filesystem (albeit one that is a pita to configure properly).

But there is another option. (Oh who am I kidding? This is linux, there are probably hundreds of options. :D ) If you have a remote location you need to access a lot, you could try sshfs. As the name implies, the protocol is still ssh (so the traffic is encrypted), but the interface is that of a filesystem. And it's based on fuse (the user level filesystem layer), so no messing with the kernel necessary.

Here's what you do (on Gentoo; other distros will have their own sshfs package and module setup)

emerge sshfs-fuse
echo "fuse" >> /etc/modules.autoload.d/kernel-2.6
modprobe fuse

That should make sure the fuse module is loaded on boot. Now to mount and unmount a remote path:

sshfs host:/path /mount/point
fusermount -u /mount/point

Or, to make it even easier.. save this in /usr/local/bin and make it executable.

#!/bin/sh

MOUNT_POINT=/mount/point
HOST=host
MOUNT_PATH=/path/on/host

if [ ! -d "$MOUNT_POINT" ]; then
	echo "mount point $MOUNT_POINT missing"; exit 1
fi

if mount | grep -q "$MOUNT_POINT"; then
	echo "umounting..."
	fusermount -u "$MOUNT_POINT"
else
	echo "mounting..."
	sshfs -C -o transform_symlinks -o Cipher=blowfish "$HOST:$MOUNT_PATH" "$MOUNT_POINT"
fi

recover lost stuff from memory

September 10th, 2007

This has happened to you before. I'm painstakingly typing a long email on gmail and I'm not sure that I should send it yet, 'cause it feels like I'm forgetting to mention something. So I want to save it as a draft so I can finish it later. Somehow I hit Discard instead. :doh: Gmail flashes the notice "your message has been discarded", but I don't usually read those messages, so I navigate away from the page, and *just* as I click the link the meaning of the message dawns on me. Shit. Now it's too late to undo the action. Son of a. :fero: :wallbang:

Okay, relax, perhaps all is not lost. A couple of weeks ago I went over how you can find stuff on disk by searching the raw data. The same *can* be done with memory. See, just because my message is gone and gmail doesn't display it anymore doesn't mean it isn't possibly still in memory. It just isn't being displayed anywhere.

There are two ways to access physical memory: /dev/mem and /proc/kcore. As root, you can read from these. (However, if you try writing to them you'll probably mess up your system.) They are not identical, and it seems that /dev/mem doesn't let me access memory above 896MB (High Memory Support in linux kernel parlance), so just use /proc/kcore.

To find that lost message in raw memory, it helps if you can remember a phrase from it. Then do

cat /proc/kcore | grep -a --color -C1 "a phrase from it"

This will search through memory, treating it as text, and highlight the phrase when it's found. It also prints "one line" above and below the line where the text was found (although considering this is binary data, the notion of "a line" is somewhat diffuse). Anyway, you probably now have enough context to recover your whole message. If not, increase it to -C2 and so on.

This way I was able to recover my message. :party:

In principle, you can also recover lost files this way, provided they are still in memory, but searching for binary data within binary data is a bit trickier, so it would take a clever approach.

bad ui on display in dia

September 7th, 2007

Dia is a really useful application. Perhaps there is some better one out there, but it's the best app I've seen for drawing diagrams. When I need to draw a diagram for a technical paper or a presentation, dia is essential.

Having said that, it has some really bad interface problems. Not that ui is any kind of expertise of mine; to me it's just common sense, and if something gets in my way I think it's badly designed. For that matter, I have read quite a few criticisms of bad ui, but never one that strove to be complete, to give a full review of an application. It seems that ui critique is really about pointing out one or two bad bits, and that's what I'm doing here as well. So obviously this doesn't mean the whole application is useless and everything is wrong.

MDI/SDI

Some people have really strong feelings about this issue. Personally I think it has to be settled by what is best for the application in question. Firefox is Single Document Interface, i.e. every document gets its own top level window. Opera is Multiple Document Interface, where you have one main window with more windows inside of it ("multiple" refers to these sub-windows). To me there is no question that Firefox is much better off for this. Everything you need to do in Firefox is contained in the one window; you don't need multiple windows visible unless you're doing some kind of copy/paste activity.

But editor apps have other needs. Photoshop is SDI (as are most image editors); the gimp is famously (and painfully) MDI. Dia copies this bad choice. I suppose the argument is that when your canvas window is separate, you can maximize it and work on your document full screen. However, unlike Firefox, editing requires a lot of tools, so unless you've memorized the keyboard shortcuts to select them, you have to bring the palette, layers and other windows to the front anyway. This is a huge pain when you don't *dedicate* your workspace to editing, but also have half a dozen other applications open.

No menubar in the canvas window

This is my biggest gripe with dia. For better or for worse, the diagrams I draw in dia look like the one below. I rarely use the built-in stencils, because they all assume some specific kind of diagram other than what I need.

dia_filemenu.png

As it happens, one of the more useful functions in dia is layers, when dealing with more complicated diagrams. To bring up the layers window, I have to right click on the canvas to get the main menu first. Why this menu isn't fixed at the top perplexes me (apparently it's possible to change this, but defaults are much more important than configuration options). Well, you might think: what's the difference, either way it's just one click away. The difference is that a fixed menu is always in the same place, which makes it easier to use; you locate items quicker visually.

A lot of useful things are in the main menu. Like alignment of objects. This is found in the Objects > Align submenu. Needless to say this is quite a pain to invoke more than a couple of times. This should probably be made into a palette window.

One thing I really like about dia is the number of different formats it can output. Most of my diagrams are pngs. This is called Export in dia. But to export my diagram (rather than save it in dia's own format), I need to choose File > Export from the menu. There is no keyboard shortcut for this action. If I'm tweaking my diagram to see if it looks good in a report, I have to do this export ritual several times. Awful.

Other quirks

And do you see that zoom control in the lower left corner? I can't change the zoom level with my mousewheel (like in the gimp). Bad.

In the above screenshot, if I wanted to place some object above the rectangle, at a distance greater than what I see in the canvas, I would have to scroll up. Except that the scrollbar doesn't seem to allow this; it seems to indicate that the canvas can't be any larger. The mousewheel will actually scroll up, which is inconsistent with the scrollbar.

desktop hackery with grep

August 15th, 2007

Just as every self respecting Unix user knows (and every Mac user should know, but probably doesn't), grep is the tool supreme for finding stuff in text files.

Here I describe harvest, a grep-like tool for searching in all kinds of files (and things that aren't files).

Why on earth?
facebook_sign.jpg

The original problem was rather contrived, admittedly. I had been resisting the facebook bandwagon for the longest time, but finally a friend talked me into trying it. If you know facebook, you know how the system works with adding "friends". And it's rather nice in how it imports your contacts from gmail and such. Of course, when you have your contacts elsewhere, you're left with a chunk of manual labor, searching & adding one-by-one. Not that it's a big problem, just a one time thing after all. But I was reluctant to undertake it, so I was clicking through facebook instead and found an "import contacts from file" feature. Hm, now that sounds more like it.

So I thought to myself: in my university email account, I have a year's worth of email history.. wouldn't it be satisfying to scan the whole thing and produce a list of email addresses I can import straight into facebook? Yes, I'm quite aware that my train of thought is somewhat off the beaten path much of the time. :D

In case you didn't know, email is just text. You may get a different idea when your email reader hides all the technical bits and just shows you the body of the message, but a stack of messages is just a bunch of text files, so there's no reason you can't treat them like any other text. So I went along and downloaded the whole thing, some 30mb. Now for the fun part. :cool:

So I need a tool that will scan the huge chunk of text and extract all the email addresses, and print them out in csv format. Preferably also remove duplicates from the list. Since I'm riding the ruby wave these days, it was the obvious choice, not least because I like how it handles regular expressions natively. So I hack up a script to do this, calling it harvest. It gets the data from the standard input, scans it for matches, and spits out the email addresses, very simple. And it works like a charm on my huge hunk of email data.

At this point you'll be wondering why on earth not use grep? Because to my knowledge grep only matches line-by-line, whereas I wanted something more general than that. And of course it's also the case that once you actually code it up, you have all the freedom you could ask for, rather than being limited to what grep does and doesn't do.

Can you make this thing go any faster?
make_it_go_faster.jpg

Later on I realized that I could run harvest on just about any file, and it would still work. Not that I had discovered a new continent; strings already extracts all text strings from any file, including binaries. But the difference was that I could search for things. So I found a nice test subject, pagefile.sys, which is Windows's swap file. :D I boot Windows once every few months, and when I do I rarely remember what I was doing last time. But apparently I had decided at some point that the swap file should be 1.5gb.

So I run harvest on it, while keeping an eye on things in htop. And ouch, harvest is reading the entire file into memory. Next it's going to search for email addresses in a 1.5gb long string. :D Needless to say, that wasn't a success; the system started choking as it ran out of memory.

So I thought it would be better to buffer the file and read a chunk at a time. The only question is: how do I still match strings when only a chunk of the file is available at a time? I wasn't exactly planning on matching super long strings, but there is still the case where a string you want to find is partly in the chunk currently in memory, and partly in the next one, so how do you make sure you don't miss it? I tried an algorithm, and it was no good. It turns out I was barking up the wrong tree for a long time, and as a result I rewrote it about five times until I got it right. It is uncanny how the best solution is usually also the simplest.

To make sure it runs at an acceptable speed, I also experimented with the buffer size vis a vis speed and memory use, finding that a small buffer is actually better. When hunting for performance problems, it's often a good idea to run your app through a profiler just to be sure that it does what you think it will.

With a 10kb buffer (79s):

%   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 10.43     8.27      8.27   156723     0.05     0.05  Regexp#match
  8.73    15.19      6.92   618095     0.01     0.01  String#length
  6.87    20.64      5.45   153806     0.04     0.04  IO#read
  4.45    24.17      3.53   153810     0.02     0.02  String#+

This surprised me. I thought it should be spending far more time matching than say reading from disk. So I tried with a bigger buffer to see if I could marginalize disk io in the overall cost.

With a 10mb buffer (115s):

%   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 94.46   108.96    108.96     3057    35.64    35.64  Regexp#match
  2.52   111.87      2.91      152    19.14    19.14  IO#read
  1.31   113.38      1.51      156     9.68     9.68  String#+
  0.28   113.70      0.32       75     4.27    33.47  Kernel.require

This is more like what I expected, now almost all the time is spent matching. But it actually takes longer (and uses more memory as well, obviously), so there's nothing to gain by increasing the buffer unless the string we're searching for is so long that we need a buffer of megabytes. (Obviously, emails and urls are much shorter than that.)

To profile your script in ruby, try:

ruby -rprofile ninja.rb

The script now runs pretty fast, scanning the Windows pagefile in a couple of minutes, which I'm quite satisfied with.

More fun than a bag of chips, but useful?
the famous sliced bread

I'm sure you're still wondering if the facebook scheme was a success. It wasn't. :D It turns out that of all the emails harvested, a single one was found on facebook. As popular as the site is in Norway, apparently it doesn't have any Dutch users. :confused:

But since I already had harvest, I thought I would add an option to find urls as well, just for the heck of it. For instance, if you visited some cool site and forgot to bookmark it, it's probably still in Firefox's history file, so you can do:

harvest.rb --dat < ~/.mozilla/firefox/<profile>/history.dat

And not just files, either. To take a rather unexpected use case.. say you had an important email address, like for a job interview at the chocolate tasting lab, and you lost it.. well maybe it was swapped out at some point. Use harvest to scan your swap for email addresses:

cat /dev/hdXY | harvest.rb --email

And you can run that on any filesystem, actually. :cool: I don't know how to access live memory in the same way, but that would be fun to try also. :cap: Things like zip files won't work, of course, because the text is compressed, but otherwise (most of the time) you can read text out of any file whether it's a text file or not.

So is it actually useful? Not really. :D

But the useful observation is that your data is right there, and though you may not see it directly, it doesn't take more than this to actually look through it.

#!/usr/bin/env ruby
#
# Author: Martin Matusiak <numerodix@gmail.com>
# Licensed under the GNU General Public License, version 3.
#
# revision 3 - allow spaces in urls
# revision 2 - introduce buffering to handle large files out of memory
# revision 1 - performance hacking: output entries immediately, only sort on
# emailcsv


require "optparse"


email = /([a-zA-Z0-9_\.-])+@(([a-zA-Z0-9-])+\.)+([a-zA-Z0-9]{2,4})+/m
url_orig = /([A-Za-z][A-Za-z0-9+.-]{1,120}:[A-Za-z0-9\/](([A-Za-z0-9$_.+!*,;\/?:@&~=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;\/?:@&~=%-]{0,1000}))?)/m
url = /([A-Za-z][A-Za-z0-9+.-]{1,120}:\/\/(([A-Za-z0-9$_.+!*,;\/?:@&~(){}\[\]=-])|%[A-Fa-f0-9]{2}){1,333}(#([a-zA-Z0-9][a-zA-Z0-9 $_.+!*,;\/?:@&~(){}\[\]=%-]{0,1000}))?)/m

pattern=url
joinlines=false
emailcsv=false
buffer_size=10*1024
hardlimit=100


## parse options
OptionParser.new do |opts|
	opts.on("--url", "url format") do |v|
		pattern = url
	end
	opts.on("--dat", "firefox history.dat format = \\\\n in urls") do |v|
		joinlines = true
	end
	opts.on("--email", "email format") do |v|
		pattern = email
	end
	opts.on("--emailcsv", "csv output (facebook contact import)") do |v|
		pattern = email
		emailcsv = true
	end
end.parse!


entries = []
previous = ""
# read a chunk at a time, prepending the leftover from the previous
# round; the length check stops the loop once the input is exhausted
while string = previous + STDIN.read(buffer_size).to_s and string.length > previous.length do
	partial = ""
	joinlines and string.gsub!(/\\\n/, "")
	while string and m = pattern.match(string) and m.size > 0 do
		# a match flush against the end of the buffer may continue in
		# the next chunk, so hold it back instead of emitting it
		m.end(0) == string.length and partial = m.to_s
		if partial.empty?
			if emailcsv
				entries << m.to_s
			else
				puts m.to_s
			end
		end
		pos = m.end(0)
		string = string[pos..-1]
	end
	if !partial.empty?
		previous = partial
	else
		# no held-back match: carry over a bounded tail in case a
		# not-yet-matching candidate is split across the boundary
		if hardlimit < string.length
			previous = string[string.length-hardlimit..-1]
		else
			previous = string
		end
	end
end

# special stuff for csv email output
if !entries.empty?
	entries = entries.sort{ |a, b| a.downcase <=> b.downcase }.uniq
	puts '"Email Address","Formatted Name"'
	entries.each { |i| puts '"' + i + '",""' }
end