Archive for the ‘technology’ Category

the "print version" - a misnomer

May 24th, 2007

There was a time in the earlier web when you could open a web page, read it, and print it. Pages were mostly text, with some markup, and sometimes illustrations. With the popular use of frames, this convenience was lost, as a content frame inside a frameset wasn't printable unless you resorted to voodoo like opening it in a new window. (And even then, the "fancier" sites had javascript to detect this and would restore the frameset for you, how helpful.)

Granted, today frames aren't popular; they went out of style with the Spice Girls. But over the years, with the adoption of ever more sophisticated technologies, content on the web was rendered generally unprintable. Even so, we still wanted to print content occasionally. In what is an instructive case of solving a problem by introducing a new problem and then having to solve both, the answer was the print-friendly version.

As it turned out, the print version came to be the readable version. With the web boom, and the consequent dot-com bust, ad revenue went from easy-to-get to omg-we're-going-under. To stay afloat, sites grew ever more ad infested. What used to be a matter of principle (okay, one banner at the top, but nothing in the content area) melted away to nothing. Even a few years ago, it was possible to open a website, ignore whatever was on the top, left and right, and read the content in the middle undisturbed. Not so now: lots of sites have banner/flash ads right in the text, as well as the horrendous ad words, words/phrases in the text highlighted with a popup window when you hover the mouse over them. Another popular gimmick among publishers is to divide the text over several pages (under the pretext that the content is just too long: you don't want to load all of that at once, you'd rather load our ads), so you have to page your way through while reading and load new ads on every page. If you try to print this, you get snippets of the story with the same header and footer you don't care a whit about (and all the ads included).

All of this basically makes quite a few sites unusable, at least without Adblock and Flashblock. Take Extremetech, which integrates all these awful gimmicks. They sometimes have good in-depth content, but would any human being want to read this article? Not only is the site ad infested like the worst of them, it also divides the story up into 11 pages. Salvation? The print version. The print version is actually continuous, so you can read it without having to click your way through. And it has fewer ads in it. As it is, the print version is actually the only readable version many sites offer. It's not for print, it's for reading. It's become such a reflex that I always look for the print icon, and if it's not there I bid farewell.

In contrast, this makes blogs wonderful to read. Blogs with a nice design, little to no ads and good typography (and there are lots of these) are a delight, because they don't have all these things that prevent you from reading the content. Blogs are much more fun to read than papers and magazines.

open source and lacking communication channels

May 22nd, 2007

Open source projects understandably face a lot of challenges. One of those challenges is to communicate effectively with the user base. It is my claim that many of the common channels of communication are ineffective.

If you're a developer, put yourself in a user's place. Users have questions. Sometimes they have requests. Some projects go the extra mile to establish a channel for quick, uncommitted communication. But most projects don't do this, they rely on the classical communication channels.

Mailing lists are the way most projects exchange ideas between developers. They are archived, which makes them good for keeping track of what has been covered. But in order to post a single question to a mailing list, you first have to subscribe to it. Think about this for a minute. You're not a developer, you just want to ask one question, but in order to do that, you have to commit yourself to receiving all the emails exchanged on the list. I have faced this problem countless times. Currently I'm on the cmake list, because I needed to ask a question about a month ago and figured why not just remain subscribed in case I have something else to ask in the near future.

This is completely pointless. Even if I just want to ask one question, I have to subscribe to the list, validate my email address, sometimes even await moderator approval (for some reason or another, joining a mailing list always seems to take half a day between subscribing and being subscribed), post the question, wait for the answer, receive all kinds of irrelevant emails, see if I'm satisfied or ask a follow-up, get more irrelevant emails, then finally unsubscribe. In the past couple of years, I've been through the subscribe-unsubscribe loop at least 20 times, usually with no lasting interest in the project. Not that it's a big problem: my gmail account has basically unlimited storage, and with the filters I don't have to look at these emails at all (which I don't). But cmake's mail server still keeps sending them even though I don't want them. It's a waste.

Naturally, there are reasons why it works like this. Lists have to protect themselves from spammers and trolls, but it doesn't make the user's role any more fun. And even though mail archives are nice for developers to keep track of their ideas, more often than not they suck for finding information. A lot of mail archives don't have a search function, which raises the question of why even have an archive. In one or two cases I actually had to download all the monthly archives and grep them locally. And that's when the archive actually was downloadable.

Another common channel is irc. Great for irc people, less amazing for the rest of us. Irc demands that you be in the channel at the same time as the person who can answer your question. If not, you can hang around all day, paying attention to the channel so you don't miss the moment that person shows up and can help you. Some people probably have that kind of patience, but I don't.

Again, irc can be archived, but it's a very noisy medium. In between actual technical questions, there tends to be a lot of chatter. Nothing wrong with that, but it makes it less than ideal for effective communication.

An increasingly popular channel is the forum. Forums work really well, if they get enough traffic. Unlike irc, they aren't real time, so it doesn't matter when you're there. And they are far more organized than mailing lists too. Again unlike mailing lists, the sign-up procedure tends to be speedy. And again unlike mailing lists, it's non-committal: you can post a question today, get the answer, and never visit the website again. Your account remains, but there's no stream of emails coming after you. And forums are searchable.

Forums have their own problems. If they aren't sufficiently active to get your questions answered, they are useless. And they require more moderation too.

A non-interactive, yet still highly effective channel is the wiki. Wikis are fantastic for ad hoc, unofficial information. And their informal, editing-oriented feel encourages updates. Superb for all kinds of tips and frequently asked questions.

But, unless unrestricted (and thus open to spam), not useful for informal interaction.

This isn't to say that existing, well established channels are obsolete. Gmane certainly gives you more out of mailing lists, with a threaded view and searching. But the fact remains that many of the tools in use today are not optimal, and in some cases, very ineffective.

I didn't mention bug trackers, because they are quite effective at what they set out to do, and because they aren't quite general purpose either.

painless website backup/synchronization

May 18th, 2007

Why you should care

There are quite a few reasons why you would want to back up your website. For one thing, in the case of some kind of security breach, you don't want to lose the files on the server. Even if someone broke in, with a backup you could just restore it and be back in a jiff. Or maybe you just want full control of your files, and knowing that they sit on a server somewhere remote doesn't make you feel as good as knowing they are right on your local disk. Whatever the reason, the following method is well suited to WordPress sites, but general enough to apply to just about any website.

Better yet, since the method transfers files in both directions, it's equally ideal for deployment. It makes no difference whether you're uploading or downloading; we cover both bases.

How it works

Okay, that was the sales pitch. The script was written to allow for fast deployment of files on a server. Using WordPress as an example, if you're hacking on your theme and you want to upload that one file you changed and see the result, you can do that quickly and painlessly with rsync. It's really the best way to transfer one file when you know none of the other files have changed: rsync synchronizes two locations, transferring only what has changed.

The files are transferred with rsync over ssh, so you need shell access on the server for this.

In a typical example where you have an account on a web server, this is how your file structure is at the root level (your homedir):

$ ls ~
.bashrc
.htaccess
.ssh/
bin/            # <- sync this
etc/
mail/
public_ftp/
public_html/    # <- sync this
  cgi-bin/
  images/
    picture.jpg
  index.html
tmp/

The entries marked "sync this" are the ones you want to synchronize with your local disk and keep up-to-date. But there will generally be a lot of other files you're not interested in, generated in your homedir automatically, like raw web traffic logs, mail spam etc. (If the item is a directory, you want all the files and dirs it contains to be synchronized.)

So the issue is to selectively pick the items you want. But there may also be certain types of files inside these dirs that you don't want; for instance, I ignore cgi-bin. So you want a way to exclude certain files/dirs from being transferred.

How to

Now that you know what's happening, it's time to set it up. You fill in the variables at the top of the script. local_path is where you want the files on disk. remote_path is where they are located on the server (in most cases ~ or /home/username). locations is the list of top level directories/files you want to synchronize. And finally exclusions are patterns you want to exclude (so if it contains cgi-bin, then that directory and all the files in it will be excluded from the synchronization).

Once that's done, you just run

$ sync.sh down

to download the files on the server to your local dir, and

$ sync.sh up

to transfer your local changes to the website. Finally,

$ sync.sh

alone will log you into your server with ssh.

Time to synchronize the full local/remote tree for matusiak.eu (5470 files) when no changes were made: 4.4 seconds. ;)

A small note about security

Note that this script does not violate or subvert how you access your server. It uses ssh as the underlying security context. You can easily synchronize up/down with public key authentication, in which case you'll never have to type in your password when running sync.sh, and it's actually more secure as well. :)
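Setting up key authentication is a one-time job; roughly like this (the key path here is a scratch location for demonstration, and host/user are placeholders -- normally you'd accept the default ~/.ssh/id_rsa):

```shell
# Generate a keypair. -N "" means no passphrase; for real use, consider
# setting one and letting ssh-agent cache it.
rm -f /tmp/demo_key /tmp/demo_key.pub
ssh-keygen -t rsa -b 2048 -f /tmp/demo_key -N ""

# Then install the public key on the server (placeholder host/user):
#   ssh-copy-id -i /tmp/demo_key.pub -p 22 username@example.com
# After that, ssh and rsync log in without prompting for a password.
```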

#!/bin/bash
#
# Author: Martin Matusiak <numerodix@gmail.com>
# Licensed under the GNU General Public License, version 2.


# server setup
hostname="matusiak.eu"
username=""
ssh_port="22"

# local setup
local_path="/local/path"

# remote setup
remote_path="~"
locations="bin backups public_html"

exclusions="cgi-bin *.swp *~" #.swp are vim swap files


## EDIT BELOW THIS LINE IF YOU KNOW WHAT YOU'RE DOING

# rsync options
rsync_options="--archive --verbose --stats --progress"

# switch priority
nice="nice -n 10"


inc_list=""
function inclusion_list() {
	for i in $exclusions; do
		inc_list="${inc_list}--filter='- $i' "
	done
	for i in $locations; do
		inc_list="${inc_list}--filter='+ /$i' "
	done
	inc_list="${inc_list} --filter='- /*'"
}

function shell() {
	ssh -C ${username}@${hostname} -p ${ssh_port}
}

function sync_up() {
	inclusion_list
	cmd="${nice} rsync ${rsync_options} -e \"ssh -p ${ssh_port}\" \
	${inc_list} \
	${local_path}/* \
	${username}@${hostname}:${remote_path} "
	echo "$cmd"
	sh -c "$cmd"
}

function sync_down() {
	inclusion_list
	mkdir -p ${local_path}
	cmd="${nice} rsync ${rsync_options} -e \"ssh -p ${ssh_port}\" \
	${inc_list} \
	${username}@${hostname}:${remote_path}/* \
	${local_path} "
	echo "$cmd"
	sh -c "$cmd"
}


if [ -z "$1" ]; then
	shell
elif [ "$1" = "down" ]; then
	sync_down
elif [ "$1" = "up" ]; then
	sync_up
else
	echo "usage: $0 [down|up]  (no argument opens an ssh shell)"
fi

latex: adding pagebreaks at sections

May 11th, 2007

Stephen Wright once said something to this effect:

I have a huge collection of sea shells. It's spread out on all the beaches of the world.

That's an exact description of the state of latex documentation. Sure, this is probably the most powerful typesetting language known to man. Well, probably known in full to just the one man who actually knows it; the rest of us know bits and pieces. But when you actually need to do something you haven't done before, or have done but can't remember, bon voyage.

Safe trip on that extensive google search: finding ancient web pages describing good old techniques (latex hasn't changed much over the decades), 404 links to packages that once were in use, and a great deal of tips & tricks that seem useful, but are nothing like what you need to do right now.

Sometimes you'll find the answer. Sometimes you'll give up. Sometimes you'll conclude it's not possible (or at least, not unless you're a latex wizard). In general, it is possible. But because latex is used and abused by so many in so many different ways, over so many years, it's naturally hard to keep track of who accomplished what and how.

Above all, there is no centralized documentation. Latex is so huge that it needs to be extensively documented, but what you find instead is some professor who wrote a tutorial for his students for that particular assignment, or a list of all the symbols you can use, or all kinds of bits and pieces; nowhere can you find the whole. Not how the different programs are related to each other, how to write a fairly general Makefile for them, how to actually construct a workflow out of it. For that you'd better hope there is someone willing to guide you through it in the beginning.

One of the things I've wanted to do for some time is enforce a pagebreak before every section, because in some cases it just makes sense. So I was thrilled today to stumble upon one of those ancient pages with a working recipe for it. When you look at the solution, it's ridiculously simple, but when you don't know it... well.

\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{pagedsections}[2007/05/11 Adding pagebreaks before sections]

% Save the original \section, then redefine it to break the page first.
% The trailing % signs keep the line endings from inserting stray spaces.
% Note: this simple wrapper only handles the plain \section{...} form,
% not \section*{...} or \section[short]{long}.
\let\oldsection = \section
\renewcommand{\section}[1]{%
	\pagebreak
	\oldsection{#1}%
}

Then, of course, include it into the document as usual:

\documentclass[12pt]{article}

\usepackage{pagedsections}

\begin{document}
\section{first}
blahdeeblah
\section{second}
blah
\end{document}
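For completeness: if you happen to have the titlesec package around, it provides (as far as I can tell from its documentation) a ready-made hook for exactly this, which also copes with \section* and the optional-argument form:

\usepackage{titlesec}
% When \sectionbreak is defined, titlesec executes it before every section.
\newcommand{\sectionbreak}{\clearpage}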

This lack of documentation is common for applications that predate the age of the internet, or at least the "modern" internet (not counting usenet and whatever other deprecated forms of communication). For instance, bash suffers from an acute lack of in-depth documentation.

the state of RAW support in linux

May 11th, 2007

This only affects you if you have some source of RAW images; typically that source would be a camera. The RAW images then need to be post-processed (something that's already done for you if you extract JPGs from the camera instead of RAW images) and converted to a target format, like JPG.

Viewers/browsers

The best one I know so far is showfoto, a component of digikam. digikam itself is fussy about images having to be part of albums, but showfoto has an adequate image browser with exif data display and some statistics about the image. It's also worth noting that digikam itself has been given a lot of attention, and has recently developed into a much better and more useful program than it was a few years ago.

Rawstudio also has a rudimentary image browser.

Converters

For this I would advocate ufraw. It's a standalone program, but it's also a plugin for the gimp. The interface is straightforward and quite handy.

showfoto/digikam also has features for conversion, but they are somehow tucked away in the menus and harder to find.

Rawstudio aims to be the tool of choice for this, but for the moment it seems rather immature and the interface could use work.

I think I read somewhere that Krita is supposed to convert its internal colorspace to 16-bit, which would make editing RAW images native, without needing to convert them first. That would be awesome. For the time being, I can't say anything about Krita, because it crashes the moment I start it (probably a bug in the koffice ebuilds).

Status

So the support for RAW images is quite encouraging. Not as nice as in Photoshop CS3, and this applies principally to the conversion options and the types of adjustments that can be made, but decent all the same.