cutting the fat off binaries

September 20th, 2006

What's amazing to me is that for every technical problem, lots of people have already thought about it and tried to solve it. With a world population of some 6 billion that's not really surprising, but it's very satisfying nonetheless. Lately, for instance, I've been thinking about writing a small application using the Qt library. I haven't even begun designing it; I've just been doing preliminary research. My concern is that the program should be a single binary, and as small as possible. The reason is that I want it to be as accessible as possible: it should need only a small download and no installation. Shipping a single binary is the easiest way to accomplish that.

But, of course, using any library at all adds file size in the form of dependencies. Since I only want a single binary, I'm looking to link statically, which puts all the library code I'm using into the executable itself. Qt, in particular, is a huge library. Its library objects probably add up to about 15 MB, and I don't want all of that in my "little" binary.

Let's look at an example. A few years ago I was working with High Dynamic Range (HDR) images, and I even wrote a tutorial on how to produce them from pictures taken with a digital camera. I used Greg Ward's hdrgen utility for this, and Greg's program is a single static binary. Now, if Greg had the same goal as I do, there are a couple of things he could have done.

$ ls -lh hdrgen
-rwxr-xr-x 1 alex users 8.7M Oct 24 2003 hdrgen

So the file is over 8 MB. Is there any way we can shrink it?

$ file hdrgen
hdrgen: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.0.0, statically linked, for GNU/Linux 2.0.0, not stripped

The information we're looking for is at the end of that line: the file is not stripped, which means it contains symbols that aren't strictly necessary for it to run, such as those that ease debugging or relocating the binary. Also note that the binary is statically linked, which means it does not depend on any shared libraries at run time.

The first thing we can do is strip the binary. Stripping removes these symbols and leaves only what the program needs to run.

$ strip -s hdrgen
$ file hdrgen
hdrgen: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.0.0, statically linked, for GNU/Linux 2.0.0, stripped
$ ls -lh hdrgen
-rwxr-xr-x 1 alex users 1.9M Sep 19 22:59 hdrgen

As expected, the binary is now stripped. Notice also that the file size has dropped from 8.7 MB to just 1.9 MB, less than a quarter of the original! That's pretty sweet.

But it doesn't end here. A further way to reduce file size (for static binaries only!!) is to compress the binary. UPX, the Ultimate Packer for eXecutables, does exactly that. Its compression is lossless (otherwise it would be useless, of course), and it bundles the compressed program together with everything needed to decompress it in a single executable file.

$ upx -9 hdrgen
Ultimate Packer for eXecutables
Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004
UPX 1.25 Markus F.X.J. Oberhumer & Laszlo Molnar Jun 29th 2004
        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
   1889480 ->    694035   36.73%   linux/386     hdrgen
Packed 1 file.
$ file hdrgen
hdrgen: ELF 32-bit LSB executable, Intel 80386, version 1, statically linked, corrupted section header size
$ ls -lh hdrgen
-rwxr-xr-x 1 alex users 678K Sep 19 22:59 hdrgen

The binary is now compressed. As you can see, file has some trouble working out what it is, because of the compression. But the binary is definitely smaller: down to just 678 KB!


1 Response to "cutting the fat off binaries"

  1. Lucas says:

    15 MB of objects is what you "pay" for when you use a mature and sophisticated library like Qt. It's more than a toolkit.

    You could distribute dynamic binaries as well as static ones, as many people have Qt installed... (but C++ is an ABI minefield) =/