Archive for the ‘english’ Category

pretty printing for everyone!

April 25th, 2010

I've been toying with the idea of trying my hand at a generic pretty printer module for a while. Lately I've had to deal with cyclic object graphs and things like that, where having a dump of the data is pretty handy. Granted, there is a pprint module in the standard library, but what it does is format and print iterables (lists, dicts, tuples..); it doesn't attempt to show you the contents of an object. And when you're messing with objects, that is exactly what you want to see.

So I thought that I would build a recursive iterable that I can give to pprint. Here's an example:

class Node(object):
    classatt = 'hidden'
    def __init__(self, name):
        self.name = name

a, b, c, d = Node('A'), Node('B'), Node('C'), Node('D')
a.refs = [b, d]
b.refs = [c]
c.refs = [a]
d.refs = [c]

This will give you:

{'__type__': '<Node {id0}>',
 'name': "'A'",
 'refs': [{'__type__': '<Node {id1}>',
           'name': "'B'",
           'refs': [{'__type__': '<Node {id2}>',
                     'name': "'C'",
                     'refs': ['dup <Node {id0}>']}]},
          {'__type__': '<Node {id3}>',
           'name': "'D'",
           'refs': [{'__type__': '<Node {id2}>',
                     'name': "'C'",
                     'refs': ['dup <Node {id0}>']}]}]}

There are two things being shown here:

  • node C is reachable through ABC and ADC.
  • A takes part in two cycles: ABCA and ADCA.

It would be nice to have a way to see this in the output. So aside from the object attributes themselves, there is also a __type__ attribute which tells you the type that you're looking at. And it carries a marker of the form {id1}, where id1 is an identifier for this object, so that you can see where it pops up in other parts of the graph.

Now, suppose we follow A to B to C and then to A. We are now seeing A for the second time. Instead of printing the object again, we print a duplicate marker: dup <Node {id0}>. The identifier is meant to be vim-* friendly: if you pipe the output to vim, put the cursor over it and hit * (you might also want to :set hlsearch), you'll see all the other occurrences of it in the graph light up.
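The dump function itself isn't shown in this post, but a minimal sketch of the idea (my reconstruction, not the actual module) might look like this: walk the object graph while keeping the current recursion path in a set, hand out a stable {idN} marker per object, and emit a dup marker whenever we revisit a node that is on the current path.

```python
import pprint

class Node(object):
    classatt = 'hidden'
    def __init__(self, name):
        self.name = name

a, b, c, d = Node('A'), Node('B'), Node('C'), Node('D')
a.refs = [b, d]
b.refs = [c]
c.refs = [a]
d.refs = [c]

def dump(obj, stack=None, ids=None):
    """Recursively turn an object graph into nested dicts for pprint."""
    if stack is None:
        stack, ids = set(), {}
    key = id(obj)
    if key not in ids:
        ids[key] = 'id%d' % len(ids)   # stable {idN} marker per object
    marker = '<%s {%s}>' % (type(obj).__name__, ids[key])
    if key in stack:
        return 'dup %s' % marker       # cycle: don't recurse again
    stack.add(key)
    out = {'__type__': marker}
    # vars() gives instance attributes only, so class attributes stay hidden
    for name, value in sorted(vars(obj).items()):
        if isinstance(value, list):
            out[name] = [dump(v, stack, ids) for v in value]
        else:
            out[name] = repr(value)
    stack.discard(key)
    return out

pprint.pprint(dump(a))
```

Because only the recursion path (not everything ever seen) counts as a duplicate, shared nodes like C get expanded in full in both branches while the cycle back to A is cut off with a dup marker, matching the output above.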

(screenshot: the pretty-printed output opened in gvim, with the occurrences of one identifier highlighted)

Well, that's all for now. It's definitely not the last word in pretty printing, but it's useful already.

I thought maybe GitHub's gists would be appropriate for something like this:

lessons from "Coders at work"

April 16th, 2010

I already mentioned Coders at work in an earlier entry. The point of this one is not to write a review, but to make a note for myself of what I've gotten out of the book. I'd do well to read more books with a pen and a pad at hand, to have a better chance of exploiting the content.

So these are notes to myself. I won't take it upon myself to compile a more general list of notes that would somehow apply to the average person, because I think we're all in very different places in the universe that is called "learning to program (well)", and every person has to figure out for himself what he most needs to learn relative to where he is now.

Advice: Read code

Read other people's code, "open black boxes". This is something I never really do, and I should start. Just take some codebase and check it out, to get used to the practice. Reading code is not the easiest thing to get into, so here are some tips:

  1. First, get it to build.
    Sometimes everything you have to do to build it already teaches you a number of things about the codebase. And once you have it built, you can start making changes to it and try out little things dynamically.
  2. Read while building.
    Getting any codebase to build can be hairy and painful, so parallelize this activity with code reading. It's a great way to use the time you'd otherwise waste in between debugging the build.

Advice: Write unit tests for a new library

You've found a library for something that you've never used before: how do you figure out how to use it? Write unit tests. Some libraries have bad unit tests (or no tests) to begin with, so this could also be a way to improve them. In any case you get to test your basic hypotheses about how the library works.
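As an illustrative sketch of that advice (using the standard library's posixpath as a stand-in for the unfamiliar library), each test simply encodes one hypothesis about the library's behavior:

```python
import unittest
import posixpath  # stand-in for the unfamiliar library


class PosixpathAssumptions(unittest.TestCase):
    # Each test records one hypothesis about how the library behaves.

    def test_join_with_absolute_second_arg(self):
        # Hypothesis: an absolute second argument discards the first.
        self.assertEqual(posixpath.join('/a/b', '/c'), '/c')

    def test_split_on_trailing_slash(self):
        # Hypothesis: a trailing slash yields an empty tail.
        self.assertEqual(posixpath.split('/a/b/'), ('/a/b', ''))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PosixpathAssumptions)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If a hypothesis turns out wrong, the failing test tells you exactly which assumption to revise, and the passing ones double as executable documentation.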

Ideas to investigate

  1. OO and classes vs prototypes (JavaScript).
  2. "There is a lack of reuse in OO because there is too much state inside". Libraries end up exposing too much of their innards through APIs; a functional programming model should be better at this.

Pointers

Articles:

  1. Richard P. Gabriel - Worse Is Better

Blogs:

  1. How to read code – a primer

Books:

  1. Douglas Crockford - JavaScript: The Good Parts
    In the absence of the book, Crockford's lecture series on JavaScript is probably a good start.
  2. William Strunk, Jr. and E.B. White - The Elements of Style
    For writing better English.
  3. Steve McConnell - Code Complete
    On software engineering process and best practices.
  4. Gerald Weinberg - The Psychology of Computer Programming

Talks:

  1. Joshua Bloch - How to Design a Good API and Why it Matters

systems are too complicated, dammit!

April 14th, 2010

I'm reading Peter Seibel's book "Coders at work". It's a collection of interviews with famous programmers. This is the kind of book I really like, it's not a technical book, but it's a meta sort of book where these people tell you what they think about various relevant issues in the industry. And not just issues that concern them directly, but general trends too. It's a very easy read, perfect for the plane or the airport.

There are 15 interviews, and almost all of these people started playing with computers, roughly speaking, before there were computers. So if there is a theme running through the book, it is this:

  1. Kids today don't understand how the metal works.
  2. I don't like all these layers of software.

I think it's an understandable point of view coming from people who've written operating systems and compilers and coded assembly and machine code because there was nothing else available. But I don't find it a very helpful perspective.

The basic complaint is this:

  1. Things used to be simple.
  2. Instead of remaining simple, they got complex, but not in a good way (i.e. bad technical decisions).

I think this is essentially an "argument from nostalgia". Back in the day, systems were simpler. Today they are very complicated, and so we wish things were simpler. But this is because some people were present more or less at the "birth" of computer science. The field started from zero and just keeps expanding. That's normal, though.

If a physicist said "I hate how when you discover a layer of particles, there's always something smaller than that!" would people nod in agreement? I remember learning about atomic orbitals and not understanding them and I kept thinking "what was wrong with the Bohr model, that one was so much simpler and nicer?"

The difference between physics and computer science is that in physics there's no one to blame for what is there. There is this sense of "nature is the goddess who bestows gifts upon us and we have the privilege to explore them". In computer science we're not trying to explain or discover anything; we make all this stuff up!

In physics there's no way you can remove the complexity and be left with a simple system, the complexity is there at all levels. But in computers you can delete everything save for the kernel and you indeed have a simple system. (Better yet, delete the kernel too and install a simpler one that you wrote yourself.)

The fundamental difference, to me, is that there is someone to blame. There is no one to blame for atomic orbitals and "why do they have to be so complicated??", but there is someone to blame for every programming language and every system. I don't think for a minute that we wouldn't do the same in physics if we had the chance, though.

What's Plan B?

Of course, the difference between the physical sciences and computer science raises the old "is it a science?" question, but at any rate computing is becoming more like physics in the sense of being a top-to-bottom system that is difficult to understand at all levels.

In physics you don't say things like "I would like to throw all this out and start over, make it simple". This is something you can totally do in computers, but chances are you're not gonna have much impact. Sometimes people bemoan how there hasn't been any innovation in operating systems in 30 years. So go write your own, see how many people you can convince to use it.

In a way, the answer is right there. The fact that there aren't any new operating systems taking over from the old ones _means_ that the old ones have succeeded. They've successfully laid that layer of bricks that has proven to be a strong enough abstraction to move away from that layer in the system and focus our attention on something higher up. They're not works of art in terms of simplicity and purity, but neither are layers of abstraction in physics. *ducks*

Complexity is often presented as a mistake, but the fact that we have all this complexity is not really an accident, it has to be there to do the kinds of things that we want to do.

how do you structure your python codebase?

April 13th, 2010

One thing that's awesome in python is having a small codebase that fits in a single directory. It's a comfy setting: everything is right there at your fingertips, no directory traversal needed to get hold of a file.

Flat structure

Let's check out one right now:

./frame.py
./master.py
./mystring.py
./page.py
./sentence.py
./user.py

And here's the import relationship between them:

(diagram: the import relationships between the modules)

Easy, straightforward. I can execute any one of the files by itself to make sure the syntax is correct or to run an "if __main__" style unit test on it.

Tree structure

But suppose the codebase is expanding and I decide I have to get a bit more structured? I devise a directory structure like this:

./media/book/__init__.py
./media/book/page.py
./media/book/sentence.py
./media/__init__.py
./media/master.py
./media/movie/frame.py
./media/movie/__init__.py
./media/mystring.py
./user.py

The same files, but now with __init__.py files all over the codebase to tell python to treat each directory as a package. And my import statements have to change too; let's look at master:

# from:
import mystring
import page
# to:
import media.mystring
import media.book.page

Nice one. Okay, let's see how this works now:

$ python user.py
page says hello!
sentence says hello!
frame says hello!
mystring says hello!
master says hello!

user imports page and then master. The first 4 lines are due to page, which imports three modules, and finally we see master arriving at the scene. All the files it imports have already been imported, so python doesn't redo those. Everything is in order.

As you can see, imports between modules in the tree work out just fine, page finds both the local sentence and the distant frame.

But if we run master it's a different story:

$ python media/master.py
master says hello!
Traceback (most recent call last):
  File "media/master.py", line 3, in <module>
    import media.mystring
ImportError: No module named media.mystring

And it doesn't actually matter whether we run master from media/ or media/master.py from .: the result is the same. And it's the same story with page, which is deeper in the tree.

These modules, which used to be executable standalone, no longer are. :(
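This failure is easy to reproduce in miniature (a self-contained sketch with hypothetical pkg/mod/other names standing in for media/master/mystring): build a throwaway package in a temp directory, then execute one of its modules directly.

```python
import os
import subprocess
import sys
import tempfile

# Build a throwaway tree: pkg/mod.py imports its sibling via the
# package path, just like master imports media.mystring.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'pkg'))
files = {
    os.path.join('pkg', '__init__.py'): '',
    os.path.join('pkg', 'other.py'): 'print("other says hello!")\n',
    os.path.join('pkg', 'mod.py'): 'import pkg.other\n',
}
for name, body in files.items():
    with open(os.path.join(root, name), 'w') as f:
        f.write(body)

# Executing the file directly puts pkg/ (not the codebase root) at the
# front of sys.path, so the package "pkg" itself is invisible -- even
# when we run from the root directory.
proc = subprocess.run(
    [sys.executable, os.path.join(root, 'pkg', 'mod.py')],
    capture_output=True, text=True, cwd=root)

print(proc.returncode)   # non-zero: the import failed
print(proc.stderr)       # "No module named" error
```

The interpreter only prepends the script's own directory to sys.path, which is exactly why the tree layout breaks standalone execution.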

A hackish solution

So we need something. The nature of the problem is that once we descend into media/, python can no longer see that there is a package called media, because it isn't found anywhere on sys.path. What if we could tell it?

The problem pops up when the module is being executed directly, in fact when __name__ == '__main__'. So this is the case in which we need to do something differently.

Here's the idea. We put a file in the codebase that marks where the root is. Then, whenever we need to find the root, we traverse up the tree until we find it. The file is called .codebase_root. And for our special when-executed logic, we use a module called __path__ that we import conditionally. Here's what it looks like:

import os
import sys

def find_codebase(mypath, codebase_rootfile):
    # Walk up the directory tree from mypath until we find the marker
    # file; returns None if we hit the filesystem root without a match.
    root, branch = mypath, 'nonempty'
    while branch:
        if os.path.exists(os.path.join(root, codebase_rootfile)):
            # The parent of the directory holding the marker is
            # treated as the codebase root.
            codebase_root = os.path.dirname(root)
            return codebase_root
        root, branch = os.path.split(root)

def main(codebase_rootfile):
    # Locate this module on disk, find the codebase root relative to
    # it, and make sure that root is on sys.path.
    thisfile = os.path.abspath(sys.modules[__name__].__file__)
    mypath = os.path.dirname(thisfile)
    codebase_root = find_codebase(mypath, codebase_rootfile)

    if codebase_root and codebase_root not in sys.path:
        sys.path.insert(0, codebase_root)

codebase_rootfile = '.codebase_root'
main(codebase_rootfile)

So now, when we find ourselves in a module that's somewhere inside the media/ package, we have this bit of special handling:

print "master says hello!"

# fix up sys.path only when this file is executed directly
if __name__ == '__main__':
    import __path__
import media.mystring
import media.book.page
Unfortunately, importing __path__ unconditionally breaks the case where the file is not being executed directly, and I haven't been able to figure out why, so it has to be done like this. :/

(screenshot: python_codebase_structure_tree) You end up with a tree like the one in the screenshot.

I've pushed the example to GitHub, so by all means have a look:

We pass the test: all the modules are executable standalone again. But I can't say that it's awesome to have to do it like this.

aopy: aspect oriented python

March 12th, 2010

Aspect oriented programming is one of those old new ideas that hasn't really made a big impact (although perhaps it still will; research ideas sometimes take decades to appear in the professional world). The idea is really neat. We've had a few decades now to practice our modularity, and the problem hasn't really been solved fully (the number of design patterns that have been invented is telling, I think). What's different about AOP from plain old "architecture" is the notion of "horizontal" composition. That is to say, you don't solve the problem by decomposing and choosing your parts more carefully; you inject code into critical places instead. The technique is just as general, but, I would suggest, differently applicable.

I realized I haven't really explained anything yet, so let's look at a suitably contrived example.

A network manager

Suppose you're writing a network manager type of application (I actually tried that once). You might have a class called NetworkIface. And the class has an attribute ip. So how does ip get its value? Well, it can be set statically, or via dhcp. In the latter case there is a method dhcp_request, which requests an ip address and assigns to ip.

# <./main.py>
class NetworkIface(object):
    def __init__(self):
        self.ip = None

    def dhcp_request(self):
        self.ip = (10,0,0,131) # XXX magic goes here


if __name__ == '__main__':
    iface = NetworkIface()
    iface.ip = (10,0,0,1)
    iface.ip = (10,0,0,2)
    iface.dhcp_request()

Now suppose you are in the course of writing this application, and you need to do some debugging. It would be nice to know a few things about NetworkIface:

  1. The dhcp server seems to be assigning ip addresses to clients in a (possibly) erroneous manner. We'd like to keep a list of all the ips we've been assigned.
  2. Sometimes the time between making a dhcp request and getting a response seems longer than reasonable. We'd like to time the execution of the dhcp_request method.
  3. Some users are reporting strange failures that we can't seem to reproduce. We would like to do exhaustive logging, i.e. every method entry and exit, with parameters.

Now, this kind of debugging logic, however we realize it, is not really something we want in the release version of the application. It doesn't belong. It belongs in debug builds, and we're probably not going to need it permanently.

Here we will demo how to achieve the first point and omit the other two for brevity.

Where AOP comes in

Common to these issues is the fact that they all have to do with information gathering. But that's not necessarily the only thing we might want to do. We might want to tweak the behavior of dhcp_request for the purpose of debugging. For instance, if it took too long to get an ip, we could set one statically after some seconds. Again, that would be a temporary piece of logic not meant to be in the release version.

Now, AOP says "don't change your code, you'll only make a mess of it". Instead you write that piece of code you need, but quite separately from your codebase. This is called an aspect, with the intention that it captures some aspect of behavior you want to inject into your code. And then, during compilation from source code to bytecode (or object code), you inject the aspect code where you want it to go. Compiler? Yes, AOP comes with a special compiler, which makes injection easy to toggle. Want vanilla code? Use the regular compiler. Want aspected code? Use the AOP compiler.

How does the compiler know where to inject the aspect code? AOP defines strategic injection points called join points. Exactly what these are depends on the programming language, but typically there is a join point preceding a method body, one preceding a method call, one at method return and so on. (As we shall see, in aopy we are being more Pythonic.) Join points are defined by the AOP framework. But how do you tell it to inject there? With point cuts. A point cut is a matching string (i.e. a regular expression) which is matched against every join point and determines whether injection happens there.
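To make the point cut idea concrete, here is a toy matcher (one plausible interpretation for illustration only, not aopy's actual matching rules): treat the point cut as a regular expression that must cover the whole join point name.

```python
import re

def pointcut_matches(pointcut, joinpoint):
    # Anchor the pattern with \Z so it must cover the entire join
    # point name, not just a prefix of it.
    return re.match(pointcut + r'\Z', joinpoint) is not None

# A literal point cut matches exactly one join point...
print(pointcut_matches('main:NetworkIface/ip', 'main:NetworkIface/ip'))
# ...while a regex point cut can match a whole family of them...
print(pointcut_matches(r'\w+:\w+/ip', 'main:NetworkIface/ip'))
# ...and everything else is left untouched.
print(pointcut_matches('main:NetworkIface/ip', 'main:Other/ip'))
```

The compiler's job is then simply: for every join point in the codebase, try every point cut, and inject the corresponding advice on a match.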

Back to you, John

Enough chatter, the code is getting cold! As it happens, Python has first-rate facilities for writing AOP-ish code. We already have language features that can modify or add behavior to existing code:

  • Properties let us micromanage assignment to/reading from instance variables.
  • Decorators let us wrap function execution with additional logic, or even replace the original function with another.
  • Metaclasses can do just about anything to a class by rebinding the class namespace arbitrarily.

We will use these language constructs as units of code injection, called advice in AOP. This way we can reuse all the decorators and metaclasses we already have and we can do AOP much the way we write code already. Let's see the aspects then.
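To see what "a property as advice" means before looking at aopy itself, here is a hand-rolled sketch in plain Python, no framework involved (the get_ip/set_ip names and the _ip storage slot are my own, purely for illustration): take an existing class and rebind one of its attributes as a property after the fact.

```python
class NetworkIface(object):
    def __init__(self):
        self.ip = None

# Advice: a getter/setter pair written separately from the class.
def get_ip(self):
    return self.__dict__.get('_ip')

def set_ip(self, value):
    if value:
        print('New value: %s' % (value,))
    self.__dict__['_ip'] = value   # stash the real value on the instance

# Injection: rebind ip as a property after the fact. The class source
# stays untouched; this rebinding is the sort of thing an AOP compiler
# can emit for you.
NetworkIface.ip = property(get_ip, set_ip)

iface = NetworkIface()
iface.ip = (10, 0, 0, 1)   # goes through set_ip, which logs it
```

Because a property is a data descriptor, every read and write of iface.ip is routed through the advice functions, even the assignment inside __init__.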

A caching aspect

The first thing we wanted was to keep track of the values of ip. For this we have a pair of functions which will become the getter and setter of an ip property on NetworkIface.

# <aspects/cache.py>
class Cache(object):
    def __init__(self):
        self.values = set()   # every value ip has ever held
        self.value = None     # the current value
cache = Cache()

def get(self):
    return cache.value

def set(self, value):
    if value:
        print "c New value: %s" % str(value)
    if cache.values:
        prev = ", ".join([str(val) for val in cache.values])
        print "c  Previous values: %s" % prev
    if value:
        cache.values = cache.values.union([value])
    cache.value = value

Cache is the helper class that will store all the values.

A spec

Aspects are defined in specification files which provide the actual link between the codebase and the aspect code.

# <./spec.py>
import aopy

import aspects.cache

caching_aspect = aopy.Aspect()
caching_aspect.add_property('main:NetworkIface/ip', 
    fget=aspects.cache.get, fset=aspects.cache.set)

__all__ = ['caching_aspect']

We start by importing the aopy library and the aspect code we've written. Then we create an Aspect instance and call add_property to add a property advice to this aspect. The first argument is the point cut, i.e. the matching string which defines what this property is to be applied to. Here we say: in a module called main, in a class called NetworkIface, find a member called ip. The other two arguments provide the two functions we wish to use in this property.

Compiling

To compile the aspect into the codebase we run the compiler, giving it the spec file and a module (or a path) that indicates the codebase.

$ aopyc -t spec.py main.py
Transforming module /home/alex/uu/colloq/aopy/code/main.py
Pattern matched: main:NetworkIface/ip on main:NetworkIface/ip

The compiler will examine all the modules in the codebase (in this case only main.py) and attempt code injection in each one. Whenever a point cut matches, injection happens. The transformed module is then compiled to bytecode and written to disk (as main.pyc).

main.pyc now looks like this:

# <./main.py> transformed
import sys ### <-- injected
for path in ('.'): ### <-- injected
    if path not in sys.path: ### <-- injected
        sys.path.append(path) ### <-- injected

import aspects.cache as cache ### <-- injected

class NetworkIface(object):
    def __init__(self):
        self.ip = None

    def dhcp_request(self):
        self.ip = (10,0,0,131) # XXX magic goes here
    
    ip = property(fget=cache.get, fset=cache.set) ### <-- injected


if __name__ == '__main__':
    iface = NetworkIface()
    iface.ip = (10,0,0,1)
    iface.ip = (10,0,0,2)
    iface.dhcp_request()

Injected lines are marked. First we find some import statements that are meant to ensure that the codebase can find the aspect code on disk. Then we import the actual aspect module that holds our advice. And finally we can ascertain that NetworkIface has gained a property, with get and set methods pulled in from our aspect code.

Running aspected

When we now run main.pyc we get a message every time ip gets a new value. We also get a printout of all the previous values.

c New value: (10, 0, 0, 1)
c New value: (10, 0, 0, 2)
c  Previous values: (10, 0, 0, 1)
c New value: (10, 0, 0, 131)
c  Previous values: (10, 0, 0, 1), (10, 0, 0, 2)

And yet the codebase has not been touched: if we execute main.py instead, we find the original code.

Here the show endeth

And that wraps up a hasty introduction to AOP with aopy. There is a lot more to be said, both about AOP in Python and aopy in particular. Interested parties are kindly directed to these two papers:

  1. Strategies for aspect oriented programming in Python
  2. aopy: A program transformation-based aspect oriented framework for Python

If you prefer reading code to English (the variable names are still in English though, sorry about that), here is the repo for your pleasure:

And if you still have no idea what AOP is and think the whole thing is bogus then you can watch this google talk (and who doesn't love a google talk!) by mr. AOP himself.