python timings

April 14th, 2013

On the one hand, we measure database query latency in milliseconds. On the other hand, a read from L1 cache costs less than a nanosecond. That is a pretty big spectrum, and it made me wonder where typical language constructs fall on it. As a reminder, here is the usual list of important timings:

            0.5 ns        read from L1 cache
            1   ns        execute cpu instruction
            7   ns        read from L2 cache
          100   ns        read from memory
       20,000   ns        transmit over local network
    8,000,000   ns        read from disk
  150,000,000   ns        transmit over the internet Europe -> US
1,000,000,000   ns        one second

There happens to be a really easy way to get a quick and dirty measurement: ipython's built-in %timeit feature. It runs the code you give it repeatedly, picking the number of repetitions based on how long a single run takes, so that the total measurement time stays bounded. For really trivial expressions you therefore get a large number of repetitions:

In [66]: %timeit 1+2
10000000 loops, best of 3: 20.7 ns per loop
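
To make the repetition logic concrete, here is a minimal sketch of the same idea using the standard timeit module. It is not ipython's actual implementation: it assumes a rule similar to the one the timeit command line uses, growing the loop count by powers of ten until one batch takes long enough to time reliably, then reporting the best of a few repeats. The names best_time_per_loop and min_total are made up for this illustration.

```python
import timeit

def best_time_per_loop(stmt, repeat=3, min_total=0.2):
    """Return (loops, best seconds per loop) for stmt.

    Hypothetical helper: min_total is the assumed minimum batch
    duration in seconds before a loop count is accepted.
    """
    timer = timeit.Timer(stmt)
    loops = 1
    # Grow the loop count by powers of ten until a single batch
    # runs for at least min_total seconds.
    while timer.timeit(loops) < min_total:
        loops *= 10
    # Take the fastest of several batches, like "best of 3".
    best = min(timer.repeat(repeat=repeat, number=loops))
    return loops, best / loops

# A short time budget keeps this demo quick.
loops, seconds = best_time_per_loop("1+2", min_total=0.02)
print("%d loops, best of 3: %.1f ns per loop" % (loops, seconds * 1e9))
```

Taking the minimum rather than the mean is a deliberate choice: slower runs mostly reflect interference from other processes rather than the cost of the code itself.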

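The catch is that %timeit on the command line takes a single line of code, which rules out multi-line statements such as a class definition or a try/except block. The simplest way around that is to make every test a function call, and in there we can run arbitrary expressions and statements alike. The baseline will then be a function with an empty body, whose cost gets subtracted from every other measurement.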
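
The baseline approach can be sketched outside ipython with the standard timeit module. This is an illustration under assumptions, not the exact procedure used for the numbers in this post: ns_per_call is a hypothetical helper, and the loop count of one million is an arbitrary choice.

```python
import timeit

def call_function():
    # Baseline: an empty body, so only the call overhead is measured.
    pass

def assignment():
    a = 1

def ns_per_call(func, number=1000000, repeat=3):
    """Best-of-repeat time for one call of func, in nanoseconds."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e9

overhead = ns_per_call(call_function)
raw = ns_per_call(assignment)
print("raw %.0f ns, net of call overhead %.0f ns" % (raw, raw - overhead))
```

Because the two measurements are taken separately, the difference is noisy for very cheap operations and can even come out slightly negative.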

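Here are the results from my cpython 2.7.3, with the cost of the empty function call already subtracted (call_function itself is the raw baseline):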

            5 ns        assignment
            4 ns        integer_addition
           10 ns        string_concat
            5 ns        string_interpolate
           35 ns        dict_lookup
           77 ns        list_comprehension

           22 ns        branch
        1,095 ns        try_catch

       86,895 ns        create_class
           97 ns        instantiate_class
          135 ns        call_method
          105 ns        call_function

          217 ns        get_current_time
        1,745 ns        get_current_date

Clearly this leaves a lot to be desired from a methodological standpoint. The reference list of latencies consists of generic figures, not ones measured on my laptop. On top of that, every measurement includes the overhead of a function call, which we then try to subtract out, though at least that overhead is constant across all measurements. At best these numbers are a rough indication of how much things cost, but that's good enough for our purpose.
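
One more caveat is worth checking rather than taking on faith: cpython folds constant expressions at compile time, so a test body like 1 + 2 may not perform any addition when the function runs. The sketch below uses the standard dis module to compare the bytecode of the constant version with a variant that adds two local variables. The variant integer_addition_vars is an assumption added for comparison and is not one of the tests measured above.

```python
import dis

def integer_addition():
    1 + 2

def integer_addition_vars(x=1, y=2):
    # Hypothetical variant: operands are locals, so the compiler
    # cannot fold the addition away.
    x + y

print("constant operands:")
dis.dis(integer_addition)
print("variable operands:")
dis.dis(integer_addition_vars)
```

If the first listing shows no binary add instruction, the measured cost for that test is mostly noise around the baseline.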

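Finally, here is the code. The timing in each comment is the raw measurement, before subtracting the 105 ns baseline of call_function: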

def call_function():  # 105ns
    pass

def create_class():  # 87us
    class C(object):
        pass

class D(object):
    def meth(self):
        pass

def instantiate_class():  # 202ns
    D()

d = D()
def call_method():  # 240ns
    d.meth()

def assignment():  # 110ns
    a = 1

def branch():  # 127ns
    if True:
        pass

def try_catch():  # 1.2us
    try:
        raise Exception
    except Exception:
        pass


def integer_addition():  # 109ns
    1 + 2

def string_concat():  # 115ns
    "a" + "b"

def string_interpolate():  # 110ns
    "a%s" % "b"

# Not named d: that would rebind the D() instance used by call_method.
lookup = {'a': 1}
def dict_lookup():  # 140ns
    lookup['a']

l = []  # empty, so this times only the setup cost of the comprehension
def list_comprehension():  # 182ns
    [x for x in l]

import time
def get_current_time():  # 322ns
    time.time()

from datetime import datetime
def get_current_date():  # 1.85us
    datetime.now()