Python bytecode, meet the language behind the language! It's a language like any other, with its own syntax, semantics, design patterns and pitfalls. It's even better than Python, because it's executed directly, not compiled to some intermediate easy-enough-to-interpret language. So if those bytecode compilation times are bugging you, you know what to do.
As you know, CPython is a stack-based VM. Okay, what does that mean? It means that the bytecode for a simple Python expression will generally follow this pattern:
- Push (load) the operand onto the stack. This can be a constant or a variable.
- Perform the operation on the operand. This consumes the operand and replaces it on the stack with the result of the operation.
- Store the value in a variable (if there is an assignment) or discard it (if there isn't).
If the operation were binary you would push two operands on the stack instead, and so on. And each of the steps is a bytecode instruction, so one "atomic" looking Python expression becomes a handful of bytecode instructions. The stack is like a desk - you put something down that you're going to use, perform the work, and clean up the desk afterwards (just like we all do in real life, amirite?).
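The steps above can be sketched with an ordinary Python list standing in for the evaluation stack. This is a toy model of the desk metaphor, not CPython internals; the comments name the bytecode steps each line mimics:

```python
# Hand-simulating the stack for `a = a + 7`, with a list as the stack.
stack = []
local_vars = {"a": 3}          # pretend `a` currently holds 3

stack.append(local_vars["a"])  # LOAD_FAST  a  -> push 3
stack.append(7)                # LOAD_CONST 7  -> push 7

rhs = stack.pop()              # BINARY_ADD pops both operands...
lhs = stack.pop()
stack.append(lhs + rhs)        # ...and pushes the result

local_vars["a"] = stack.pop()  # STORE_FAST a  -> the desk is clean again

print(local_vars["a"])  # 10
print(stack)            # []
```

Note how the stack ends up empty: every value pushed was consumed, which is exactly the "clean up the desk" part.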
An assignment statement
Not convinced? import dis and follow me, then.
def main(a):
    a = a + 7  # we will disassemble the body of this function

import dis
dis.dis(main)
Disassembly of main:
2 0 LOAD_FAST 0 (a)
3 LOAD_CONST 1 (7)
6 BINARY_ADD
7 STORE_FAST 0 (a)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
Woah, what's this? The 2 in the left column is the source line number these instructions were compiled from: the body of the function, which sits on line 2 of the module (the def itself is on line 1).
Let's dig in. We have 6 instructions just to do a = a + 7, seems a little much.
The first instruction is LOAD_FAST 0. LOAD_FAST operates on local variables, so it loads the zeroth local variable onto the stack. You can see dis printing the names of variables (and the values of constants) in parentheses in the right column. Here, the name is a.
The next instruction is LOAD_CONST 1, whose value is 7. We are loading the first constant onto the stack. (Which seems odd, as I only spot one constant in this function; what's the zeroth constant, then? Patience, Robinson, we'll get there soon.)
Next, we have BINARY_ADD, which does not take an argument. Conveniently, though, it finds two operands ready and waiting on the stack. It pops them both off, performs the operation, and pushes the result onto the stack again.
Finally, we have STORE_FAST 0, which stores whatever is on the stack into the zeroth local variable, whose name is a. So we've accounted for our Python code and everything looks rosy.
But the bytecode compiler has a dilemma. It's compiling a function, and functions are supposed to have return values, even if this one has no explicit one. The caller will be pissed if it doesn't get a return value.
A cunning plan. Why not just return nothing? Push the value None on the stack (by golly, there's our zeroth constant!), then return it.
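You can poke at the constant pool yourself: it lives on the function's code object. The attributes below are real CPython attributes, though the exact ordering inside co_consts can differ between interpreter versions; what's guaranteed is that both None (for the implicit return) and 7 are in there:

```python
def main(a):
    a = a + 7

# co_consts is the constant pool LOAD_CONST indexes into. In the
# disassembly above, None sits at index 0 and 7 at index 1.
print(main.__code__.co_consts)

# co_varnames lists the locals that LOAD_FAST/STORE_FAST index into.
print(main.__code__.co_varnames)  # ('a',)
```

So LOAD_CONST 1 and STORE_FAST 0 are literally indexes into these tuples.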
A branch
Okay, loading variables and literals, performing operations. *shrug* What if you had a branch, tough guy, would you indent your bytecode, huh?
def main(a):
    if condition:
        a = 7
    else:
        a = 9
Disassembly of main:
2 0 LOAD_GLOBAL 0 (condition)
3 POP_JUMP_IF_FALSE 15
3 6 LOAD_CONST 1 (7)
9 STORE_FAST 0 (a)
12 JUMP_FORWARD 6 (to 21)
5 >> 15 LOAD_CONST 2 (9)
18 STORE_FAST 0 (a)
>> 21 LOAD_CONST 0 (None)
24 RETURN_VALUE
Well, no. And hang tight, because this is where Python bytecode becomes a little assembly-ish.
We load the condition variable and perform the check. If true, we just continue to the next instruction, but if false, we jump to location 15, which is the body of the else block.
a) If we're executing the if case, we perform the assignment, then jump forward by 6 locations to location 21, which is the end of the function (where we return None).

b) If we're executing the else case, we perform the assignment and just continue on, returning None.
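You don't have to squint at the printed listing to find the jumps, by the way. dis can hand you the instructions as objects (a small sketch; note that the exact jump opcode names vary between CPython versions, e.g. POP_JUMP_IF_FALSE vs. POP_JUMP_FORWARD_IF_FALSE):

```python
import dis

condition = True  # module-level global, as in the example above

def main(a):
    if condition:
        a = 7
    else:
        a = 9

# dis.get_instructions yields Instruction objects, so the jump targets
# can be inspected programmatically instead of read off the listing.
for ins in dis.get_instructions(main):
    if "JUMP" in ins.opname:
        print(ins.opname, "->", ins.argval)  # argval is the target offset
```

This is handy when you want to analyze bytecode in code rather than by eye.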
Object lookups
So far we have looked at some very basic operations, and the opcodes are pretty similar to what you would find when compiling the same code in C and looking at the assembly. (Well, except that we are still in the world of dynamic typing, and BINARY_ADD has _no idea_ what kinds of things it will be adding until it actually tries to perform the operation; only then does it discover the types of the operands.)
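You can see that type-blindness for yourself: one compiled function, one add opcode, any operand types. (On newer CPythons the opcode is spelled BINARY_OP rather than BINARY_ADD, so the sketch below only checks the prefix.)

```python
import dis

def add(a, b):
    return a + b

# The same opcode sequence serves every type that supports +.
ops = [ins.opname for ins in dis.get_instructions(add)]
print(ops)

print(add(2, 3))          # int addition
print(add("py", "thon"))  # str concatenation, same bytecode
```

The dispatch on the actual types happens at runtime, inside the opcode's implementation, not at compile time.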
Let's try something more high level - like operations on objects. Two of the most common things we do in Python are: indexing into objects (for lists and dicts), and looking up attributes on them.
def main(obj):
    obj[x]
    obj.x
Disassembly of main:
2 0 LOAD_FAST 0 (obj)
3 LOAD_GLOBAL 0 (x)
6 BINARY_SUBSCR
7 POP_TOP
3 8 LOAD_FAST 0 (obj)
11 LOAD_ATTR 0 (x)
14 POP_TOP
15 LOAD_CONST 0 (None)
18 RETURN_VALUE
Why, that seems surprisingly concise! The index operation aside (a single opcode), attribute lookup is a complicated piece of logic that involves looking at obj.__dict__, looking at the class dict (and base class dicts), calling __getattribute__ and possibly __getattr__. All of that in a single opcode? That's value for money in my book.
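Here's a rough sketch of the lookup that LOAD_ATTR triggers for a plain attribute. The real algorithm lives in object.__getattribute__ and also handles data descriptors, __slots__ and metaclasses; the lookup function and Point class below are made up for illustration:

```python
_MISSING = object()  # sentinel, since None is a legal attribute value

def lookup(obj, name):
    # 1. Search the class and its bases, in MRO order.
    class_value = _MISSING
    for klass in type(obj).__mro__:
        if name in vars(klass):
            class_value = vars(klass)[name]
            break
    # (A data descriptor found here would take priority; omitted.)

    # 2. Check the instance __dict__.
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]

    # 3. Fall back to the class attribute, else give up.
    if class_value is not _MISSING:
        return class_value
    raise AttributeError(name)

class Point:
    dims = 2            # class attribute
    def __init__(self):
        self.x = 1      # instance attribute

p = Point()
print(lookup(p, "x"))     # found in the instance __dict__
print(lookup(p, "dims"))  # found on the class
```

All of that (and more) hides behind the one LOAD_ATTR opcode.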
At this point it should dawn on you that bytecode compilation does not actually compile the program very much. It's basically rewriting the same thing in bytecode, and that's why the compilation is so fast.
Approaches to interpretation
So why bother? Let's think about our options.
- Parse and interpret Python source code on-the-fly.
Impractical, because the code you need to execute determines how many tokens you need to parse: an assignment may only need the rest of the tokens on the line, but a class definition needs the whole class body. No one actually does this.
- Parse the source code to an Abstract Syntax Tree (AST) and write the interpreter as an AST walker.
Possible, but it makes the interpreter a bit complicated. It has to remain in sync with the structure of the AST at all times. Ruby MRI 1.8 uses this approach.
- Parse the source code to an AST, compile that to bytecode, interpret the bytecode.
Most interpreted languages use this approach. It decouples the interpreter from the AST, so instead of traversing a tree you're executing a linear stream of instructions.