Robustness is an interesting notion, isn't it? It isn't about being prepared for what you expect to happen, but actually for what you don't expect. If you take this somewhat vague idea of robustness a step further you drift towards the more software engineering-y practice of defensive programming.
Before delving in, it might be good to consider what precisely you can hope to achieve with robust code. If your program crashes with a KeyError on a dictionary lookup, it's not very robust. On the other hand, if you keep getting AttributeErrors because your socket object is null because the network is dead, you have bigger problems than attribute access.
Robust code doesn't absolve you from error handling. Your program will experience error conditions, and you have to design it so that you can handle them in the right place. If your code is robust, you can achieve this goal: errors will be caught without crashing your program.
Accessing attributes
Attribute access is a minefield. I know, the object oriented philosophy makes it sound like a trifle, but it's not. When you first started coding you probably wrote code like this:
class Bottle(object):
    def capacity(self):
        return 45
# ...
print bottle.capacity()
It looks very innocuous, but what could go wrong here? We make the assumption, perhaps unwittingly, that bottle has been initialized at this point in time. Suppose bottle is an instance member that was set to None initially, and was supposed to be instantiated by the time we execute this line. Those of us who've been on a few Java gulags know that this is how the recurring NullPointerException nightmare begins. In Python you get an AttributeError (sounds more benign, doesn't it?).
If you expected to receive bottle from a database or the network, you probably have good reason to suspect that it might be null. So you'll probably write a lot of code like this:
if bottle and bottle.capacity:
    print bottle.capacity()
If bottle isn't null (that is, not None, 0 or an empty string/iterable), we think everything is in order. We might also check the attribute we want, to make sure that it isn't null either. The trouble is that this check is itself an attribute access. So if bottle isn't null but is missing capacity, there's your AttributeError again!
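To see that failure concretely, here is a small sketch. The Cork class and the describe function are made up for illustration: Cork stands in for any object that is truthy but has no capacity attribute. The snippet uses print() as a function so that it also runs on Python 3.

```python
class Cork(object):
    """Hypothetical stand-in: a truthy object with no capacity attribute."""
    pass


def describe(bottle):
    # The "safe-looking" check from above: the second operand is itself
    # an attribute access, so it can raise AttributeError.
    if bottle and bottle.capacity:
        return bottle.capacity()
    return None


# None is falsy, so the check short-circuits and nothing is raised.
result_for_none = describe(None)

# A Cork is truthy, so bottle.capacity is evaluated and raises.
try:
    describe(Cork())
    raised = False
except AttributeError:
    raised = True

print("None handled: %r, Cork raised: %r" % (result_for_none is None, raised))
```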
It should be obvious by now that calling any method on bottle is off the table in case bottle is null. Instead, let's do it this way:
f = getattr(bottle, 'capacity', None)
if f:
    print f()
getattr is a very clever idea. You tell it to get the capacity attribute from the bottle object, and in case that attribute doesn't exist, just return None. The only way you'll get an exception here (a NameError) is if bottle isn't defined in the namespace. So once we have this object, either capacity or None, we check that it's not null, and then call the method.
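As a self-contained sketch of the three cases this covers, the snippet below wraps the getattr pattern in a helper. The name capacity_or_none and the Cork class are invented here for illustration.

```python
class Bottle(object):
    def capacity(self):
        return 45


class Cork(object):
    """Hypothetical object that has no capacity attribute."""
    pass


def capacity_or_none(obj):
    # getattr with a default does not raise AttributeError for a missing name.
    f = getattr(obj, 'capacity', None)
    if f:
        return f()
    return None


from_bottle = capacity_or_none(Bottle())  # attribute exists and is called
from_cork = capacity_or_none(Cork())      # attribute missing, default used
from_none = capacity_or_none(None)        # None has no capacity either

print("%r %r %r" % (from_bottle, from_cork, from_none))
```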
You might think that this seems like low level nitpicking. And anyway, how do you know that capacity is a method? You could still get a TypeError here. Why not just check if bottle is an instance of the class Bottle? If it is, then it's reasonable to expect capacity is a method too:
if isinstance(bottle, Bottle):
    print bottle.capacity()
This isn't as robust as it seems. Remember that we're trying to prepare for something that wasn't planned. Suppose that someone moved capacity to a base class (superclass) of Bottle. Now we are saying only Bottle instances can use the capacity method, even though other objects also have it.
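A minimal sketch of that scenario follows. Container and Jar are hypothetical names: Container is the base class that now owns capacity, and Jar is a sibling of Bottle that inherits it.

```python
class Container(object):
    """Hypothetical base class that now owns capacity."""
    def capacity(self):
        return 45


class Bottle(Container):
    pass


class Jar(Container):
    """Hypothetical sibling class: not a Bottle, but it has capacity too."""
    pass


def capacity_if_bottle(obj):
    # The isinstance check from above: only Bottle instances pass.
    if isinstance(obj, Bottle):
        return obj.capacity()
    return None


bottle_result = capacity_if_bottle(Bottle())
jar_result = capacity_if_bottle(Jar())  # rejected, although Jar().capacity() works

print("%r %r" % (bottle_result, jar_result))
```

The Jar is turned away even though calling its capacity method directly would have worked fine.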
It's more Pythonic to cast a wider net and not be so restrictive. We could use getattr to get the object that we expect is a method. And then we can check if it's a function:
import types

unkn = getattr(bottle, 'capacity', None)
if isinstance(unkn, types.FunctionType):
    print unkn()
This doesn't work, because a method is not of type function. You can call it, but it's not a function (queer, isn't it?). An instance of a class that implements __call__ is also callable, but also not a function. So we should check whether the object is callable, which is what the built-in callable does; having a __call__ method is what makes an object callable:
unkn = getattr(bottle, 'capacity', None)
if callable(unkn):
    print unkn()
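The difference between the two checks can be verified directly. In the sketch below, Gauge is a made-up class whose instances implement __call__; everything else comes from the examples above.

```python
import types


class Bottle(object):
    def capacity(self):
        return 45


class Gauge(object):
    """Hypothetical callable object: an instance that implements __call__."""
    def __call__(self):
        return 45


bound = getattr(Bottle(), 'capacity', None)

# A bound method is a method object, not a plain function.
method_is_function = isinstance(bound, types.FunctionType)
method_is_callable = callable(bound)

# An instance with __call__ is callable, but not a function either.
gauge_is_function = isinstance(Gauge(), types.FunctionType)
gauge_is_callable = callable(Gauge())

# A non-callable attribute value, such as a plain number, is rejected.
number_is_callable = callable(45)
```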
Now obviously, writing every method call in your program like this would be madness. As a coder, you have to consider the degree of uncertainty about your objects.
Another way to go about this is to embrace exceptions. You could also write the most naive code and just wrap a try/except around it. I don't enjoy that style as much, because try/except alters the control flow of your program. This merits a longer discussion, but basically you have to increment the level of indentation, variable scope is a concern, and exceptions easily add up.
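For comparison, this is roughly what the exception-based style could look like. It is only a sketch: the helper name capacity_or_default and the Broken class are assumptions made for the example.

```python
class Bottle(object):
    def capacity(self):
        return 45


class Broken(object):
    """Hypothetical object whose capacity exists but is not callable."""
    capacity = 45


def capacity_or_default(bottle, default=None):
    # Naive call wrapped in try/except. AttributeError covers a missing
    # attribute (including bottle being None); TypeError covers an
    # attribute that exists but cannot be called.
    try:
        return bottle.capacity()
    except (AttributeError, TypeError):
        return default


ok = capacity_or_default(Bottle())
missing = capacity_or_default(None)
not_callable = capacity_or_default(Broken(), default=-1)
```

One trade-off to keep in mind: the except clause also swallows an AttributeError or TypeError raised from inside capacity itself, which is one way exceptions "add up".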
Setting attributes
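If you only want to set a predetermined attribute, then nothing is easier (obviously this won't work for arbitrary names on objects that have no instance __dict__, such as instances of classes that use __slots__ or of built-in types like dict). You can set attributes both for instances and classes: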
bottle.volume = 4
Bottle.volume = 4
But if the attribute name is going to be determined by some piece of data (like the name of a field in a database table, say), you need another approach. You could just set the attribute in the object's __dict__:
bottle.__dict__['volume'] = 4
Bottle.__dict__['volume'] = 4 ## fails
But this is considered poor style; __dict__ isn't supposed to be accessed explicitly by other objects. Furthermore, the __dict__ of a class is exposed as a dictproxy object, so you can't do this to set a class attribute. But you can use setattr:
setattr(bottle, 'volume', 4)
setattr(Bottle, 'volume', 4)
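Here is a sketch of the data-driven case, assuming the field names arrive in a dictionary. The row contents and the attribute names material and shape are invented for the example.

```python
class Bottle(object):
    pass


# Hypothetical row, as it might come back from a database query.
row = {'volume': 4, 'label': 'water'}

bottle = Bottle()
for field_name, value in row.items():
    # The attribute name is only known at runtime.
    setattr(bottle, field_name, value)

# setattr works on the class as well.
setattr(Bottle, 'material', 'glass')

# Writing through the class __dict__ is refused with a TypeError.
try:
    Bottle.__dict__['shape'] = 'round'
    class_dict_writable = True
except TypeError:
    class_dict_writable = False
```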
Dictionary lookup
Dictionaries, the bedrock of Python. We use them all the time, not always wisely. The naive approach is to assume the key exists:
bases = {"first": "Who", "second": "What"}
print bases["third"] ## raises KeyError
Failing that, dicts have a has_key method just for this purpose:
if bases.has_key("third"):
    print bases["third"]
But it's more Pythonic to keep it simple as can be:
if "third" in bases:
    print bases["third"]
Dicts also have a failsafe method equivalent to getattr, called get. You can also give it a default value (as the second parameter, not shown here) to return if the key doesn't exist:
third = bases.get("third")
if third:
    print third
I would argue that it's preferable, because you don't have to look up the element twice. (And you don't risk defeat snatched from the jaws of victory if a context switch occurs between those two statements and another thread removes the key after you've checked for it.)
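The default parameter mentioned above can be sketched as follows. The extra "shortstop" entry and the _MISSING sentinel are additions made for the example; they illustrate one caveat, namely that a plain truth test on the result cannot tell a missing key from a key whose value happens to be empty.

```python
bases = {"first": "Who", "second": "What", "shortstop": ""}

# The second argument is returned when the key is absent.
third = bases.get("third", "I Don't Know")

# Caveat: a key that exists with a falsy value fails a plain truth test.
shortstop = bases.get("shortstop")
found_by_truthiness = bool(shortstop)

# A unique sentinel distinguishes "missing" from "present but empty".
_MISSING = object()
found_by_sentinel = bases.get("shortstop", _MISSING) is not _MISSING
third_is_missing = bases.get("third", _MISSING) is _MISSING
```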
Doing this much explicit runtime checking of attributes partly defeats the purpose of using a dynamically typed language. I think the proper place to catch this kind of error is the point at which "bottle" was initialized. Whoever initialized it should've checked that it initialized properly before trying to use it (or the initialization code should've thrown an exception when it failed, rather than somehow silently allow your object to become null.) If someone deliberately passes a null object to your code which is trying to access Bottle attributes, it's probably a bug they should fix rather than an error for you to try to catch at runtime.
Even if you caught the error with tons of manual runtime checking, how would you fix it other than throwing an exception manually yourself? Say you do a check, and "bottle" has a "capacity" attribute, but it's not callable. What would you do to recover? Presumably there's a reason you're trying to access the capacity value, and it's going to be impossible to proceed without it. (Unless printing / using the capacity value is an optional part of your program, in which case that kind of attribute testing you're doing is the way to go, but then we're not talking about robustness any longer.)
I would tend to agree that excessive checking is a bad thing, but there are times, as stated, when you aren't really sure what you are getting. I tend to use getattr a lot when using various API modules that return objects parsed from web-based apps. Sometimes the packets are incomplete, sometimes the module is a bit rubbish and doesn't bother filling in the things that should be there, sometimes I AM writing the module, so I need to check that everything is properly formed.