import_that: XKCD guy flying with Python (Default)
[personal profile] import_that
Python's assert statement is a very useful feature that unfortunately often gets misused. assert takes an expression and an optional error message, evaluates the expression, and if it gives a true value, does nothing. If the expression evaluates to a false value, it raises an AssertionError exception with optional error message. For example:

py> x = 23
py> assert x > 0, "x is zero or negative"
py> assert x%2 == 0, "x is an odd number"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: x is an odd number

Many people use assertions as a quick and easy way to raise an exception if an argument is given the wrong value. But this is wrong, badly wrong, for two reasons. The first is that AssertionError is usually the wrong error to give when testing function arguments. Exceptions have meaning to the reader: when we see a KeyError, we expect that the error has something to do with a missing or invalid key, even before reading the error message itself. When we see an IndexError, we expect that an index was out of bounds. It's usually a sign of poor coding to violate those expectations, code like this is wrong:

if not isinstance(x, int):
    raise AssertionError("not an int")

Here the appropriate exception to raise is TypeError, not AssertionError, and using assert raises the wrong sort of exception.

But, and more dangerously, there's a twist with assert: when you run Python with the -O or -OO optimization flags, assertions will be compiled away and never executed. Consequently there is no guarantee that assertions will actually be run. When using assert properly, this is a feature, but when it is used inappropriately, it can lead to code that is completely broken when running with the -O flag.

When should we use assert? In no particular order, assertions should be used for:

  • defensive programming

  • runtime checks on program logic

  • checking contracts (e.g. pre-conditions and post-conditions)

  • program invariants

  • checked documentation

(It's also acceptable to use assertions when testing code, as a sort of quick-and-dirty poor man's unit testing, so long as you accept that the tests simply won't do anything if you run with the -O flag. And I sometimes use assert False to mark code branches that haven't been written yet, and I want them to fail. Although raise NotImplementedError is probably better for that use, if a little more verbose.)

Opinions on assertions vary, because they can be a statement of confidence about the correctness of the code. If you're certain that the code is correct, then assertions are pointless, since they will never fail and you can safely remove them. If you're certain the checks may fail (e.g. when testing input data provided by the user), then you dare not use assert since it may be compiled away and then your checks will be skipped.

It's the situations in between those two that are interesting, times when you're certain the code is correct but not quite absolutely certain. Perhaps you've missed some odd corner case (we're all only human). In this case an extra runtime check helps reassure you that any errors will be caught as early as possible rather than in distant parts of the code.

(This is why assert can be divisive. Since we can vary in our confidence about the correctness of code, one person's useful assertion may be another person's useless runtime test.)

Another good use for assertions is checking program invariants. An invariant is some condition which you can rely on to be true unless a bug causes it to become false. If there's a bug, better to find out as early as possible rather than later in some far off distant piece of code, so we make a test for it, but we don't want to slow the code down with such tests. Hence assert, which can be left on in development and off in production by running with the -O switch.

An example of an invariant might be, if your function expects a database connection to be open when it starts, and promises that it will still be open when it returns, that's an invariant of the function:

def some_function(arg):
    assert not DB.closed()
    ... # code goes here
    assert not DB.closed()
    return result

Assertions also make good checked comments. Instead of writing a comment:

# when we reach here, we know that n > 2

you can ensure it is checked at runtime by turning it into an assertion:

assert n > 2

Assertions are also a form of defensive programming. You're not protecting against errors in the code as it is now, but protecting against changes which introduce errors later. Ideally, unit tests will pick those up, but let's face it, even when tests exist at all, they're often incomplete. Build-bots can be down and nobody notices for weeks, or people forget to run tests before committing code. Having an internal check is one more line of defence against errors sneaking in, especially those which don't noisily fail but cause the code to malfunction and return incorrect results.

Suppose you have a series of if...elif blocks, where you know ahead of time what values some variable is expected to have:

# target is expected to be one of x, y, or z, and nothing else.
if target == x:
elif target == y:

Assume that this code is completely correct now. But will it stay correct? Requirements change. Code changes. What happens if the requirements change to allow target = w, with associated action run_w_code? If we change the code that sets target, but neglect to change this block of code, it will wrongly call run_z_code() and Bad Things will occur. It would be good to write this block of code defensively, so that it will either be correct, or fail immediately, even in the face of future changes.

The comment at the start of the block is a good first step, but people are notorious for failing to read and update comments. Chances are it will soon be obsolete. But with an assertion, we can both document the assumptions of this block, and cause a clean, immediate failure if they are violated:

assert target in (x, y, z)
if target == x:
elif target == y:
    assert target == z

Here, the assertions are both defensive programming and checked documentation. I consider this to be a far superior solution than this:

if target == x:
elif target == y:
elif target == z:
    # This can never happen. But just in case it does...
    raise RuntimeError("an unexpected error occurred")

This tempts some helpful developer to "clean it up" by removing the "unnecessary" test for target == z and removing the "dead code" of the RuntimeError. Besides, "unexpected error" messages are embarrassing when they occur, and they will.

Design by contract is another good use of assertions. In design by contract, we consider that functions make "contracts" with their callers. E.g. something like this:

"If you pass me an non-empty string, I guarantee to return the first character of that string converted to uppercase."

If the contract is broken by either the function or the code calling it, the code is buggy. We say that functions have pre-conditions (the constraints that arguments are expected to have) and post-conditions (the constraints on the return result). So this function might be coded as:

def first_upper(astring):
    # Check the pre-conditions.
    assert isinstance(astring, str) and len(astring) > 0
    result = astring[0].upper()
    # Check the post-conditions.
    assert isinstance(result, str) and len(result) == 1
    # We cannot use isupper() here, because the char might not be a letter.
    assert result == result.upper()
    return result

The aim of Design By Contract is that in a correct program, the pre-conditions and post-conditions will always hold. Assertions are typically used, since (so the idea goes) by the time we release the bug-free program and put it into production, the program will be correct and we can safely remove the checks.

It's important here to realise that contracts apply between parts of a single program. They are inappropriate between (say) a library and the caller of the library. Since the library cannot trust that callers will honour contracts, it is not appropriate to disable the contract checking (particularly the pre-condition checks), and so libraries should not rely on assert for error checking (at least not for parts of the public API). Instead, they should perform the test and explicitly use raise if the text fails.

Here's my advice when not to use assertions:

  • Never use them for testing user-supplied data, or for anything where the check must take place under all circumstances.

  • Don't use assert for checking anything that you expect might fail in the ordinary use of your program. Assertions are for extraordinary failure conditions. Your users should never see an AssertionError; if they do, it's a bug in your code to be fixed.

  • In particular, don't use assert just because it's shorter than an explicit test followed by a raise. assert is not a shortcut for lazy coders.

  • Don't use them for checking input arguments to public library functions (private ones are okay) since you don't control the caller and can't guarantee that it will never break the function's contract.

  • Don't use assert for any error which you expect to recover from. In other words, you've got no reason to catch an AssertionError exception in production code.

  • Don't use so many assertions that they obscure the code.

An earlier version of this post appeared here.


import_that: XKCD guy flying with Python (Default)
Steven D'Aprano

May 2015

345678 9

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags