I enjoy coding in Python because I find it very readable and easy to understand quickly. But certain idioms I brought over from other programming languages have cost me a lot in terms of code performance. I used for loops and recursive calls in other languages I coded in. Nether of these are Python idioms and rarely are they the faster in terms of execution time.
I've written up six tips for writing faster Python code. None of these tips were of my own discovery; a whole host of Python Wiki documentation, job interview questions and Stackoverflow answers led me to these speed improvements.
The benchmarks were run in CPython's reference interpreter for Python 2.7, other implementations and versions may vary in their performance.
The right data structure for the job
In Python, dict, list and set are three popular data structures. They vary in their operations but there are cases where two or more of the structures can be used to solve the same problem. But the same functionality can have very different time complexities in different structures.
Imagine having ten suitcases and you're trying to find a shirt in one of them. If you have an inventory list stating which suitcase the shirt is in then you'd only need to look in that one suitcase to find it. If you don't have an inventory list then you'd need to search through possibly every suitcase till you've found the shirt you're after. This is a 'membership test': "Does a shirt exist within any of the suitcases?".
Sets are implemented using a hash table (think of this as the inventory list) so membership tests are O(1). Searching a set is as simple as seeing if the object appears at the position described by the hash table. The size of the set doesn't affect the lookup speed.
Lists implement the same functionality using variable-length arrays so operations are O(n). With lists, the whole list potentially needs to be searched. As the list size grows the amount of time needed to search it could grow as well.
In the following example membership testing is 5.8x faster using a set than a list.
from timeit import timeit >>> timeit(stmt='"9" in string', setup='string = list("abcefg0123456789")', number=10 ** 8) 13.123163938522339 >>> timeit(stmt='"9" in string', setup='string = set("abcefg0123456789")', number=10 ** 8) 2.534371852874756
Even if I made the list a string it's still 1.28x slower than a set.
>>> from timeit import timeit >>> timeit(stmt='"9" in string', setup='string = "abcefg0123456789"', number=10 ** 8) 3.213447093963623
Not all list operations are slow. The operations to append, get an item, set an item and get the length of a list are all O(1). But deleting an item from a list is O(n) whereas deleting a key from a dictionary is O(1).
The Python wiki has a good write up on time complexity of Python's data structures.
Keep variables local
CPython can store local variables faster than global variables. Look ups are also quicker as it'll look for variables in the local namespace before searching the global and build-in namespaces.
In the following example the loop running in a method is 10.55x faster than the loop running in the global namespace.
>>> from timeit import timeit >>> setup = """ def main(): for _ in xrange(1000): pass """ >>> timeit(stmt='main()', setup=setup, number=10 ** 5) 0.9138619899749756 >>> timeit(stmt='for _ in xrange(1000): pass', number=10 ** 5) 9.643129110336304
C code usually outruns interpreted code
Calling out to methods written in C is usually quicker than running interpreted code.
In this example the while loop will run completely interpreted while the xrange implementation below it will call out to C. The xrange implementation 2.9x faster as a result.
>>> from timeit import timeit >>> timeit(stmt='while i < 10 ** 8: i += 1', setup='i = 0', number=1) 2.727679967880249 >>> timeit(stmt='for _ in xrange(0, 10 ** 8): pass', number=1) 0.9349009990692139
Avoid recursive calls
Recursive method calls in Python cause a new stack frame allocation for every call. If you can iterate over a list instead then you avoid this allocation and will see a tremendous speed increase.
The code below runs around 4x faster as a loop than as a recursive method.
>>> from timeit import timeit # Derived from rosettacode.org/wiki/Fibonacci_sequence >>> setup = """ def fib(n): def _fib(prvprv, prv, c): if c < 1: return prvprv return _fib(prv, prvprv + prv, c - 1) return _fib(0, 1, n) """ >>> timeit(stmt='fib(35)', setup=setup, number=1000) 0.021694183349609375 >>> setup = """ def fib(n): a, b = 1, 1 for _ in xrange(n - 1): a, b = b, a + b return a """ >>> timeit(stmt='fib(35)', setup=setup, number=1000) 0.005464076995849609
There is also a default recursion depth of 1,000 calls in CPython. If you do run recursive method calls make sure they won't call themselves over 999 times.
>>> def recursive(limit, n=0): if n >= limit: return n return recursive(limit, n + 1) >>> recursive(50) 50 >>> recursive(500) 500 >>> recursive(1000) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 5, in recursive File "<stdin>", line 5, in recursive File "<stdin>", line 5, in recursive ... RuntimeError: maximum recursion depth exceeded
Join rather than concatenate
Strings are immutable objects in Python. In order to change a string a new representation needs to be created. If you concatenate a string in a loop you'll create a new representation of that string during every iteration.
Lists only have a single representation so their creation event happens only once.
In the example below joining is 4x faster than concatenating.
>>> from timeit import timeit >>> setup = """ def concat_str(limit): resp = '' for num in xrange(limit): resp += `num` return resp """ >>> timeit(stmt='concat_str(2 ** 20)', setup=setup, number=10) 3.9850499629974365 >>> setup = """ def join_list(limit): return ''.join([`num` for num in xrange(limit)]) """ >>> timeit(stmt='join_list(2 ** 20)', setup=setup, number=10) 0.9801981449127197
List comprehensions over loops
When a for loop executes it looks for an append attribute in its subject list and calls it during each iteration. List comprehensions don't do this lookup and instead have a dedicated LIST_APPEND opcode they call instead.
In the example below the for loop implementation is 25x slower than the list comprehension implementation.
>>> from timeit import timeit >>> setup = """ def join_list(limit): resp =  for num in xrange(limit): resp.append(`num`) return ''.join(resp) """ >>> timeit(stmt='join_list(2 ** 20)', setup=setup, number=10) 24.570796012878418 >>> setup = """ def join_list(limit): return ''.join([`num` for num in xrange(limit)]) """ >>> timeit(stmt='join_list(2 ** 20)', setup=setup, number=10) 0.9801981449127197