Intro to profiling
Python's dirty little secret is that it can be made to run pretty fast.
The bare-metal HPC people will be angrily tweeting at me now, or rather, they would be if they could get their wireless drivers working.
Still, there are some things you really don't want to do in Python. Nested loops are usually a bad idea. But often you won't know where your code is slowing down just by looking at it, and trying to accelerate everything can be a waste of time. (Developer time, that is, both now and in the future: you incur technical debt if you unintentionally obfuscate code to make it faster when it doesn't need to be.)
The first step is always to find the bottlenecks in your code, via profiling: analyzing your code by measuring the execution time of its parts.
Tools
- cProfile
- line_profiler
- timeit
Note: If you haven't already installed line_profiler, you can do
conda install line_profiler
or
pip install line_profiler
Some bad code
Here's a bit of code guaranteed to perform poorly: it sleeps for 1.5 seconds after doing any work! We will profile it and see where we might be able to help.
import numpy
from time import sleep

def bad_call(dude):
    sleep(.5)

def worse_call(dude):
    sleep(1)

def sumulate(foo):
    if not isinstance(foo, int):
        return
    a = numpy.random.random((1000, 1000))
    a @ a
    ans = 0
    for i in range(foo):
        ans += i
    bad_call(ans)
    worse_call(ans)
    return ans

sumulate(150)
11175
using cProfile
cProfile is the built-in profiler in Python (available since Python 2.5). It provides a function-by-function report of execution time. First import the module, then usage is simply a call to cProfile.run() with your code as its argument. It will print out a list of all the functions that were called, with the number of calls and the time spent in each.
import cProfile
cProfile.run('sumulate(150)')
10 function calls in 1.549 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.023 0.023 1.549 1.549 <ipython-input-1-ab37b7e7eed9>:10(sumulate)
1 0.000 0.000 0.501 0.501 <ipython-input-1-ab37b7e7eed9>:4(bad_call)
1 0.000 0.000 1.001 1.001 <ipython-input-1-ab37b7e7eed9>:7(worse_call)
1 0.000 0.000 1.549 1.549 <string>:1(<module>)
1 0.000 0.000 1.549 1.549 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
2 1.502 0.751 1.502 0.751 {built-in method time.sleep}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.024 0.024 0.024 0.024 {method 'random_sample' of 'mtrand.RandomState' objects}
You can see here that when our code sumulate() executes, it spends almost all its time in the method time.sleep (a bit over 1.5 seconds).
If your program is more complicated than this cute demo, you'll have a hard time parsing the long output of cProfile. In that case, you may want a profiling visualization tool, like SnakeViz. But that is outside the scope of this tutorial.
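If you want to stay in plain Python, you can also dump the profile to a file and slice it with the built-in pstats module. Here is a minimal sketch (the stats.prof filename is just an illustration):

import cProfile
import pstats

# Save the raw profile of sumulate() to a file instead of printing it
cProfile.run('sumulate(150)', 'stats.prof')

# Load the stats, sort by cumulative time, and show only the top 5 entries
stats = pstats.Stats('stats.prof')
stats.sort_stats('cumulative').print_stats(5)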
using line_profiler
line_profiler offers more granular information than cProfile: it will give timing information about each line of code in a profiled function.
Load the line_profiler extension:
%load_ext line_profiler
For a pop-up window with results in the notebook:
IPython has an %lprun magic to profile specific functions within an executed statement. Usage:
%lprun -f func_to_profile <statement>
(get more help by running %lprun? in IPython).
Profiling two functions
%lprun -f bad_call -f worse_call sumulate(13)
Write results to a text file
%lprun -T timings.txt -f sumulate sumulate(12)
*** Profile printout saved to text file 'timings.txt'.
%load timings.txt
Let's break down what these results are telling us.
Line # corresponds to the line in a given script or notebook cell. To toggle line numbers in a code cell, hit l in command mode.
Hits shows how many times a given line was encountered as a result of running the sumulate function. You can see that most lines in sumulate were hit only once, while the lines in the for loop were hit several times.
Time is the amount of time spent, in total, on a given line. Note at the top of the profiling report that the time unit here is microseconds.
Per hit is the amount of time, on average, that each hit on a given line took. This is the same as the Time column for lines that were hit only once, but you can see that the Per hit time has more meaning for the lines in the for loop.
% Time is what we are really interested in. It tells us what percentage of the total run time of sumulate was spent on a given line. There are lots of ways to optimize all sorts of operations, but you should focus your time and energy on optimizing the code that is costing you the most time.
You could try to further optimize the a @ a matrix multiplication to make it run a little bit faster. You might even shave off a microsecond (although probably not). But who cares? That matrix multiply takes up 2% of the runtime. Focus your efforts on the expensive parts, which here are bad_call and worse_call.
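To make the point concrete, here is one possible rewrite, assuming the sleeps stand in for work we can avoid entirely; the name sumulate_faster and the closed-form sum are just illustrations, not the one true fix:

def sumulate_faster(foo):
    # Same guard and matrix work as the original
    if not isinstance(foo, int):
        return
    a = numpy.random.random((1000, 1000))
    a @ a
    # Closed-form sum replaces the Python-level loop (a minor win)
    ans = foo * (foo - 1) // 2
    # The real savings: drop bad_call/worse_call, which only sleep
    return ans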
Profiling on the command line
Open the file, add the @profile decorator to any function you want to profile, then run
kernprof -l script_to_profile.py
which will generate script_to_profile.py.lprof (a pickled result). To view the results, run
python -m line_profiler script_to_profile.py.lprof
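For reference, a standalone script set up for kernprof might look like the sketch below; the file name and slow_function are hypothetical, and note that @profile is injected by kernprof at run time, so the script will fail under plain python unless you remove or guard the decorator:

# script_to_profile.py -- minimal sketch for kernprof
from time import sleep

@profile            # injected into builtins by kernprof -l
def slow_function(n):
    total = 0
    for i in range(n):
        total += i
    sleep(1)        # the obvious hotspot
    return total

if __name__ == '__main__':
    slow_function(1000)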
from IPython.display import IFrame
IFrame('http://localhost:8888/terminals/1', width=800, height=700)
timeit
timeit is not perfect, but it is helpful.
Potential concerns re: timeit (each can be worked around on the command line, as shown below):
- It reports the minimum time of the runs
- It only repeats the benchmark 3 times by default
- It disables garbage collection during timing
python -m timeit -v "print(42)"
python -m timeit -r 25 "print(42)"
python -m timeit -s "gc.enable()" "print(42)"
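The same knobs exist in the timeit module itself, which can be handier inside a script; a minimal sketch using timeit.repeat (the statement and the repeat/number values are arbitrary):

import timeit

# repeat=25 gives more samples than the default; the setup string
# re-enables garbage collection, which timeit normally turns off
times = timeit.repeat('sum(range(1000))',
                      setup='gc.enable()',
                      repeat=25,
                      number=10000)

# Look at the spread, not just the minimum
print(min(times), max(times))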
Line magic
%timeit x = 5
100000000 loops, best of 3: 11.3 ns per loop
Cell magic
%%timeit
x = 5
y = 6
x + y
10000000 loops, best of 3: 32.9 ns per loop
The -q flag quiets output. The -o flag allows outputting results to a variable. The -q flag sometimes disagrees with OSX, so please remove it if you're having issues.
a = %timeit -qo x = 5
print(a.all_runs)
[1.1260504741221666, 1.1415640730410814, 1.1300840568728745]
print(a.best)
1.1260504741221666e-08
print(a.worst)
4.5797787606716156e-07
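One handy use of -o is saving the results of two candidate snippets so you can compare them programmatically; a quick sketch (the two statements are arbitrary examples):

listcomp = %timeit -qo sum([i for i in range(1000)])
genexpr = %timeit -qo sum(i for i in range(1000))

# TimeitResult objects expose .best, .worst, and .all_runs
print(listcomp.best, genexpr.best)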