Writing benchmarks¶
Benchmarks are stored in a collection of .py
files in the
benchmark suite’s benchmark
directory (as defined by
benchmark_dir
in the asv.conf.json
file). They may be
arbitrarily nested in subdirectories, and all .py
files will be
used, regardless of their file name.
Within each .py
file, each benchmark is a function or method. The
name of the function must have a special prefix, depending on the type
of benchmark. asv
understands how to handle the prefix in either
CamelCase
or lowercase with underscores. For example, to create a
timing benchmark, the following are equivalent:
def time_range():
    for i in range(1000):
        pass

def TimeRange():
    for i in range(1000):
        pass
Benchmarks may be organized into methods of classes if desired:
class Suite:
    def time_range(self):
        for i in range(1000):
            pass

    def time_xrange(self):
        for i in xrange(1000):
            pass
Running benchmarks during development¶
There are some options to asv run
that may be useful when writing
benchmarks.
You may find that asv run
spends a lot of time setting up the
environment each time. You can have asv run
use an existing
Python environment that already has the benchmarked project and all of
its dependencies installed. Use the --python
argument to specify
a Python environment to use:
asv run --python=python
If you don’t care about getting accurate timings, but just want to
ensure the code is running, you can add the --quick
argument,
which will run each benchmark only once:
asv run --quick
In order to display the standard error output (this includes exception tracebacks)
that your benchmarks may produce, pass the --show-stderr
flag:
asv run --show-stderr
Finally, there is a special command, asv dev
, that uses all of
these features and is equivalent to:
asv run --python=same --quick --show-stderr --dry-run
Setup and teardown functions¶
If initialization needs to be performed that should not be included in
the timing of the benchmark, include that code in a setup
method
on the class, or add an attribute called setup
to a free
function. For example:
class Suite:
    def setup(self):
        # load data from a file
        with open("/usr/share/words.txt", "r") as fd:
            self.words = fd.readlines()

    def time_upper(self):
        for word in self.words:
            word.upper()

# or equivalently...

words = []
def setup():
    global words
    with open("/usr/share/words.txt", "r") as fd:
        words = fd.readlines()

def time_upper():
    for word in words:
        word.upper()
time_upper.setup = setup
You can also include a module-level setup
function, which will be
run for every benchmark within the module, prior to any setup
assigned specifically to each function.
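For example, a module-level setup might prepare shared state used by several benchmarks. This is a minimal sketch; the data here is purely illustrative:
common_data = {}

def setup():
    # Module-level setup: run for every benchmark in this module,
    # before any per-benchmark setup.
    common_data['numbers'] = list(range(10000))

def time_sum():
    sum(common_data['numbers'])

def time_max():
    max(common_data['numbers'])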
Similarly, benchmarks can also have a teardown
function that is
run after the benchmark. This is useful if, for example, you need to
clean up any changes made to the filesystem. Generally, however, it
is not required: each benchmark runs in its own process, so any
tearing down of in-memory state happens automatically.
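If a benchmark does modify the filesystem, a teardown can undo it. A minimal sketch (the file handling here is illustrative):
import os
import tempfile

class TempFileSuite:
    def setup(self):
        # Create a scratch file for the benchmark to write to.
        fd, self.path = tempfile.mkstemp()
        os.close(fd)

    def time_write(self):
        with open(self.path, "w") as f:
            f.write("x" * 4096)

    def teardown(self):
        # Undo the filesystem change made in setup.
        os.remove(self.path)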
If setup
raises a NotImplementedError
, the benchmark is marked
as skipped.
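For example, setup can skip a benchmark when an optional dependency is unavailable (the dependency named here is just an illustration):
class OptionalDependencySuite:
    def setup(self):
        try:
            import scipy  # illustrative optional dependency
        except ImportError:
            # Dependency not installed: mark these benchmarks as skipped.
            raise NotImplementedError

    def time_with_optional_dependency(self):
        for i in range(1000):
            pass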
Benchmark attributes¶
Each benchmark can have a number of arbitrary attributes assigned to
it. The attributes that asv
understands depend on the type of
benchmark and are defined below. For free functions, just assign the
attribute to the function. For methods, include the attribute at the
class level. For example, the following are equivalent:
def time_range():
    for i in range(1000):
        pass
time_range.timeout = 120.0

class Suite:
    timeout = 120.0

    def time_range(self):
        for i in range(1000):
            pass
The following attributes are applicable to all benchmark types:
timeout: The amount of time, in seconds, to give the benchmark to run before forcibly killing it. Defaults to 60 seconds.
Parameterized benchmarks¶
You might want to run a single benchmark for multiple values of some
parameter. This can be done by adding a params
attribute to the
benchmark object:
def time_range(n):
    for i in range(n):
        pass
time_range.params = [0, 10, 20, 30]
This will also make the setup and teardown functions parameterized:
class Suite:
    params = [0, 10, 20]

    def setup(self, n):
        self.obj = range(n)

    def teardown(self, n):
        del self.obj

    def time_range_iter(self, n):
        for i in self.obj:
            pass
If setup
raises a NotImplementedError
, the benchmark is marked
as skipped for the parameter values in question.
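For example, setup might decline to run for one of the parameter values (a sketch building on the class above):
class Suite:
    params = [0, 10, 20]

    def setup(self, n):
        if n == 0:
            # Skip only the n == 0 case; the other values still run.
            raise NotImplementedError
        self.obj = range(n)

    def time_range_iter(self, n):
        for i in self.obj:
            pass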
The parameter values can be any Python objects. However, it is often best to use only strings or numbers, because these have simple unambiguous text representations.
When you have multiple parameters, the test is run for all of their combinations:
import numpy

def time_ranges(n, func_name):
    f = {'range': range, 'arange': numpy.arange}[func_name]
    for i in f(n):
        pass
time_ranges.params = ([10, 1000], ['range', 'arange'])
The test will be run for parameters (10, 'range'), (10, 'arange'),
(1000, 'range'), (1000, 'arange')
.
You can also provide informative names for the parameters:
time_ranges.param_names = ['n', 'function']
These will appear in the test output; if not provided you get default names such as “param1”, “param2”.
Benchmark types¶
Timing¶
Timing benchmarks have the prefix time
.
The timing itself is based on the Python standard library’s timeit
module, with some extensions for automatic heuristics shamelessly
stolen from IPython’s %timeit
magic function. This means that in most cases the benchmark function
itself will be run many times to achieve accurate timing.
The default timing function is the POSIX CLOCK_PROCESS_CPUTIME
,
which measures the CPU time used only by the current process. This is
available as time.process_time
in Python 3.3 and later, but a
backport is included with asv
for earlier versions of Python.
Note
One consequence of using CLOCK_PROCESS_CPUTIME
is that the time
spent in child processes of the benchmark is not included. If your
benchmark spawns other processes, you may get more accurate results
by setting the timer
attribute on the benchmark to
timeit.default_timer
.
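A sketch of such a benchmark (the spawned command is illustrative):
import subprocess
import sys
import timeit

def time_child_process():
    # Most of the work happens in a child process, which the default
    # process-CPU-time clock would not count.
    subprocess.check_call([sys.executable, "-c", "sum(range(10**6))"])

# Measure wall clock time so the child's work is included.
time_child_process.timer = timeit.default_timer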
For best results, the benchmark function should contain as little as
possible, with as much extraneous setup moved to a setup
function:
class Suite:
    def setup(self):
        # load data from a file
        with open("/usr/share/words.txt", "r") as fd:
            self.words = fd.readlines()

    def time_upper(self):
        for word in self.words:
            word.upper()
Attributes:
goal_time: asv will automatically select the number of iterations to run the benchmark so that it takes between goal_time / 10 and goal_time seconds each time. If not specified, goal_time defaults to 2 seconds.
number: Manually choose the number of iterations. If number is specified, goal_time is ignored.
repeat: The number of times to repeat the benchmark, with each repetition running the benchmark number of times. The minimum time from all of these repetitions is used as the final result. When not provided, defaults to timeit.default_repeat (3).
timer: The timing function to use, which can be any source of monotonically increasing numbers, such as time.clock, time.time or time.process_time. If not provided, it defaults to time.process_time (or a backported version of it for versions of Python prior to 3.3); another useful value is timeit.default_timer, which gives the default timeit behavior on your version of Python.
On Windows, time.clock has microsecond granularity, but time.time's granularity is 1/60th of a second. On Unix, time.clock has 1/100th of a second granularity, and time.time is much more precise. On either platform, timeit.default_timer measures wall clock time, not CPU time. This means that other processes running on the same computer may interfere with the timing. That's why the default of time.process_time, which only measures the time used by the current process, is often the best choice.
The goal_time
, number
, repeat
, and timer
attributes
can be adjusted in the setup()
routine, which can be useful for
parameterized benchmarks.
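For example, a parameterized benchmark might use fewer iterations for its larger inputs. This is a sketch; it assumes the attributes can be set as instance attributes from within setup, and the cutoff and values chosen are illustrative:
class Suite:
    params = [100, 10**6]

    def setup(self, n):
        self.data = list(range(n))
        if n >= 10**6:
            # Illustrative: take fewer samples for the expensive case.
            self.number = 1
            self.repeat = 1

    def time_sort(self, n):
        sorted(self.data)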
Memory¶
Memory benchmarks have the prefix mem
.
Memory benchmarks track the size of Python objects. To write a memory benchmark, write a function that returns the object you want to track:
def mem_list():
    return [0] * 256
The asizeof module
is used to determine the size of Python objects. Since asizeof
includes the memory of all of an object’s dependencies (including the
modules in which their classes are defined), a memory benchmark
instead calculates the incremental memory of a copy of the object,
which in most cases is probably a more useful indicator of how much
space each additional object will use. If you need to do something
more specific, a generic Tracking (Generic) benchmark can be used
instead.
Note
The memory benchmarking feature is still experimental.
asizeof
may not be the most appropriate metric to use.
Peak Memory¶
Peak memory benchmarks have the prefix peakmem
.
Peak memory benchmarks track the maximum resident size (in bytes) of the process in memory. This does not necessarily count memory paged out to disk, or memory used by memory-mapped files. To write a peak memory benchmark, write a function that performs the operation whose maximum memory usage you want to track:
def peakmem_list():
    [0] * 165536
Note
The peak memory benchmark also counts memory usage during the
setup
routine, which may confound the benchmark results. One
way to avoid this is to spawn a separate subprocess for executing
memory-intensive setup tasks.
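A sketch of that approach, assuming the large input can be prepared by a child process (the file contents here are illustrative):
import os
import subprocess
import sys
import tempfile

data_path = None

def setup():
    # Build a large input file in a child process so that memory used while
    # generating it is not counted toward this process's peak resident size.
    global data_path
    fd, data_path = tempfile.mkstemp()
    os.close(fd)
    subprocess.check_call([
        sys.executable, "-c",
        "import sys; open(sys.argv[1], 'w').write('x' * 10**6)",
        data_path,
    ])

def teardown():
    os.remove(data_path)

def peakmem_read_file():
    with open(data_path) as f:
        f.read()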
Tracking (Generic)¶
It is also possible to use asv
to track any arbitrary numerical
value. “Tracking” benchmarks can be used for this purpose and use the
prefix track
. These functions simply need to return a numeric
value. For example, to track the number of objects known to the
garbage collector at a given state:
import gc

def track_num_objects():
    return len(gc.get_objects())
track_num_objects.unit = "objects"
Attributes:
unit: The unit of the values returned by the benchmark. Used for display in the web interface.
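The unit attribute can also be set at the class level, like any other benchmark attribute. A sketch using sys.getsizeof (which reports only the shallow size of the object):
import sys

class ObjectSizes:
    unit = "bytes"

    def track_dict_size(self):
        # Shallow size of a dict with 1000 integer keys.
        return sys.getsizeof(dict.fromkeys(range(1000)))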