Benchmarks are stored in a collection of .py
files in the
benchmark suite’s benchmark
directory (as defined by
benchmark_dir
in the asv.conf.json
file). They may be
arbitrarily nested in subdirectories, and all .py
files will be
used, regardless of their file name.
Within each .py
file, each benchmark is a function or method. The
name of the function must have a special prefix, depending on the type
of benchmark. asv
understands how to handle the prefix in either
CamelCase
or lowercase with underscores. For example, to create a
timing benchmark, the following are equivalent:
def time_range():
    for i in range(1000):
        pass


def TimeRange():
    for i in range(1000):
        pass
Benchmarks may be organized into methods of classes if desired:
class Suite:
    def time_range(self):
        for i in range(1000):
            pass

    def time_iter_range(self):
        for i in iter(range(1000)):
            pass
There are some options to asv run
that may be useful when writing
benchmarks.
You may find that asv run
spends a lot of time setting up the
environment each time. You can have asv run
use an existing
Python environment that already has the benchmarked project and all of
its dependencies installed. Use the --python
argument to specify
a Python environment to use:
asv run --python=python
If you don’t care about getting accurate timings, but just want to
ensure the code is running, you can add the --quick
argument,
which will run each benchmark only once:
asv run --quick
In order to display the standard error output (this includes exception tracebacks)
that your benchmarks may produce, pass the --show-stderr
flag:
asv run --show-stderr
Finally, there is a special command, asv dev
, that uses all of
these features and is equivalent to:
asv run --python=same --quick --show-stderr --dry-run
If initialization needs to be performed that should not be included in
the timing of the benchmark, include that code in a setup
method
on the class, or add an attribute called setup
to a free function.
For example:
class Suite:
    def setup(self):
        # load data from a file
        with open("/usr/share/words.txt", "r") as fd:
            self.words = fd.readlines()

    def time_upper(self):
        for word in self.words:
            word.upper()


# or equivalently...

words = []

def my_setup():
    global words
    with open("/usr/share/words.txt", "r") as fd:
        words = fd.readlines()


def time_upper():
    for word in words:
        word.upper()

time_upper.setup = my_setup
You can also include a module-level setup
function, which will be
run for every benchmark within the module, prior to any setup
assigned specifically to each function.
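For instance, a minimal sketch of a module-level setup (not from the asv docs; the seeding example is illustrative):

import random

# module-level setup: asv runs this before every benchmark in this
# file, prior to any per-class or per-function setup
def setup():
    random.seed(1234)


def time_shuffle():
    random.shuffle(list(range(1000)))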
Similarly, benchmarks can also have a teardown
function that is
run after the benchmark. This is useful if, for example, you need to
clean up any changes made to the filesystem.
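A short sketch of such a cleanup (the file name is illustrative):

import os

class Suite:
    def setup(self):
        # create a scratch file in asv's temporary working directory
        with open("scratch.dat", "w") as fd:
            fd.write("x" * 1024)

    def teardown(self):
        # undo the filesystem change made in setup
        os.remove("scratch.dat")

    def time_read(self):
        with open("scratch.dat") as fd:
            fd.read()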
Note that although different benchmarks run in separate processes, for
a given benchmark, repeated measurement (cf. the repeat
attribute) and
profiling occur within the same process. For these cases, the setup
and teardown routines are run multiple times in the same process.
If setup
raises a NotImplementedError
, the benchmark is marked
as skipped.
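For instance, a hypothetical setup that skips a benchmark when an optional dependency is missing (lxml here is just an example):

class Suite:
    def setup(self):
        try:
            import lxml  # noqa: F401  (hypothetical optional dependency)
        except ImportError:
            raise NotImplementedError("lxml not installed")

    def time_noop(self):
        pass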
The setup
method is run multiple times, for each benchmark and for
each repeat. If the setup
is especially expensive, the
setup_cache
method may be used instead, which only performs the
setup calculation once and then caches the result to disk. Unlike
setup, it is run only once even for repeated benchmarks and profiling.
setup_cache can persist the data for the benchmarks it applies to in
two ways:
- Returning a data structure, which asv pickles to disk, and then loads and passes as the first argument to each benchmark.
- Saving files to the current working directory (which is a temporary directory managed by asv), which are then explicitly loaded in each benchmark process. It is probably best to load the data in a setup method so the loading time is not included in the timing of the benchmark.
A separate cache is used for each environment and each commit of the project being tested, and is thrown out between benchmark runs.
For example, caching data in a pickle:
class Suite:
    def setup_cache(self):
        fib = [1, 1]
        for i in range(100):
            fib.append(fib[-2] + fib[-1])
        return fib

    def track_fib(self, fib):
        return fib[-1]
As another example, explicitly saving data in a file:
class Suite:
    def setup_cache(self):
        with open("test.dat", "w") as fd:
            for i in range(100):
                fd.write('{0}\n'.format(i))

    def setup(self):
        with open("test.dat", "r") as fd:
            self.data = [int(x) for x in fd.readlines()]

    def track_numbers(self):
        return len(self.data)
The setup_cache
timeout can be specified by setting the
.timeout
attribute of the setup_cache
function. The default
value is the maximum of the timeouts of the benchmarks using it.
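A minimal sketch (the 300-second value and the squared-numbers computation are arbitrary):

class Suite:
    def setup_cache(self):
        # expensive one-off computation, cached to disk by asv
        return [i ** 2 for i in range(10 ** 6)]

    # give the cache computation up to 300 seconds (arbitrary value)
    setup_cache.timeout = 300

    def track_len(self, data):
        return len(data)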
Each benchmark can have a number of arbitrary attributes assigned to
it. The attributes that asv
understands depend on the type of
benchmark and are defined below. For free functions, just assign the
attribute to the function. For methods, include the attribute at the
class level. For example, the following are equivalent:
def time_range():
    for i in range(1000):
        pass

time_range.timeout = 120.0


class Suite:
    timeout = 120.0

    def time_range(self):
        for i in range(1000):
            pass
The following attributes are applicable to all benchmark types:
timeout
: The amount of time, in seconds, to give the benchmark
to run before forcibly killing it. Defaults to 60 seconds.
benchmark_name
: If given, used as the benchmark name instead of the generated one,
<module>.<class>.<function name>.
pretty_name
: If given, used to display the benchmark name instead of the
benchmark function name.
version
: Used to determine when to invalidate old benchmark
results. Benchmark results produced with a different value of the
version than the current value will be ignored. The value can be
any Python string (or any other object; its str() will be used).
Default (if version=None
or not given): hash of the source code
of the benchmark function and setup and setup_cache methods. If the
source code of any of these changes, old results become invalidated.
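A short sketch of these attributes on a free function (the names and values are illustrative):

def time_parse_config():
    for line in ["a = 1", "b = 2"]:
        line.split("=")

# shown in the web interface instead of the function name
time_parse_config.pretty_name = "Parse configuration lines"
# bump this manually to invalidate previously recorded results
time_parse_config.version = "2"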
You might want to run a single benchmark for multiple values of some
parameter. This can be done by adding a params
attribute to the
benchmark object:
def time_range(n):
    for i in range(n):
        pass

time_range.params = [0, 10, 20, 30]
This will also make the setup and teardown functions parameterized:
class Suite:
    params = [0, 10, 20]

    def setup(self, n):
        self.obj = range(n)

    def teardown(self, n):
        del self.obj

    def time_range_iter(self, n):
        for i in self.obj:
            pass
If setup
raises a NotImplementedError
, the benchmark is marked
as skipped for the parameter values in question.
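For instance, a sketch that skips parameter values above an arbitrary size threshold:

class Suite:
    params = [10, 1000, 10 ** 7]

    def setup(self, n):
        if n > 10 ** 6:
            # mark very large inputs as unsupported; asv records the
            # corresponding parameter combinations as skipped
            raise NotImplementedError()
        self.data = list(range(n))

    def time_sum(self, n):
        sum(self.data)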
The parameter values can be any Python objects. However, it is often best to use only strings or numbers, because these have simple unambiguous text representations.
When you have multiple parameters, the test is run for all of their combinations:
import numpy


def time_ranges(n, func_name):
    f = {'range': range, 'arange': numpy.arange}[func_name]
    for i in f(n):
        pass

time_ranges.params = ([10, 1000], ['range', 'arange'])
The test will be run for parameters (10, 'range'), (10, 'arange'),
(1000, 'range'), (1000, 'arange')
.
You can also provide informative names for the parameters:
time_ranges.param_names = ['n', 'function']
These will appear in the test output; if not provided you get default names such as “param1”, “param2”.
Note that setup_cache
is not parameterized.
Timing benchmarks have the prefix time
.
The timing itself is based on the Python standard library’s timeit
module, with some extensions for automatic heuristics shamelessly
stolen from IPython’s %timeit
magic function. This means that in most cases the benchmark function
itself will be run many times to achieve accurate timing.
The default timing function is the POSIX CLOCK_PROCESS_CPUTIME
,
which measures the CPU time used only by the current process. This is
available as time.process_time
in Python 3.3 and later, but a
backport is included with asv
for earlier versions of Python.
Note
One consequence of using CLOCK_PROCESS_CPUTIME
is that the time
spent in child processes of the benchmark is not included. If your
benchmark spawns other processes, you may get more accurate results
by setting the timer
attribute on the benchmark to
timeit.default_timer
.
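A sketch of such a benchmark (the spawned command is illustrative and assumes a python executable on the PATH):

import subprocess
import timeit


def time_child_process():
    # the command is illustrative; it assumes "python" is on PATH
    subprocess.run(["python", "-c", "pass"])

# wall-clock timer, so time spent in the child process is counted
time_child_process.timer = timeit.default_timer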
For best results, the benchmark function should contain as little as
possible, with as much extraneous setup moved to a setup
function:
class Suite:
    def setup(self):
        # load data from a file
        with open("/usr/share/words.txt", "r") as fd:
            self.words = fd.readlines()

    def time_upper(self):
        for word in self.words:
            word.upper()
How setup
and teardown
behave for timing benchmarks
is similar to the Python timeit
module, and the behavior is controlled
by the number
and repeat
attributes, as explained below.
Attributes:
warmup_time
: asv
will spend this time (in seconds) in calling
the benchmarked function repeatedly, before starting to run the actual
benchmark. If not specified, warmup_time
defaults to 0.1 seconds
(on PyPy, the default is 1.0 sec).
repeat
: The number of measurement samples to collect. Each sample
consists of running the benchmark number
times. The median
time from all of these repetitions is used as the final measurement
result. When not provided, repeat
defaults to 10. Setup and
teardown are run before and after each sample.
number
: Manually choose the number of iterations in each sample.
If number
is specified, sample_time
is ignored.
Note that setup
and teardown
are not run between iterations:
setup
runs first, then the timed benchmark routine is called
number
times, and after that teardown
runs.
sample_time
: asv
will automatically select number
so that
each sample takes approximately sample_time
seconds. If not
specified, sample_time
defaults to 0.1 seconds.
timer
: The timing function to use, which can be any source of
monotonically increasing numbers, such as time.clock
, time.time
or time.process_time
. If it’s not provided, it defaults to
time.process_time
(or a backported version of it for versions of
Python prior to 3.3). Another useful value is
timeit.default_timer, which gives the default timeit
behavior on your version of Python.
On Windows, time.clock
has microsecond granularity, but
time.time
’s granularity is 1/60th of a second. On Unix,
time.clock
has 1/100th of a second granularity, and time.time
is
much more precise. On either platform, timeit.default_timer
measures wall clock time, not the CPU time. This means that other
processes running on the same computer may interfere with the
timing. That’s why the default of time.process_time
, which only
measures the time used by the current process, is often the best
choice.
The sample_time
, number
, repeat
, and timer
attributes
can be adjusted in the setup()
routine, which can be useful for
parameterized benchmarks.
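A sketch of class-level timing attributes (all values are arbitrary):

class ShortOps:
    # all values below are arbitrary examples
    warmup_time = 0.5
    sample_time = 0.25
    repeat = 20

    def time_list_append(self):
        lst = []
        for i in range(1000):
            lst.append(i)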
Memory benchmarks have the prefix mem
.
Memory benchmarks track the size of Python objects. To write a memory benchmark, write a function that returns the object you want to track:
def mem_list():
    return [0] * 256
The asizeof module
is used to determine the size of Python objects. Since asizeof
includes the memory of all of an object’s dependencies (including the
modules in which their classes are defined), a memory benchmark
instead calculates the incremental memory of a copy of the object,
which in most cases is probably a more useful indicator of how much
space each additional object will use. If you need to do something
more specific, a generic tracking benchmark (described below) can be used
instead.
Note
The memory benchmarking feature is still experimental.
asizeof
may not be the most appropriate metric to use.
Note
The memory benchmarks are not supported on PyPy.
Peak memory benchmarks have the prefix peakmem
.
Peak memory benchmarks track the maximum resident size (in bytes) of the process in memory. This does not necessarily count memory paged out to disk, or memory used by memory-mapped files. To write a peak memory benchmark, write a function that performs the operation whose maximum memory usage you want to track:
def peakmem_list():
    [0] * 165536
Note
The peak memory benchmark also counts memory usage during the
setup
routine, which may confound the benchmark results. One
way to avoid this is to use setup_cache
instead.
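A sketch of that approach (the file name and data size are arbitrary):

class Suite:
    def setup_cache(self):
        # build the input in a separate process so its construction
        # does not contribute to the measured peak
        with open("data.txt", "w") as fd:
            fd.write("x" * (1 << 20))

    def peakmem_load(self):
        with open("data.txt") as fd:
            fd.read()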
It is also possible to use asv
to track any arbitrary numerical
value. “Tracking” benchmarks can be used for this purpose and use the
prefix track
. These functions simply need to return a numeric
value. For example, to track the number of objects known to the
garbage collector at a given state:
import gc


def track_num_objects():
    return len(gc.get_objects())

track_num_objects.unit = "objects"
Attributes:
unit
: The unit of the values returned by the benchmark. Used
for display in the web interface.