asv.step_detect¶
Classes¶
| L1Dist | Fast computations for the L1 distance measures. |
Functions¶
| detect_steps | Detect steps in a (noisy) signal. |
| detect_regressions | Detect regressions in a (noisy) signal. |
| solve_potts | Fit penalized stepwise constant function (Potts model) to data. |
| solve_potts_autogamma | Solve Potts problem with automatically determined gamma. |
| solve_potts_approx | Fit penalized stepwise constant function (Potts model) to data approximately, in linear time. |
| merge_pieces | Combine consecutive intervals in Potts model solution, if doing that reduces the cost function. |
| rolling_median_dev | Compute (median(items[:j]), deviation[j]) for j in range(1, len(items)) in O(n log n) time. |
| weighted_median | Compute weighted median of y with weights w. |
| golden_search | Find minimum of a function on interval [a, b]. |
Module Contents¶
- asv.step_detect.detect_steps(y, w=None)[source]¶
Detect steps in a (noisy) signal.
Parameters¶
- y : list of float, none or nan
Single benchmark result series, with possible missing data
- w : list of float, none or nan
Data point relative weights. Missing weights are set equal to the median weight.
Returns¶
- steps : list of (left_pos, right_pos, value, min_value, err_est)
List containing a decomposition of the input data into a piecewise constant function. Each element contains the left (inclusive) and right (exclusive) bounds of a segment, the average value on the segment, the minimum value in the segment, and the l1 error estimate, \(|Y - avg|\). Missing data points are not necessarily contained in any segment; right_pos-1 is the last non-missing data point.
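As an illustration of this return format, here is a hypothetical helper (not part of asv) that builds one such tuple for a hand-chosen segment; it assumes the segment value is the L1-optimal constant (the median) and that the error estimate is the summed absolute deviation:

```python
import statistics

def describe_segment(y, left, right):
    # Build one (left_pos, right_pos, value, min_value, err_est) tuple
    # for the segment y[left:right]: the median as the L1-optimal
    # constant, the minimum, and the summed absolute deviation.
    # (Both choices are assumptions about the fields described above.)
    seg = y[left:right]
    avg = statistics.median(seg)
    err = sum(abs(v - avg) for v in seg)
    return (left, right, avg, min(seg), err)

y = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]   # one upward step at index 3
steps = [describe_segment(y, 0, 3), describe_segment(y, 3, 6)]
```

Note that, as described above, the right bound is exclusive: the first segment covers indices 0 through 2, the second indices 3 through 5.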
- asv.step_detect.detect_regressions(steps, threshold=0, min_size=2)[source]¶
Detect regressions in a (noisy) signal.
A regression means an upward step in the signal. The value ‘before’ a regression is the value immediately preceding the upward step. The value ‘after’ a regression is the minimum of values after the upward step.
Parameters¶
- steps : list of (left, right, value, min, error)
List of steps computed by detect_steps, or equivalent
- threshold : float
Relative threshold for reporting regressions. Filter out jumps whose relative size is smaller than threshold, if they are not necessary to explain the difference between the best and the latest values.
- min_size : int
Minimum number of commits in a regression for it to be considered.
Returns¶
- latest_value
Latest value
- best_value
Best value
- regression_pos : list of (before, after, value_before, best_value_after)
List of positions between which the value increased. The first item corresponds to the last position at which the best value was obtained. The last item indicates the best value found after the regression (which is not always the value immediately following the regression).
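The core idea can be sketched in a few lines (a simplified illustration, ignoring the threshold and min_size handling above; find_upward_jumps is a hypothetical name, and the exact position convention is an assumption, not asv's output):

```python
def find_upward_jumps(steps):
    # steps: list of (left, right, value, min_value, err_est) tuples,
    # as produced by detect_steps. A regression is an upward step:
    # report the value before the jump and the minimum of the
    # segment values after it.
    jumps = []
    for i in range(1, len(steps)):
        prev_value = steps[i - 1][2]
        if steps[i][2] > prev_value:
            # 'after' value: minimum over all later segment values,
            # since the best value after a jump need not be immediate
            after_min = min(s[2] for s in steps[i:])
            jumps.append((steps[i - 1][1] - 1, steps[i][0],
                          prev_value, after_min))
    return jumps
```

For example, a series that steps up from 1.0 to 2.0 and then settles at 1.5 reports one jump with value_before 1.0 and best value after 1.5.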
- asv.step_detect.solve_potts(y, w, gamma, min_size=1, max_size=None, min_pos=None, max_pos=None, mu_dist=None)[source]¶
Fit penalized stepwise constant function (Potts model) to data.
Given a time series y = {y_1, …, y_n}, fit series x = {x_1, …, x_n} by minimizing the cost functional:
F[x] = gamma * J(x) + sum(|y - x|**p)
where J(x) is the number of jumps (x_{j+1} != x_j) in x.
The algorithm used is described in Friedrich et al. [PTTS_FKLW08], it uses dynamic programming to find an exact solution to the problem (within the constraints specified).
The computational cost is ~ O(n**2 log n).
Parameters¶
- y : list of floats
Input data series
- gamma : float
Penalty parameter.
- min_size : int, optional
Minimum interval size to consider
- max_size : int, optional
Maximum interval size to consider
- mu_dist : Dist, optional
Precomputed interval means/medians and cost function values
- min_pos : int, optional
Start point (inclusive) for the interval grid
- max_pos : int, optional
End point (exclusive) for the interval grid
Returns¶
- right : list
List of (exclusive) right bounds of the intervals
- values : list
List of values of the intervals
- dist : list
List of sum(|y - x|**p) for each interval.
- mu_dist : Dist
Precomputed interval means/medians and cost function values
References¶
[PTTS_FKLW08]F. Friedrich, A. Kempe, V. Liebscher, and G. Winkler. Complexity Penalized M-Estimation: Fast Computation. Journal of Computational and Graphical Statistics, 17(1):201–224, 2008. arXiv:27594299, doi:10.1198/106186008X285591.
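For small inputs, the cost functional above can be minimized by a direct dynamic program, which makes the structure of the exact algorithm concrete. This is an illustrative O(n^3) sketch with p=1 and unit weights, not asv's optimized solver; potts_bruteforce and seg_cost are hypothetical names:

```python
import statistics

def potts_bruteforce(y, gamma):
    """Exact Potts fit by dynamic programming (illustrative, p=1).

    Minimizes gamma * J(x) + sum(|y - x|), where J(x) is the number
    of jumps. best[j] holds the optimal cost for the prefix y[:j],
    charging gamma once per segment; subtracting gamma at the end
    converts 'per segment' into 'per jump'.
    """
    n = len(y)

    def seg_cost(l, r):
        # L1 cost of one constant segment: distance to its median
        m = statistics.median(y[l:r])
        return sum(abs(v - m) for v in y[l:r])

    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for r in range(1, n + 1):
        for l in range(r):
            c = best[l] + gamma + seg_cost(l, r)
            if c < best[r]:
                best[r], back[r] = c, l
    # Recover the (exclusive) right bounds and segment values
    rights, r = [], n
    while r > 0:
        rights.append(r)
        r = back[r]
    rights.reverse()
    values, l = [], 0
    for r in rights:
        values.append(statistics.median(y[l:r]))
        l = r
    return rights, values, best[n] - gamma
```

With gamma = 1, a series with one clean jump splits into two segments; raising gamma eventually makes the single-segment fit cheaper than paying for the jump.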
- asv.step_detect.solve_potts_autogamma(y, w, beta=None, **kw)[source]¶
Solve Potts problem with automatically determined gamma.
The optimal value is determined by minimizing the information measure:
f(gamma) = beta J(x(gamma)) + log sum(abs(x(gamma) - y)**p)
where x(gamma) is the solution to the Potts problem for a fixed gamma. The minimization over gamma is performed only roughly.
Parameters¶
- beta : float or ‘bic’
Penalty parameter. Default is 4*ln(n)/n, similar to the Bayesian information criterion for a Gaussian model with unknown variance, assuming 4 DOF per breakpoint.
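The information measure itself is cheap to evaluate for a candidate fit. The following is a direct transcription of the formula above with p=1; information_measure is a hypothetical name, and x is assumed to be the fitted step function sampled on the same grid as y:

```python
import math

def information_measure(x, y, beta):
    # f = beta * J(x) + log(sum(|x - y|**p)) with p = 1;
    # J(x) counts the jumps in the fitted step function x.
    jumps = sum(1 for a, b in zip(x, x[1:]) if a != b)
    return beta * jumps + math.log(sum(abs(a - b) for a, b in zip(x, y)))
```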
- asv.step_detect.solve_potts_approx(y, w, gamma=None, min_size=1, **kw)[source]¶
Fit penalized stepwise constant function (Potts model) to data approximately, in linear time.
Do this by running the exact solver using a small maximum interval size, and then combining consecutive intervals together if it decreases the cost function.
- asv.step_detect.merge_pieces(gamma, right, values, dists, mu_dist, max_size)[source]¶
Combine consecutive intervals in Potts model solution, if doing that reduces the cost function.
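The merging criterion can be sketched as a pass over the interval bounds (a simplified illustration assuming p=1 and unit weights, so each segment's value is a median; asv's merge_pieces operates on the solver's internal state instead, and merge_once and seg_cost are hypothetical names):

```python
import statistics

def seg_cost(y, l, r):
    # L1 cost of fitting y[l:r] with a single constant (its median)
    m = statistics.median(y[l:r])
    return sum(abs(v - m) for v in y[l:r])

def merge_once(y, gamma, rights):
    # Try to merge one pair of adjacent intervals: dropping the jump
    # saves gamma, but the single-segment fit may raise the L1 error.
    # Merge only when the total cost does not increase.
    bounds = [0] + rights
    for i in range(1, len(bounds) - 1):
        l, mid, r = bounds[i - 1], bounds[i], bounds[i + 1]
        separate = seg_cost(y, l, mid) + seg_cost(y, mid, r) + gamma
        if seg_cost(y, l, r) <= separate:
            return rights[:i - 1] + rights[i:]   # drop the boundary
    return rights
```

Whether a given boundary survives depends on gamma: a large penalty favors merging across small jumps, while a small penalty keeps them.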
- class asv.step_detect.L1Dist(y, w)[source]¶
Fast computations for the L1 distance measures.
This computes:
mu(l, r) = median(y[l:r+1], weights=w[l:r+1])
dist(l, r) = sum(w*abs(x - mu(l, r)) for x, w in zip(y[l:r+1], w[l:r+1]))
We do not use an approach with asymptotically optimal performance here: at least \(O(n^2 \log(n))\) would be achievable, whereas this implementation is \(O(n^3)\). The asymptotic performance does not matter for asv.step_detect.solve_potts_approx(), which only looks at small windows of the data. It is more important to optimize the constant prefactors, which for Python means minimal code.
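From the definitions above, mu and dist for a single interval can be computed directly. This is a naive per-query sketch (the class above caches this work across queries); the weighted-median tie-breaking convention here is an assumption, and both function names are hypothetical:

```python
def weighted_median(y, w):
    # Smallest y value at which the cumulative weight reaches half
    # of the total weight (one common weighted-median convention;
    # asv's own tie-breaking may differ).
    pairs = sorted(zip(y, w))
    half = sum(w) / 2.0
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= half:
            return value

def l1_mu_dist(y, w, l, r):
    # mu(l, r) and dist(l, r) for the inclusive interval [l, r],
    # transcribing the two formulas above.
    ys, ws = y[l:r + 1], w[l:r + 1]
    mu = weighted_median(ys, ws)
    dist = sum(wi * abs(yi - mu) for yi, wi in zip(ys, ws))
    return mu, dist
```

With unit weights this reduces to the ordinary median and summed absolute deviation; a heavy weight pulls mu toward that point.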
- asv.step_detect.rolling_median_dev(items)[source]¶
Compute (median(items[:j]), deviation[j]) for j in range(1, len(items)) in O(n log n) time.
deviation[j] == sum(abs(x - median(items[:j])) for x in items[:j])
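A naive quadratic reference implementation of this definition (recomputing each prefix median from scratch; useful only as a check against the fast version, and a hypothetical name):

```python
import statistics

def rolling_median_dev_naive(items):
    # For each prefix items[:j], j in range(1, len(items)), yield the
    # prefix median and its summed absolute deviation. O(n**2) because
    # every prefix is processed from scratch.
    out = []
    for j in range(1, len(items)):
        prefix = items[:j]
        m = statistics.median(prefix)
        dev = sum(abs(x - m) for x in prefix)
        out.append((m, dev))
    return out
```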