asv.step_detect

Attributes

Classes

L1Dist

Fast computations for the L1 distance measures.

Functions

detect_steps(y[, w])

Detect steps in a (noisy) signal.

detect_regressions(steps[, threshold, min_size])

Detect regressions in a (noisy) signal.

solve_potts(y, w, gamma[, min_size, max_size, ...])

Fit penalized stepwise constant function (Potts model) to data.

solve_potts_autogamma(y, w[, beta])

Solve Potts problem with automatically determined gamma.

solve_potts_approx(y, w[, gamma, min_size])

Fit penalized stepwise constant function (Potts model) to data

merge_pieces(gamma, right, values, dists, mu_dist, ...)

Combine consecutive intervals in Potts model solution, if doing

get_mu_dist(y, w)

rolling_median_dev(items)

Compute median(items[:j]), deviation[j]) for j in range(1, len(items))

weighted_median(y, w)

Compute weighted median of y with weights w.

golden_search(f, a, b[, xatol, ftol, expand_bounds])

Find minimum of a function on interval [a, b]

_plot_potts(x, sol)

Module Contents

asv.step_detect._rangemedian = None[source]
asv.step_detect.detect_steps(y, w=None)[source]

Detect steps in a (noisy) signal.

Parameters

ylist of float, none or nan

Single benchmark result series, with possible missing data

wlist of float, none or nan

Data point relative weights. Missing weights are set equal to the median weight.

Returns

stepslist of (left_pos, right_pos, value, min_value, err_est)

List containing a decomposition of the input data to a piecewise constant function. Each element contains the left (inclusive) and right (exclusive) bounds of a segment, the average value on the segment, the minimum value in the segment, and the l1 error estimate, \(|Y - avg|\). Missing data points are not necessarily contained in any segment; right_pos-1 is the last non-missing data point.

asv.step_detect.detect_regressions(steps, threshold=0, min_size=2)[source]

Detect regressions in a (noisy) signal.

A regression means an upward step in the signal. The value ‘before’ a regression is the value immediately preceding the upward step. The value ‘after’ a regression is the minimum of values after the upward step.

Parameters

stepslist of (left, right, value, min, error)

List of steps computed by detect_steps, or equivalent

thresholdfloat

Relative threshold for reporting regressions. Filter out jumps whose relative size is smaller than threshold, if they are not necessary to explain the difference between the best and the latest values.

min_sizeint

Minimum number of commits in a regression to consider it.

Returns

latest_value

Latest value

best_value

Best value

regression_poslist of (before, after, value_before, best_value_after)

List of positions between which the value increased. The first item corresponds to the last position at which the best value was obtained. The last item indicates the best value found after the regression (which is not always the value immediately following the regression).

asv.step_detect.solve_potts(y, w, gamma, min_size=1, max_size=None, min_pos=None, max_pos=None, mu_dist=None)[source]

Fit penalized stepwise constant function (Potts model) to data.

Given a time series y = {y_1, …, y_n}, fit series x = {x_1, …, x_n} by minimizing the cost functional:

F[x] = gamma * J(x) + sum(|y - x|**p)

where J(x) is the number of jumps (x_{j+1} != x_j) in x.

The algorithm used is described in Friedrich et al. [PTTS_FKLW08], it uses dynamic programming to find an exact solution to the problem (within the constraints specified).

The computational cost is ~ O(n**2 log n).

Parameters

ylist of floats

Input data series

gammafloat

Penalty parameter.

min_sizeint, optional

Minimum interval size to consider

max_sizeint, optional

Maximum interval size to consider

mu_distDist, optional

Precomputed interval means/medians and cost function values

min_posint, optional

Start point (inclusive) for the interval grid

max_posint, optional

End point (exclusive) for the interval grid

Returns

rightlist

List of (exclusive) right bounds of the intervals

valueslist

List of values of the intervals

distlist

List of sum(|y - x|**p) for each interval.

mu_distDist

Precomputed interval means/medians and cost function values

References

[PTTS_FKLW08]

F. Friedrich, A. Kempe, V. Liebscher, and G. Winkler. Complexity Penalized M-Estimation: Fast Computation. Journal of Computational and Graphical Statistics, 17(1):201–224, 2008. arXiv:27594299, doi:10.1198/106186008X285591.

asv.step_detect.solve_potts_autogamma(y, w, beta=None, **kw)[source]

Solve Potts problem with automatically determined gamma.

The optimal value is determined by minimizing the information measure:

f(gamma) = beta J(x(gamma)) + log sum(abs(x(gamma) - y)**p)

where x(gamma) is the solution to the Potts problem for a fixed gamma. The minimization is only performed rather roughly.

Parameters

betafloat or ‘bic’

Penalty parameter. Default is 4*ln(n)/n, similar to Bayesian information criterion for gaussian model with unknown variance assuming 4 DOF per breakpoint.

asv.step_detect.solve_potts_approx(y, w, gamma=None, min_size=1, **kw)[source]

Fit penalized stepwise constant function (Potts model) to data approximately, in linear time.

Do this by running the exact solver using a small maximum interval size, and then combining consecutive intervals together if it decreases the cost function.

asv.step_detect.merge_pieces(gamma, right, values, dists, mu_dist, max_size)[source]

Combine consecutive intervals in Potts model solution, if doing that reduces the cost function.

class asv.step_detect.L1Dist(y, w)[source]

Fast computations for the L1 distance measures.

This computes:

mu(l, r) = median(y[l:r+1], weights=w[l:r+1])
dist(l, r) = sum(w*abs(x - mu(l, r)) for x, w in zip(y[l:r+1], weights[l:r+1]))

We do not use here an approach that has asymptotically optimal performance; at least \(O(n^2 * \log(n))\) would be achievable, whereas we have here \(O(n^3)\). The asymptotic performance does not matter for asv.step_detect.solve_potts_approx(), which only looks at small windows of the data. It is more important to try to optimize the constant prefactors, which for Python means minimal code.

y[source]
w[source]
mu_memo[source]
dist_memo[source]
mu(*a)[source]
dist(*a)[source]
cleanup_cache()[source]
asv.step_detect.get_mu_dist(y, w)[source]
asv.step_detect.rolling_median_dev(items)[source]

Compute median(items[:j]), deviation[j]) for j in range(1, len(items)) in O(n log n) time.

deviation[j] == sum(abs(x - median(items[:j])) for x in items[:j])

asv.step_detect.weighted_median(y, w)[source]

Compute weighted median of y with weights w.

Find minimum of a function on interval [a, b] using golden section search.

If expand_bounds=True, expand the interval so that the function is first evaluated at x=a and x=b.

asv.step_detect._plot_potts(x, sol)[source]