asv.step_detect¶
Classes¶
| L1Dist | Fast computations for the L1 distance measures. |
Functions¶
| detect_steps | Detect steps in a (noisy) signal. |
| detect_regressions | Detect regressions in a (noisy) signal. |
| solve_potts | Fit penalized stepwise constant function (Potts model) to data. |
| solve_potts_autogamma | Solve Potts problem with automatically determined gamma. |
| solve_potts_approx | Fit penalized stepwise constant function (Potts model) to data approximately, in linear time. |
| merge_pieces | Combine consecutive intervals in Potts model solution, if doing that reduces the cost function. |
| rolling_median_dev | Compute (median(items[:j]), deviation[j]) for j in range(1, len(items)) in O(n log n) time. |
| weighted_median | Compute weighted median of y with weights w. |
| golden_search | Find minimum of a function on interval [a, b]. |
Module Contents¶
- asv.step_detect.detect_steps(y, w=None)[source]¶
Detect steps in a (noisy) signal.
Parameters¶
- y : list of float, none or nan
Single benchmark result series, with possible missing data
- w : list of float, none or nan
Data point relative weights. Missing weights are set equal to the median weight.
Returns¶
- steps : list of (left_pos, right_pos, value, min_value, err_est)
List containing a decomposition of the input data into a piecewise constant function. Each element contains the left (inclusive) and right (exclusive) bounds of a segment, the average value on the segment, the minimum value in the segment, and the l1 error estimate, \(|Y - avg|\). Missing data points are not necessarily contained in any segment; right_pos-1 is the last non-missing data point.
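As an illustration of this return format, here is a hypothetical helper (not part of asv) that builds one such tuple for a hand-chosen segment; it assumes the segment value is the L1-optimal constant (the median) and that the error estimate is the summed absolute deviation:

```python
import statistics

def describe_segment(y, left, right):
    # Build one (left_pos, right_pos, value, min_value, err_est) tuple
    # for the segment y[left:right]: the median as the L1-optimal
    # constant, the minimum, and the summed absolute deviation.
    # (Both choices are assumptions about the fields described above.)
    seg = y[left:right]
    avg = statistics.median(seg)
    err = sum(abs(v - avg) for v in seg)
    return (left, right, avg, min(seg), err)

y = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]   # one upward step at index 3
steps = [describe_segment(y, 0, 3), describe_segment(y, 3, 6)]
```

Note that, as described above, the right bound is exclusive: the first segment covers indices 0 through 2, the second indices 3 through 5.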
- asv.step_detect.detect_regressions(steps, threshold=0, min_size=2)[source]¶
Detect regressions in a (noisy) signal.
A regression means an upward step in the signal. The value ‘before’ a regression is the value immediately preceding the upward step. The value ‘after’ a regression is the minimum of values after the upward step.
Parameters¶
- steps : list of (left, right, value, min, error)
List of steps computed by detect_steps, or equivalent
- threshold : float
Relative threshold for reporting regressions. Filter out jumps whose relative size is smaller than threshold, if they are not necessary to explain the difference between the best and the latest values.
- min_size : int
Minimum number of commits in a regression for it to be considered.
Returns¶
- latest_value
Latest value
- best_value
Best value
- regression_pos : list of (before, after, value_before, best_value_after)
List of positions between which the value increased. The first item corresponds to the last position at which the best value was obtained. The last item indicates the best value found after the regression (which is not always the value immediately following the regression).
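The core idea can be sketched in a few lines (a simplified illustration, ignoring the threshold and min_size handling above; find_upward_jumps is a hypothetical name, and the exact position convention is an assumption, not asv's output):

```python
def find_upward_jumps(steps):
    # steps: list of (left, right, value, min_value, err_est) tuples,
    # as produced by detect_steps. A regression is an upward step:
    # report the value before the jump and the minimum of the
    # segment values after it.
    jumps = []
    for i in range(1, len(steps)):
        prev_value = steps[i - 1][2]
        if steps[i][2] > prev_value:
            # 'after' value: minimum over all later segment values,
            # since the best value after a jump need not be immediate
            after_min = min(s[2] for s in steps[i:])
            jumps.append((steps[i - 1][1] - 1, steps[i][0],
                          prev_value, after_min))
    return jumps
```

For example, a series that steps up from 1.0 to 2.0 and then settles at 1.5 reports one jump with value_before 1.0 and best value after 1.5.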
- asv.step_detect.solve_potts(y, w, gamma, min_size=1, max_size=None, min_pos=None, max_pos=None, mu_dist=None)[source]¶
Fit penalized stepwise constant function (Potts model) to data.
Given a time series y = {y_1, …, y_n}, fit series x = {x_1, …, x_n} by minimizing the cost functional:
F[x] = gamma * J(x) + sum(|y - x|**p)
where J(x) is the number of jumps (x_{j+1} != x_j) in x.
The algorithm used is described in Friedrich et al. [PTTS_FKLW08], it uses dynamic programming to find an exact solution to the problem (within the constraints specified).
The computational cost is ~ O(n**2 log n).
Parameters¶
- y : list of floats
Input data series
- gamma : float
Penalty parameter.
- min_size : int, optional
Minimum interval size to consider
- max_size : int, optional
Maximum interval size to consider
- mu_dist : Dist, optional
Precomputed interval means/medians and cost function values
- min_pos : int, optional
Start point (inclusive) for the interval grid
- max_pos : int, optional
End point (exclusive) for the interval grid
Returns¶
- right : list
List of (exclusive) right bounds of the intervals
- values : list
List of values of the intervals
- dist : list
List of sum(|y - x|**p) for each interval.
- mu_dist : Dist
Precomputed interval means/medians and cost function values
References¶
[PTTS_FKLW08]F. Friedrich, A. Kempe, V. Liebscher, and G. Winkler. Complexity Penalized M-Estimation: Fast Computation. Journal of Computational and Graphical Statistics, 17(1):201–224, 2008. arXiv:27594299, doi:10.1198/106186008X285591.
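For small inputs, the cost functional above can be minimized by a direct dynamic program, which makes the structure of the exact algorithm concrete. This is an illustrative O(n^3) sketch with p=1 and unit weights, not asv's optimized solver; potts_bruteforce and seg_cost are hypothetical names:

```python
import statistics

def potts_bruteforce(y, gamma):
    """Exact Potts fit by dynamic programming (illustrative, p=1).

    Minimizes gamma * J(x) + sum(|y - x|), where J(x) is the number
    of jumps. best[j] holds the optimal cost for the prefix y[:j],
    charging gamma once per segment; subtracting gamma at the end
    converts 'per segment' into 'per jump'.
    """
    n = len(y)

    def seg_cost(l, r):
        # L1 cost of one constant segment: distance to its median
        m = statistics.median(y[l:r])
        return sum(abs(v - m) for v in y[l:r])

    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for r in range(1, n + 1):
        for l in range(r):
            c = best[l] + gamma + seg_cost(l, r)
            if c < best[r]:
                best[r], back[r] = c, l
    # Recover the (exclusive) right bounds and segment values
    rights, r = [], n
    while r > 0:
        rights.append(r)
        r = back[r]
    rights.reverse()
    values, l = [], 0
    for r in rights:
        values.append(statistics.median(y[l:r]))
        l = r
    return rights, values, best[n] - gamma
```

With gamma = 1, a series with one clean jump splits into two segments; raising gamma eventually makes the single-segment fit cheaper than paying for the jump.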
- asv.step_detect.solve_potts_autogamma(y, w, beta=None, **kw)[source]¶
Solve Potts problem with automatically determined gamma.
The optimal value is determined by minimizing the information measure:
f(gamma) = beta J(x(gamma)) + log sum(abs(x(gamma) - y)**p)
where x(gamma) is the solution to the Potts problem for a fixed gamma. The minimization over gamma is performed only roughly.
Parameters¶
- beta : float or ‘bic’
Penalty parameter. Default is 4*ln(n)/n, similar to the Bayesian information criterion for a Gaussian model with unknown variance, assuming 4 DOF per breakpoint.
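The information measure itself is cheap to evaluate for a candidate fit. The following is a direct transcription of the formula above with p=1; information_measure is a hypothetical name, and x is assumed to be the fitted step function sampled on the same grid as y:

```python
import math

def information_measure(x, y, beta):
    # f = beta * J(x) + log(sum(|x - y|**p)) with p = 1;
    # J(x) counts the jumps in the fitted step function x.
    jumps = sum(1 for a, b in zip(x, x[1:]) if a != b)
    return beta * jumps + math.log(sum(abs(a - b) for a, b in zip(x, y)))
```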
- asv.step_detect.solve_potts_approx(y, w, gamma=None, min_size=1, **kw)[source]¶
Fit penalized stepwise constant function (Potts model) to data approximately, in linear time.
Do this by running the exact solver using a small maximum interval size, and then combining consecutive intervals together if it decreases the cost function.
- asv.step_detect.merge_pieces(gamma, right, values, dists, mu_dist, max_size)[source]¶
Combine consecutive intervals in Potts model solution, if doing that reduces the cost function.
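The merging criterion can be sketched as a pass over the interval bounds (a simplified illustration assuming p=1 and unit weights, so each segment's value is a median; asv's merge_pieces operates on the solver's internal state instead, and merge_once and seg_cost are hypothetical names):

```python
import statistics

def seg_cost(y, l, r):
    # L1 cost of fitting y[l:r] with a single constant (its median)
    m = statistics.median(y[l:r])
    return sum(abs(v - m) for v in y[l:r])

def merge_once(y, gamma, rights):
    # Try to merge one pair of adjacent intervals: dropping the jump
    # saves gamma, but the single-segment fit may raise the L1 error.
    # Merge only when the total cost does not increase.
    bounds = [0] + rights
    for i in range(1, len(bounds) - 1):
        l, mid, r = bounds[i - 1], bounds[i], bounds[i + 1]
        separate = seg_cost(y, l, mid) + seg_cost(y, mid, r) + gamma
        if seg_cost(y, l, r) <= separate:
            return rights[:i - 1] + rights[i:]   # drop the boundary
    return rights
```

Whether a given boundary survives depends on gamma: a large penalty favors merging across small jumps, while a small penalty keeps them.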
- class asv.step_detect.L1Dist(y, w)[source]¶
Fast computations for the L1 distance measures.
This computes:
mu(l, r) = median(y[l:r+1], weights=w[l:r+1])
dist(l, r) = sum(w*abs(x - mu(l, r)) for x, w in zip(y[l:r+1], w[l:r+1]))
We do not use an approach with asymptotically optimal performance here: at least \(O(n^2 \log(n))\) would be achievable, whereas this implementation is \(O(n^3)\). The asymptotic performance does not matter for asv.step_detect.solve_potts_approx(), which only looks at small windows of the data. It is more important to optimize the constant prefactors, which for Python means minimal code.
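From the definitions above, mu and dist for a single interval can be computed directly. This is a naive per-query sketch (the class above caches this work across queries); the weighted-median tie-breaking convention here is an assumption, and both function names are hypothetical:

```python
def weighted_median(y, w):
    # Smallest y value at which the cumulative weight reaches half
    # of the total weight (one common weighted-median convention;
    # asv's own tie-breaking may differ).
    pairs = sorted(zip(y, w))
    half = sum(w) / 2.0
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= half:
            return value

def l1_mu_dist(y, w, l, r):
    # mu(l, r) and dist(l, r) for the inclusive interval [l, r],
    # transcribing the two formulas above.
    ys, ws = y[l:r + 1], w[l:r + 1]
    mu = weighted_median(ys, ws)
    dist = sum(wi * abs(yi - mu) for yi, wi in zip(ys, ws))
    return mu, dist
```

With unit weights this reduces to the ordinary median and summed absolute deviation; a heavy weight pulls mu toward that point.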
- asv.step_detect.rolling_median_dev(items)[source]¶
Compute (median(items[:j]), deviation[j]) for j in range(1, len(items)) in O(n log n) time.
deviation[j] == sum(abs(x - median(items[:j])) for x in items[:j])
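A naive quadratic reference implementation of this definition (recomputing each prefix median from scratch; useful only as a check against the fast version, and a hypothetical name):

```python
import statistics

def rolling_median_dev_naive(items):
    # For each prefix items[:j], j in range(1, len(items)), yield the
    # prefix median and its summed absolute deviation. O(n**2) because
    # every prefix is processed from scratch.
    out = []
    for j in range(1, len(items)):
        prefix = items[:j]
        m = statistics.median(prefix)
        dev = sum(abs(x - m) for x in prefix)
        out.append((m, dev))
    return out
```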