在实时时间序列数据的峰值信号检测 [英] Peak signal detection in realtime timeseries data

查看:1766
本文介绍了在实时时间序列数据的峰值信号检测的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

更新:表现最好的算法的到目前为止的<一个href="http://stackoverflow.com/questions/22583391/peak-recognition-in-realtime-timeseries-data/22640362#22640362">is这一个

Update: The best performing algorithm so far is this one.

这个问题旨在探讨现有的稳健的方法或算​​法用于检测突发峰值实时时间序列数据。

This question seeks to explore the available robust methods or algorithms for detecting sudden peaks in real-time timeseries data.

我不求快速和明显的答案。我想每一个答案,以提供不同的解决问题的方法,补充了该方法的优点和缺点。

I do not seek fast and obvious answers. I would like every answer to provide a different approach to the problem, complemented with the advantages and disadvantages of the proposed method.

考虑下面的数据集

p = [1 1 1.1 1 0.9 1 1 1.1 1 0.9 1 1.1 1 1 0.9 1 1 1.1 1 1,...
    1 1 1.1 0.9 1 1.1 1 1 0.9 1 1.1 1 1 1.1 1 0.8 0.9 1 1.2 0.9 1,...
    1 1.1 1.2 1 1.5 1 3 2 5 3 2 1 1 1 0.9 1,...
    1 3 2.6 4 3 3.2 2 1 1 0.8 4 4 2 2.5 1 1 1];

(Matlab的格式,但它不是语言,但对算法)

(Matlab format but it's not about the language but about the algorithm)

您可以清楚地看到,有三个大峰和一些小高峰。

You can clearly see that there are three large peaks and some small peaks.

规格

此数据集的类的时间序列数据集,该问题是有关的一个具体例子。此类数据集有两个基本特征:

This dataset is a specific example of the class of timeseries datasets that the question is about. This class of datasets has two general features:

  1. 在没有与一般的平均基本噪声
  2. 有大'的的'或'的更高的数据点的从噪声显著偏离。
  1. There is basic noise with a general mean
  2. There are large 'peaks' or 'higher data points' that significantly deviate from the noise.

让我们还假设如下:

  • 峰的宽度不能预先确定
  • 峰的高度明显偏离其它值
  • 在所使用的算法必须计算实时(所以更改与每个新的数据点)

显然,对于这样的情况,一个边界值需要其触发信号来构造。然而,在现实中,该边界值不能是静态的,必须基于一个算法确定实时

Clearly, for such a situation, a boundary value needs to be constructed which triggers signals. However, in reality, this boundary value cannot be static and must be determined realtime based on an algorithm.

我的问题:如何计算出这样的边界值实时?什么是公知的,对于这种情况适用算法? (没有这个数据集专门)

My Question: How can I calculate such boundary values realtime? What are well-known and applicable algorithms for such situations? (not this data set specifically)

此外,我想了解以下内容:

Additionally, I would like to know the following:

  • 是否有可能检测到峰毫不拖延地只给在该时间之前什么是已知的?
  • 什么需要了解有关峰值检测到它(有一些给定的延迟)?

<子>有根有据/创造性的选择和有益的见解都非常AP preciated。 (在任何语言的答案都很好:这是关于算法)

推荐答案

我已经建造了一个很好的执行算法发出信号时所述数据点的标准偏差的远离移动一个指定数目的意思。然而,当一个信号被检测到,随后的数据点,同时也是一个信号(因此显著远离移动的意思),也不会损坏所述信号阈值。即,算法创建一个'的新的平均的'和'的新st.dev 的'在其中的数据点是信号的的使用。因此,阈值不被破坏,并能够正确地识别未来的信号也不会损失性能。这个作品非常好!

New (very) robust algorithm that uses the standard deviation

I have constructed a very well performing algorithm that signals when the data points are a specified number of standard deviations away from the moving mean. However, when a signal is detected, subsequent data points that are also a signal (so significantly away from the moving mean), will not corrupt the signal threshold. That is, the algorithm creates a 'new mean' and 'new st.dev.' in which the data points that are signals are not used. Therefore, the threshold remains uncorrupted and is able to correctly identify future signals too, without loss of performance. This works extremely well!

为了显示这一强大的算法的力量,我有prepared中,用户可以指定自己的数据演示。这个小演示显示这两个算法如何工作的,为什么它是非常有用。

In order to display the power of this robust algorithm, I have prepared a demo in which the user can specify its own data. This little demo displays both how the algorithm works and why it is so useful.

输入图像的描述在这里

完整的工作Matlab的code这个演示

function [] = RobustDetectionDemo()

%% SPECIFICATIONS
LAG         = 10;       % lag for the moving mean and moving st. dev.
DIFF        = 3.5;      % number of st. dev. from the mean to signal
INFLUENCE   = 0.0;      % when signal: how much is mean/st.dev. influenced?
                            % or e.g. 0.05/0.1 for influencing
DIRECTION   = 'both';   % signal when 'up'/'down'/'both' from the mean

%%
figure(1);
subplot(2,2,1);
title('Draw 30 data points');
ylim([0 5]); xlim([0 50]);
[x,y] = ginputExtra_realtime(30, true, LAG, DIFF, INFLUENCE, DIRECTION);
end

function [x y] = ginputExtra_realtime(n,booText, LAG, DIFF, INFLUENCE, DIRECTION)
if booText == true
    bText = booText;
else
    bText = false;
end
H = gca;
set(H, 'YLimMode', 'manual'); set(H, 'XLimMode', 'manual');
set(H, 'YLim', get(H,'YLim')); set(H, 'XLim', get(H,'XLim'));
numPoints = n; xg = []; yg = [];
for i=1:numPoints
    [xi yi] = ginput(1);
    xg = [xg xi]; yg = [yg yi];
    if i == 1
        hold on;
        plot(H, xg(i),yg(i),'ro');
        if bText text(xg(i),yg(i),num2str(i),'FontSize',12); end
    else
        plot(xg([i-1:i]),yg([i-1:i]),'r');
        if bText text(xg(i),yg(i),num2str(i),'FontSize',12); end
        if length(xg) > LAG
            robustMA(xg, yg, LAG, DIFF, INFLUENCE, DIRECTION);
        end
    end    
end
hold off;
x = xg; y = yg;
end

function [] = robustMA( x, y, lag, diff, influence, direction)

% robustMA  :: Signal detection algorithm ::
% Author: Jean-Paul van Brakel

% ************************************************************ %
% TO BE USED FOR: *determining significant and sudden changes*
% ************************************************************ %

% x     = x-axis data
% y     = y-axis data
% lag   = lag of moving mean and moving st.dev.
% diff  = number of st.dev. away from the mean in order to give a signal
% influence = number between 0 and 1 that indicates influence of signals
% direction = 'up'/'down'/'both' which means the following:
%               - 'up'  : only signal for deviations ABOVE the mean
%               - 'down': only signal for deviations BELOW the mean
%               - 'both': signal for deviations ABOVE and BELOW the mean

p = y;
outputmean  = tsmovavg(y,'s',lag,2);
outputstdev = movingstd(y,lag,'backward');

newMean  = zeros(1, length(outputmean));
newStdev = zeros(1, length(outputmean));
signals  = ones(1, length(outputmean));

newMean(lag-1)  = outputmean(lag);
newStdev(lag-1) = outputstdev(lag);

for i = lag:length(outputmean)
   if strcmp(direction, 'up')
       if (p(i) > newMean(i-1)+diff*newStdev(i-1))
          newMean(i)  = (newMean(i-1)  + influence*p(i))/(1+influence);
          newStdev(i) = (newStdev(i-1) + influence*sqrt((p(i)-newMean(i-1))^2))/(1+influence);
          signals(i)  = 2;
       else
          newMean(i)  = (newMean(i-1)+p(i))/2;
          newStdev(i) = (newStdev(i-1) + sqrt((p(i)-newMean(i-1))^2))/2; 
          signals(i)  = 1;
       end
   elseif strcmp(direction, 'down')
       if (p(i) < newMean(i-1)-diff*newStdev(i-1))
          newMean(i)  = (newMean(i-1)  + influence*p(i))/(1+influence);
          newStdev(i) = (newStdev(i-1) + influence*sqrt((p(i)-newMean(i-1))^2))/(1+influence);
          signals(i)  = 2;
       else
          newMean(i)  = (newMean(i-1)+p(i))/2;
          newStdev(i) = (newStdev(i-1) + sqrt((p(i)-newMean(i-1))^2))/2; 
          signals(i)  = 1;
       end
   elseif strcmp(direction, 'both')
       if (p(i) > newMean(i-1)+diff*newStdev(i-1) || ...
           p(i) < newMean(i-1)-diff*newStdev(i-1))
          newMean(i)  = (newMean(i-1)  + influence*p(i))/(1+influence);
          newStdev(i) = (newStdev(i-1) + influence*sqrt((p(i)-newMean(i-1))^2))/(1+influence);
          signals(i)  = 2;
       else
          newMean(i)  = (newMean(i-1)+p(i))/2;
          newStdev(i) = (newStdev(i-1) + sqrt((p(i)-newMean(i-1))^2))/2; 
          signals(i)  = 1;
       end
   end
end

figure(1);
subplot(2,2,2);
hold on;
title('Algorithm output');
area(x, newMean+diff*newStdev, 'FaceColor', [0.9 0.9 0.9], 'EdgeColor', 'none');
area(x, newMean, 'FaceColor', [1 1 1], 'EdgeColor', 'none');
area(x, newMean, 'FaceColor', [0.9 0.9 0.9], 'EdgeColor', 'none');
area(x, newMean-diff*newStdev, 'FaceColor', [1 1 1], 'EdgeColor', 'none');
plot(x, p, ':r', 'LineWidth', 1, 'Color', 'black');
plot(x, newMean, 'LineWidth', 2, 'Color', 'red');
plot(x, newMean+newStdev, 'LineWidth', 2, 'Color', 'green');
plot(x, newMean-newStdev, 'LineWidth', 2, 'Color', 'green');
xlim([0 50]);   ylim([0 5])
hold off;
subplot(2,2,4);
hold on;
title('Signal output');
stairs(x, signals, 'LineWidth', 2, 'Color', 'blue');
ylim([0 3]);    xlim([0 50]);
hold off;

end

function s = movingstd(x,k,windowmode)
% movingstd: efficient windowed standard deviation of a time series
% usage: s = movingstd(x,k,windowmode)
%
% Movingstd uses filter to compute the standard deviation, using
% the trick of std = sqrt((sum(x.^2) - n*xbar.^2)/(n-1)).
% Beware that this formula can suffer from numerical problems for
% data which is large in magnitude.

% check for a windowmode
if (nargin<3) || isempty(windowmode)
  % supply the default: 
  windowmode = 'central';
elseif ~ischar(windowmode)
  error 'If supplied, windowmode must be a character flag.'
end
% check for a valid shortening.
valid = {'central' 'forward' 'backward'};
windowmode = lower(windowmode);
ind = strmatch(windowmode,valid);
if isempty(ind)
  error 'Windowmode must be a character flag: ''c'', ''b'', or ''f''.'
else
  windowmode = valid{ind};
end

% length of the time series
n = length(x);

% check for valid k
if (nargin<2) || isempty(k) || (rem(k,1)~=0)
  error 'k was not provided or not an integer.'
end
switch windowmode
  case 'central'
    if k<1
      error 'k must be at least 1 for windowmode = ''central''.'
    end
    if n<(2*k+1)
      error 'k is too large for this short of a series and this windowmode.'
    end
  otherwise
    if k<2
      error 'k must be at least 2 for windowmode = ''forward'' or ''backward''.'
    end
    if (n<k)
      error 'k is too large for this short of a series.'
    end
end

% Improve the numerical analysis by subtracting off the series mean
% this has no effect on the standard deviation.
x = x - mean(x);

% we will need the squared elements 
x2 = x.^2;

% split into the three windowmode cases for simplicity
A = 1;
switch windowmode
  case 'central'
    B = ones(1,2*k+1);
    s = sqrt((filter(B,A,x2) - (filter(B,A,x).^2)*(1/(2*k+1)))/(2*k));
    s(k:(n-k)) = s((2*k):end);
  case 'forward'
    B = ones(1,k);
    s = sqrt((filter(B,A,x2) - (filter(B,A,x).^2)*(1/k))/(k-1));
    s(1:(n-k+1)) = s(k:end);
  case 'backward'
    B = ones(1,k);
    s = sqrt((filter(B,A,x2) - (filter(B,A,x).^2)*(1/k))/(k-1));
end

% special case the ends as appropriate
switch windowmode
  case 'central'
    % repairs are needed at both ends
    for i = 1:k
      s(i) = std(x(1:(k+i)));
      s(n-k+i) = std(x((n-2*k+i):n));
    end
  case 'forward'
    % the last k elements must be repaired
    for i = (k-1):-1:1
      s(n-i+1) = std(x((n-i+1):n));
    end
  case 'backward'
    % the first k elements must be repaired
    for i = 1:(k-1)
      s(i) = std(x(1:i));
    end
end
end

所需要的参数是:

The necessary parameters are:

  • LAG :滞后的移动平均值和移动ST。开发。
  • DIFF :ST数量。开发。离平均值,以产生一个信号
  • 的影响:当有一个信号,有多少是平均/ st.dev。影响? (0-1之间的号码)
  • 方向:当偏差'了'的信号/向下/既远离平均
  • LAG: lag for the moving mean and moving st. dev.
  • DIFF: number of st. dev. away from the mean to generate a signal
  • INFLUENCE: when there is a signal, how much is mean/st.dev. influenced? (number between 0-1)
  • DIRECTION: signal when deviation is 'up'/'down'/'both' away from the mean?

正如你所看到的,我使用的设置 LAG = 10; DIFF = 3.5;影响= 0; 这个演示。随意摆弄这些参数,并研究了算法的性能差异。

As you can see, I used the settings LAG=10; DIFF=3.5; INFLUENCE=0; for this demo. Feel free to fiddle around with these parameters and study the differences in performance of the algorithm.

我已经建造中的的标准偏差值的距离平均值作为阈值一个新的(非常好执行)算法。

I have constructed a new (very well performing) algorithm in which the number of standard deviations away from the mean is used as threshold.

这背后的基本思路如下:

The basic idea behind it is the following:

if (price(t) deviates more than mean(t-1) + k * std(t-1) )
{ 
    // construct signal
    new Mean : mean(t) = (mean(t-1)+influence*price(t))/(1+influence);
    new Std  : std(t)  = (std(t-1)+influence*sqrt((price(t)-newMean(t-1))^2))/(1+influence)
} else
{
    new Mean : mean(t) = (mean(t-1) + price(t)) / 2
    new Std  : std(t) = std(t-1) + sqrt((price(t) - mean(t-1))^2) /2
} 

这小算法执行得非常好!

This small algorithm performs surprisingly well!

特别是,因为它不使用实均值和标准差,但构造一个新的,使得信号不影响/损坏的信号阈值

Especially because it doesn't use the real mean and standard deviation but construct a new one such that signals do not influence/ corrupt the signal threshold.

优点

  • 执行得非常好(无论时间长短)
  • 非常强大:调整的噪音,但仍然客观的向高山峰
  • 只需要 2 输入参数
  • 允许为不同的移动平均线结构(简单,指数,权重等。)
  • Performs surprisingly well (no matter the length of time)
  • Very robust: adjusts for noise but still remains objective towards high peaks
  • Only needs 2 input parameters
  • Allows for different moving average structures (Simple, Exponential, Weighted etc.)

缺点

  • 需要估计2输入参数,需要指定的影响参数
  • 对于噪音小将军的变化,该算法不是非常有用

有关样本数据的例子:

滞后= 30,差异= 3,影响力= 0

滞后= 30,差异= 3.5,影响力= 0

滞后= 30,差异= 5,影响力= 0

$ C $下复制在Matlab:

Code for replicating in Matlab:

p = [1 1 1.1 1 0.9 1 1 1.1 1 0.9 1 1.1 1 1 0.9 1 1 1.1 1 1,...
    1 1 1.1 0.9 1 1.1 1 1 0.9 1 1.1 1 1 1.1 1 0.8 0.9 1 1.2 0.9 1,...
    1 1.1 1.2 1 1.5 1 3 2 5 3 2 1 1 1 0.9 1,...
    1 3 2.6 4 3 3.2 2 1 1 0.8 4 4 2 2.5 1 1 1];

% SPECS
lag = 30;       % lag of the algorithm
diff = 3.5;     % number of standard deviations from the mean
influence = 0;  % influence parameter (convergence to signals)
% SPECS

outputmean  = tsmovavg(p,'s',lag,2);
outputstdev = movingstd(p,lag,'backward');

newMean  = zeros(1, length(outputmean));
newStdev = zeros(1, length(outputmean));
signals  = zeros(1, length(outputmean));

newMean(lag-1)  = outputmean(lag);
newStdev(lag-1) = outputstdev(lag);

for i=lag:length(outputmean)
   if (p(i) > newMean(i-1)+diff*newStdev(i-1))
      newMean(i)  = (newMean(i-1)  + influence*p(i))/(1+influence);
      newStdev(i) = (newStdev(i-1) + influence*sqrt((p(i)-newMean(i-1))^2))/(1+influence);
      signals(i)  = 0.3;
   else
      newMean(i)  = (newMean(i-1)+p(i))/2;
      newStdev(i) = (newStdev(i-1) + sqrt((p(i)-newMean(i-1))^2))/2; 
      signals(i)  = 0;
   end
end

figure;
hold all;
plot(p, ':r', 'LineWidth', 1, 'Color', 'black');
plot(signals, 'LineWidth', 2, 'Color', 'blue');
plot(newMean, 'LineWidth', 2, 'Color', 'red');
plot(newMean+newStdev, 'LineWidth', 2, 'Color', 'green');

  • 附加功能 movingstd :<一href="http://www.mathworks.com/matlabcentral/fileexchange/9428-moving-window-standard-deviation/content/movingstd.m"相对=nofollow> code
    • Extra function movingstd: code
    • 这篇关于在实时时间序列数据的峰值信号检测的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆