Simple algorithm for online outlier detection of a generic time series

Problem description

I am working with a large number of time series. They are basically network measurements arriving every 10 minutes; some of them are periodic (e.g. the bandwidth), while others are not (e.g. the amount of routing traffic).

I would like a simple algorithm for online "outlier detection". Basically, I want to keep the whole historical data for each time series in memory (or on disk), and I want to detect any outlier in a live scenario (each time a new sample is captured). What is the best way to achieve this?

I'm currently using a moving average to remove some noise, but what next? Simple things like the standard deviation, MAD, ... computed against the whole data set don't work well (I can't assume the time series are stationary), and I would like something more "accurate", ideally a black box like:

double outlier_detection(double* vector, double value);

where vector is the array of double containing the historical data, and the return value is the anomaly score for the new sample "value".

Recommended answer

This is a big and complex subject, and the answer will depend on (a) how much effort you want to invest in this and (b) how effective you want your outlier detection to be. One possible approach is adaptive filtering, which is typically used for applications like noise-cancelling headphones. You have a filter that constantly adapts to the input signal, effectively matching its filter coefficients to a hypothetical short-term model of the signal source, thereby reducing the mean square error of the output. This gives you a low-level output signal (the residual error) except when an outlier arrives, which produces a spike that is easy to detect with a threshold. Read up on adaptive filtering, LMS filters, etc., if you're serious about this kind of technique.
