Simple algorithm for online outlier detection of a generic time series

Question

I am working with a large number of time series. They are basically network measurements arriving every 10 minutes; some of them are periodic (e.g. the bandwidth), while others are not (e.g. the amount of routing traffic).

I would like a simple algorithm for doing an online "outlier detection". Basically, I want to keep in memory (or on disk) the whole historical data for each time series, and I want to detect any outlier in a live scenario (each time a new sample is captured). What is the best way to achieve these results?

I'm currently using a moving average to remove some noise, but what next? Simple statistics such as the standard deviation or MAD, computed against the whole data set, don't work well (I can't assume the time series are stationary), and I would like something more "accurate", ideally a black box like:

double outlier_detection(double* vector, double value);

where vector is the array of doubles containing the historical data, and the return value is the anomaly score for the new sample, value.

Recommended answer

This is a big and complex subject, and the answer will depend on (a) how much effort you want to invest in this and (b) how effective you want your outlier detection to be. One possible approach is adaptive filtering, which is typically used for applications like noise-cancelling headphones. You have a filter which constantly adapts to the input signal, effectively matching its filter coefficients to a hypothetical short-term model of the signal source, thereby minimizing the mean square error of its output. This gives you a low-level output signal (the residual error) except when you get an outlier, which results in a spike that is easy to detect with a threshold. Read up on adaptive filtering, LMS filters, etc., if you're serious about this kind of technique.
