Algorithm(s) for spotting anomalies ("spikes") in traffic data

Question

I find myself needing to process network traffic captured with tcpdump. Reading the traffic is not hard, but what gets a bit tricky is spotting where there are "spikes" in the traffic. I'm mostly concerned with TCP SYN packets and what I want to do is find days where there's a sudden rise in the traffic for a given destination port. There's quite a bit of data to process (roughly one year).

What I've tried so far is an exponential moving average. This was good enough to let me get some interesting measures out, but when I compare what I've seen with external data sources, it seems to be a bit too aggressive in flagging things as abnormal.

I've considered using a combination of the exponential moving average plus historical data (possibly from 7 days in the past, thinking that there ought to be a weekly cycle to what I am seeing), as some papers I've read seem to have managed to model resource usage that way with good success.
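One possible reading of that idea, as a minimal sketch rather than the method from any of the papers mentioned: blend the running EMA with the count observed one weekly cycle earlier to get an "expected" value for each day. The function name `seasonal_expectations`, the `weight` parameter and its default of 0.5 are assumptions made for illustration; `counts` is assumed to be a list of daily SYN counts for one destination port, oldest first.

```python
def seasonal_expectations(counts, alpha=0.96, weight=0.5, period=7):
    """Return an expected count per day, or None where history is too short.

    The expectation blends the EMA (as it stood before today's value was
    added) with the count seen `period` days earlier. `weight` is a
    hypothetical tuning knob: 1.0 uses only the EMA, 0.0 only the
    value from one period ago.
    """
    expectations = []
    avg = None  # EMA of all counts seen so far
    for day, new in enumerate(counts):
        if avg is not None and day >= period:
            lagged = counts[day - period]  # same weekday, one cycle back
            expectations.append(weight * avg + (1 - weight) * lagged)
        else:
            expectations.append(None)  # not enough history yet
        # Update the EMA only after the expectation has been recorded.
        avg = float(new) if avg is None else avg + alpha * (new - avg)
    return expectations
```

A day could then be flagged when its count exceeds the expectation by some margin; which margin works is something to tune against the data.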

So, does anyone know of a good method, or of somewhere to go and read up on this sort of thing?

The moving average I've been using looks roughly like:

avg = avg+0.96*(new-avg)

Here avg is the EMA and new is the new measurement. I have been experimenting with which thresholds to use, and found that combining "must be a given factor higher than the average prior to weighting the new value in" with "must be at least 3 higher" gives the least bad result.
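The update rule and the two thresholds described above can be sketched as follows. This is an illustration under stated assumptions, not the poster's actual code: the function name `find_spikes` is made up, the factor is not given in the post (2.0 is a placeholder), and "at least 3 higher" is read here as an absolute difference from the average.

```python
def find_spikes(counts, alpha=0.96, factor=2.0, min_rise=3):
    """Return the indices of days flagged as spikes.

    counts: daily SYN counts for one destination port, oldest first.
    alpha:  weight from the post's update rule avg = avg + 0.96*(new - avg).
    factor: assumed placeholder; the post does not state the value used.
    min_rise: assumed to mean an absolute rise of at least 3 over the average.
    """
    if not counts:
        return []
    spikes = []
    avg = float(counts[0])  # seed the EMA with the first observation
    for day, new in enumerate(counts[1:], start=1):
        # Test against the average *before* weighting the new value in.
        if new > factor * avg and new - avg >= min_rise:
            spikes.append(day)
        avg = avg + alpha * (new - avg)
    return spikes


daily_syns = [10, 11, 9, 10, 50, 12, 10]
print(find_spikes(daily_syns))
```

Note that with a weight of 0.96 on the new value the average tracks the latest measurement very closely, so a lower weight may be worth trying if too many days get flagged.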

Recommended answer

This is widely studied in the intrusion detection literature. The paper linked below is a seminal one on the issue; it shows, among other things, how to analyze tcpdump data to gain relevant insights.

This is the paper: http://www.usenix.org/publications/library/proceedings/sec98/full_papers/full_papers/lee/lee_html/lee.html

There they use the RIPPER rule induction system. I guess you could replace that old one with something newer, such as http://www.newty.de/pnc2/ or http://www.data-miner.com/rik.html
