How to derive KDD99 features from a DARPA pcap file?

Question

I have recently worked with the DARPA network traffic packets and the version derived from them that KDD99 uses for intrusion detection evaluation.

Because my domain knowledge of computer networks is limited, I could only derive 9 features from the DARPA packet headers, not the 41 features used in KDD99.

I intend to continue my work on the UNB ISCX Intrusion Detection Evaluation DataSet. However, I want to derive the 41 features used in KDD99 from the pcap files and save them in CSV format. Is there a fast or easy way to achieve this?

Answer

Be careful with this data set.

http://www.kdnuggets.com/news/2007/n18/4i.html

Some excerpts:

the artificial data was generated using a closed network, some proprietary network traffic generators, and hand-injected attacks

Among the issues raised, the most important seemed to be that no validation was ever performed to show that the DARPA dataset actually looked like real network traffic.

In 2003, Mahoney and Chan built a trivial intrusion detection system and ran it against the DARPA tcpdump data. They found numerous irregularities, including that -- due to the way the data was generated -- all the malicious packets had a TTL of 126 or 253 whereas almost all the benign packets had a TTL of 127 or 254.

the DARPA dataset (and by extension, the KDD Cup '99 dataset) was fundamentally broken, and one could not draw any conclusions from any experiments run using them

we strongly recommend that (1) all researchers stop using the KDD Cup '99 dataset
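To make the TTL irregularity concrete, here is a minimal sketch of how little a "detector" needs in order to exploit it. The TTL values come from the excerpt above; the function name and the toy packets are made up for illustration and are not taken from the DARPA data.

```python
# TTL values that the excerpt reports for malicious packets in the DARPA data.
ATTACK_TTLS = {126, 253}


def flag_by_ttl(ttl: int) -> bool:
    """Flag a packet as malicious purely from its TTL (an artifact, not a real signal)."""
    return ttl in ATTACK_TTLS


# Hand-written toy packets (TTL, label); hypothetical, for illustration only.
toy_packets = [(127, "normal"), (254, "normal"), (126, "attack"), (253, "attack")]

correct = sum(flag_by_ttl(ttl) == (label == "attack") for ttl, label in toy_packets)
print(f"{correct}/{len(toy_packets)} toy packets classified correctly")
```

A rule like this says nothing about attacks; it only recognizes how the simulation was set up, which is why results obtained on this data are hard to trust.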

As for the feature extraction used: IIRC, the majority of the features were simply attributes of the parsed IP/TCP/UDP headers, such as the port number, the last octet of the IP address, and some packet flags.
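If you still want header-level attributes in CSV form, a rough sketch using only the Python standard library could look like the following. It assumes you already have raw IPv4 packets as bytes with the link-layer header stripped (reading them out of the pcap file is left to whatever pcap reader you prefer). The column names, `header_features`, `to_csv`, and the sample packet are all made up for illustration. This does not reproduce the 41 KDD99 features, which also include connection-level statistics that cannot be read from a single packet header.

```python
import csv
import io
import struct

# Illustrative column names; these are not the official KDD99 feature names.
FIELDS = ["protocol", "src_last_octet", "dst_last_octet",
          "src_port", "dst_port", "tcp_flags", "ttl"]


def header_features(ip_packet: bytes) -> dict:
    """Extract a few header attributes from one raw IPv4 packet."""
    (version_ihl, _tos, _length, _ident, _frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", ip_packet[:20])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (version_ihl & 0x0F) * 4  # IPv4 header length in bytes
    row = {"protocol": proto, "src_last_octet": src[3], "dst_last_octet": dst[3],
           "src_port": "", "dst_port": "", "tcp_flags": "", "ttl": ttl}
    transport = ip_packet[ihl:]
    if proto in (6, 17) and len(transport) >= 4:  # TCP or UDP: ports come first
        row["src_port"], row["dst_port"] = struct.unpack("!HH", transport[:4])
    if proto == 6 and len(transport) >= 14:  # TCP flag byte sits at offset 13
        row["tcp_flags"] = transport[13]
    return row


def to_csv(packets) -> str:
    """Render one CSV row per packet, preceded by a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for pkt in packets:
        writer.writerow(header_features(pkt))
    return buf.getvalue()


# Hand-built sample: a TCP SYN from 172.16.112.50:1234 to 192.168.1.30:80, TTL 127.
SAMPLE = (
    struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 127, 6, 0,
                bytes([172, 16, 112, 50]), bytes([192, 168, 1, 30]))
    + struct.pack("!HHIIBBHHH", 1234, 80, 0, 0, 5 << 4, 0x02, 8192, 0, 0)
)

print(to_csv([SAMPLE]))
```

In the output, `tcp_flags` is the raw flag byte, so a value of 2 means only SYN is set.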

As such, these findings no longer reflect realistic attacks anyway. Today's TCP/IP stacks are much more robust than they were when the data set was created, when a "ping of death" would instantly lock up a Windows host. Every developer of a TCP/IP stack should by now be aware of the risk of such malformed packets and stress-test the stack against them.

With this, these features have become pretty much meaningless. Incorrectly set SYN flags and the like are no longer used in network attacks. Today's attacks are much more sophisticated and most likely no longer target the TCP/IP stack, but rather the services running on the next layer. So I would not bother finding out which low-level packet flags were used in that flawed '99 simulation, which relied on attacks that worked in the early '90s.
