How to derive KDD99 features from DARPA pcap file?

Question

I have recently worked with the DARPA network traffic packets and with the derived version used in KDD99 for intrusion detection evaluation.

Excuse my limited domain knowledge in computer networks: I could only derive 9 features from the DARPA packet headers, not the 41 features used in KDD99.

I intend to continue my work on the UNB ISCX Intrusion Detection Evaluation DataSet. However, I want to derive the 41 features used in KDD99 from the pcap files and save them in CSV format. Is there a fast/easy way to achieve this?
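As a starting point for the per-packet header fields the question mentions, here is a minimal Python sketch that uses only the standard library to read a capture and write CSV. Assumptions: classic libpcap files (not pcapng), Ethernet link type, and IPv4 only; the function names and column names are hypothetical. It produces one row per packet, not the 41 KDD99 features: KDD99 records describe whole connections, so those features would need connection reassembly and windowed statistics on top of this.

```python
import csv
import socket
import struct

FIELDS = ["timestamp", "src_ip", "dst_ip", "protocol", "ttl", "ip_len", "src_port", "dst_port"]
PROTO_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}


def iter_pcap_records(f):
    """Yield (timestamp, frame_bytes) from a classic libpcap file object."""
    global_hdr = f.read(24)
    if len(global_hdr) < 24:
        raise ValueError("truncated pcap global header")
    (magic,) = struct.unpack("<I", global_hdr[:4])
    if magic == 0xA1B2C3D4:
        endian = "<"  # file was written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file was written big-endian
    else:
        raise ValueError("not a classic microsecond pcap (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", global_hdr[20:24])
    if linktype != 1:  # 1 = Ethernet
        raise ValueError("this sketch only handles Ethernet captures")
    while True:
        rec_hdr = f.read(16)
        if len(rec_hdr) < 16:
            return  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec_hdr)
        yield ts_sec + ts_usec / 1_000_000, f.read(incl_len)


def header_fields(frame):
    """Extract IPv4 (+ TCP/UDP port) header fields; return None for non-IPv4 frames."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # EtherType 0x0800 = IPv4
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4  # IP header length in bytes
    (total_len,) = struct.unpack("!H", ip[2:4])
    row = {
        "src_ip": socket.inet_ntoa(ip[12:16]),
        "dst_ip": socket.inet_ntoa(ip[16:20]),
        "protocol": PROTO_NAMES.get(ip[9], str(ip[9])),
        "ttl": ip[8],
        "ip_len": total_len,
        "src_port": "",
        "dst_port": "",
    }
    if ip[9] in (6, 17) and len(ip) >= ihl + 4:
        # TCP and UDP headers both start with 16-bit source and destination ports.
        row["src_port"], row["dst_port"] = struct.unpack("!HH", ip[ihl:ihl + 4])
    return row


def pcap_to_csv(pcap_file, csv_file):
    """Write one CSV row per IPv4 packet; return the number of rows written."""
    writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
    writer.writeheader()
    rows = 0
    for ts, frame in iter_pcap_records(pcap_file):
        row = header_fields(frame)
        if row is None:
            continue
        row["timestamp"] = f"{ts:.6f}"
        writer.writerow(row)
        rows += 1
    return rows
```

Usage would be along the lines of `with open("capture.pcap", "rb") as p, open("packets.csv", "w", newline="") as c: pcap_to_csv(p, c)`, where both file names are placeholders. For real captures, an established parsing library such as dpkt or Scapy is likely more robust than hand-rolled parsing.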

Recommended answer

Be careful with this dataset.

http://www.kdnuggets.com/news/2007/n18/4i.html

Some excerpts:

the artificial data was generated using a closed network, some proprietary network traffic generators, and hand-injected attacks

Among the issues raised, the most important seemed to be that no validation was ever performed to show that the DARPA dataset actually looked like real network traffic.

In 2003, Mahoney and Chan built a trivial intrusion detection system and ran it against the DARPA tcpdump data. They found numerous irregularities, including that -- due to the way the data was generated -- all the malicious packets had a TTL of 126 or 253 whereas almost all the benign packets had a TTL of 127 or 254.

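the DARPA dataset (and, by extension, the KDD Cup '99 dataset) was fundamentally broken, and one could not draw any conclusions from any experiments run using them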

we strongly recommend that (1) all researchers stop using the KDD Cup '99 dataset
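To see why the TTL irregularity quoted above is so damaging, here is a hypothetical Python sketch of a "detector" that looks only at the IP TTL. The TTL values come from the Mahoney and Chan excerpt; the four records are made up to follow that pattern and are not DARPA data. On data with this artifact, such a rule scores perfectly while learning nothing about the attacks themselves.

```python
# Hypothetical illustration of the Mahoney & Chan TTL artifact quoted above.
ATTACK_TTLS = {126, 253}  # TTLs reported for malicious packets in the DARPA data


def ttl_rule(ttl):
    """'Detect' an attack from the IP TTL alone, ignoring the packet's content."""
    return ttl in ATTACK_TTLS


def rule_accuracy(labelled):
    """labelled: list of (ttl, is_attack) pairs; return the fraction classified correctly."""
    if not labelled:
        raise ValueError("need at least one labelled packet")
    correct = sum(ttl_rule(ttl) == is_attack for ttl, is_attack in labelled)
    return correct / len(labelled)


# Toy, made-up records that follow the quoted TTL pattern (not real DARPA data).
toy = [(126, True), (253, True), (127, False), (254, False)]
print(rule_accuracy(toy))  # 1.0
```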

As for the feature extraction used: IIRC, the majority of features were simply attributes of the parsed IP/TCP/UDP headers, such as port numbers, the last octet of the IP address, and some packet flags.
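The header attributes listed above (ports, last IP octet, flags) are cheap to compute once the headers are parsed. A minimal Python sketch, assuming the header values have already been extracted; `header_features` and its column names are hypothetical and are not the official KDD99 feature definitions.

```python
import ipaddress

# Bit order of the TCP flags byte (offset 13 of the TCP header), least significant first.
TCP_FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]


def tcp_flag_names(flags_byte):
    """Decode a TCP flags byte into the list of flag names that are set."""
    return [name for bit, name in enumerate(TCP_FLAG_NAMES) if flags_byte & (1 << bit)]


def last_octet(ip):
    """Return the final octet of a dotted-quad IPv4 address as an int."""
    return ipaddress.IPv4Address(ip).packed[-1]


def header_features(src_ip, dst_ip, src_port, dst_port, flags_byte):
    """Hypothetical feature row built from already-parsed header values."""
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "src_ip_last_octet": last_octet(src_ip),
        "dst_ip_last_octet": last_octet(dst_ip),
        "tcp_flags": "+".join(tcp_flag_names(flags_byte)),
    }


print(header_features("192.168.1.30", "172.16.112.50", 1025, 23, 0x12))
# {'src_port': 1025, 'dst_port': 23, 'src_ip_last_octet': 30, 'dst_ip_last_octet': 50, 'tcp_flags': 'SYN+ACK'}
```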

Either way, these findings no longer reflect realistic attacks. Today's TCP/IP stacks are much more robust than they were when the dataset was created, at a time when a "ping of death" would instantly lock up a Windows host. Every developer of a TCP/IP stack should by now be aware of the risk of such malformed packets and stress-test the stack against them.
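The "ping of death" is a concrete example of such a malformed packet: IPv4 fragments whose offset plus length would reassemble into a datagram larger than the 65,535-byte maximum. A minimal Python sketch of the corresponding sanity check, assuming a raw IPv4 header given as bytes; the function name is hypothetical and this is not any particular stack's implementation.

```python
import struct

MAX_IPV4_DATAGRAM = 65535  # largest value the 16-bit Total Length field can express


def oversized_after_reassembly(ip_header):
    """True if this IPv4 fragment would push the reassembled datagram past 65,535 bytes.

    The fragment offset is counted in 8-byte units, so offset * 8 plus this
    fragment's total length is the size of the reassembled datagram up to the
    end of this fragment (assuming the same header length on every fragment).
    """
    (total_len,) = struct.unpack("!H", ip_header[2:4])
    (flags_frag,) = struct.unpack("!H", ip_header[6:8])
    frag_offset_bytes = (flags_frag & 0x1FFF) * 8  # low 13 bits hold the offset
    return frag_offset_bytes + total_len > MAX_IPV4_DATAGRAM
```

A stack that rejects (or a feature extractor that flags) fragments for which this returns `True` is no longer exposed to this particular attack.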

As a result, these features have become pretty much meaningless. Incorrectly set SYN flags and the like are no longer used in network attacks; modern attacks are much more sophisticated and most likely no longer target the TCP/IP stack at all, but the services running on the layers above it. So I would not bother finding out which low-level packet flags were used in that flawed '99 simulation, built from attacks that worked in the early '90s...
