Capture TCP Packets with Python

Question

I'm trying to capture an HTTP download with Python using dpkt and pcap. The code looks like this:

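import dpkt
import pcap
...
# conn, iface, handle_tcp_packet() and handle_http_response()
# are defined in the full code (not shown here).

def handle_packet(pkt):
    eth = dpkt.ethernet.Ethernet(pkt)

    # Ignore non-IP and non-TCP packets
    if eth.type != dpkt.ethernet.ETH_TYPE_IP:
        return
    ip = eth.data
    if ip.p != dpkt.ip.IP_PROTO_TCP:
        return

    tcp = ip.data
    data = tcp.data

    # current connection: (src IP, dst IP, src port, dst port)
    c = (ip.src, ip.dst, tcp.sport, tcp.dport)

    # Handle only new HTTP responses and TCP packets
    # of existing connections.
    if c in conn:
        handle_tcp_packet(c, tcp)
    elif data[:4] == b'HTTP':  # tcp.data is a byte string
        handle_http_response(c, tcp)

# Define handle_packet() before the capture loop that calls it.
pc = pcap.pcap(iface)
for ts, pkt in pc:
    handle_packet(pkt)
...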

handle_http_response()handle_tcp_packet() 中,我读取了 tcp 数据包 (tcp.data) 的数据并将它们写入一份文件.但是我注意到我经常收到具有相同 TCP 序列号 (tcp.seq)(在同一连接上)的数据包,但它们似乎包含相同的数据.此外,似乎并非所有数据包都被捕获.例如,如果我总结了数据包大小,则结果值低于 http-header (content-length) 中列出的值.但在 Wireshark 中,我可以看到所有包.

In handle_http_response() and handle_tcp_packet() i read the data of the tcp-packets (tcp.data) and write them to a file. However i noticed that i often get packets with the same TCP sequence number (tcp.seq) (on the same connection) but it seems that they contain the same data. Moreover it seems that not all packets are captured. For example if i sum up the packet-sizes the resulting value is lower than the one listed in the http-header (content-length). But in Wireshark i can see all packages.
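
Repeated segments with the same sequence number and the same payload on one connection usually indicate TCP retransmissions (or the same frame being captured twice), so writing every tcp.data to the file as it arrives can duplicate bytes in the output. One common remedy is to key segments by sequence number and only append bytes in stream order. The sketch below is a hypothetical helper, not part of the asker's code or of dpkt/pypcap; it assumes one direction of one connection, no sequence-number wraparound, and that retransmissions reuse the original segment boundaries.

```python
class StreamReassembler:
    """Rebuild the payload of ONE direction of ONE TCP connection.

    Hypothetical helper (not part of dpkt or pypcap). Assumes no
    sequence-number wraparound and that retransmissions reuse the
    original segment boundaries.
    """

    def __init__(self, initial_seq):
        self.next_seq = initial_seq  # sequence number of the next byte we need
        self.pending = {}            # seq -> payload received ahead of next_seq
        self.data = bytearray()      # in-order, de-duplicated payload
        self.duplicates = 0          # segments that carried no new bytes

    def add(self, seq, payload):
        if not payload:
            return                   # pure ACKs, empty FINs, etc.
        end = seq + len(payload)
        if end <= self.next_seq or seq in self.pending:
            self.duplicates += 1     # retransmission of bytes we already have
            return
        if seq < self.next_seq:      # partial overlap: keep only the new tail
            payload = payload[self.next_seq - seq:]
            seq = self.next_seq
        self.pending[seq] = payload
        # Append every buffered segment that now continues the stream.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.data += chunk
            self.next_seq += len(chunk)
```

In the asker's structure, one might create a StreamReassembler(tcp.seq) in handle_http_response() for the first response packet, then call add(tcp.seq, tcp.data) for that packet and for every later packet passed to handle_tcp_packet(), writing r.data to the file only once the connection ends.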

Does anyone have an idea why I get those duplicate packets and how I can capture every packet belonging to the HTTP response?

Here you can find the complete code: pastebin.com. When running, it prints something like this to stdout:

Waiting for HTTP-Audio-responses ...
...
New TCP-Packet, len=1440, tcp-payload=5107680, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=1440, tcp-payload=5109120, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=1440, tcp-payload=5110560, con-len=5197150 , dups=57 , dup-bytes=82080
----------> FIN <----------
New TCP-Packet, len=1937, tcp-payload=5112497, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=0, tcp-payload=5112497, con-len=5197150 , dups=57 , dup-bytes=82080

As you can see, the TCP payload plus the duplicate received bytes (5112497 + 82080 = 5194577) is lower than the file size of the download (5197150). Moreover, you can see that I receive 57 duplicate packets (same SEQ and same TCP data) and that packets are still received after the packet with the FIN flag.
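
Comparing byte totals only shows that something is missing, not where, and duplicates counted into the total can hide real holes. Tracking which sequence ranges were actually seen is more informative. The following is a hypothetical, stdlib-only sketch; it ignores sequence-number wraparound and assumes the expected size is known (for example HTTP header length plus Content-Length for a non-chunked response).

```python
def missing_ranges(segments, start, total_len):
    """Return the [lo, hi) sequence ranges that no captured segment covered.

    segments:  iterable of (seq, payload_length) pairs, duplicates allowed
    start:     sequence number of the first payload byte of the response
    total_len: expected payload size, e.g. HTTP header length + Content-Length
               (assumes a plain, non-chunked response)
    Sequence-number wraparound is ignored in this sketch.
    """
    end = start + total_len
    gaps = []
    covered_to = start                      # every byte below this was seen
    for seq, length in sorted(segments):
        if covered_to >= end:
            break                           # the whole response is covered
        if seq > covered_to:                # hole before this segment
            gaps.append((covered_to, min(seq, end)))
        covered_to = max(covered_to, seq + length)
    if covered_to < end:                    # the tail never arrived
        gaps.append((covered_to, end))
    return gaps
```

Feeding it (tcp.seq, len(tcp.data)) for every packet of the response and summing hi - lo over the result gives the number of bytes that were never captured, regardless of how many duplicates arrived. It also shows whether segments seen after the FIN fill earlier holes.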

So does anyone have an idea how I can capture all packets belonging to the connection? Wireshark sees all the packets, and I think it uses libpcap too.

I don't even know whether I'm doing something wrong or the pcap library is.

OK, it seems that my code is correct: in Wireshark I saved the captured packets and used the capture file in my code (pcap.pcap('/home/path/filename') instead of pcap.pcap('eth0')). My code read all the packets perfectly (in multiple tests)! Since Wireshark also uses libpcap (as far as I know), I think the problem is the pypcap library, which does not deliver all the packets to me.

Any idea how to test this?

I already compiled pypcap myself (from trunk), but that didn't change anything -.-

OK, I changed my code to use pcapy instead of pypcap and I have the same problem:
when reading the packets from a previously captured file (created with Wireshark), everything is fine, but when I capture the packets directly from eth0, I miss some packets.

Interesting: when both programs (the one using pypcap and the one using pcapy) run in parallel, they capture different packets, e.g. one program receives one packet more than the other.

But I still have no idea why -.-
I thought Wireshark uses the same base library (libpcap).

Please help :)

Recommended answer

There are a few things to note:

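  • Make sure you have a large snaplen: for pcapy, you can set it in open_live (the second parameter).
  • Make sure you handle fragmented packets: this is not done automatically, so you need to check the details yourself.
  • Check the statistics: unfortunately, I don't think they are exposed in the pcapy interface, but it's possible that you are not processing all the packets. If you fall too far behind, you won't know what you missed (although you could get the same information by tracking the length/position of the TCP stream). libpcap itself does expose these statistics, so you could add a function for them.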
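
The first two points can be made a bit more concrete. For the snaplen, pcapy's open_live(device, snaplen, promisc, to_ms) takes it as the second argument, so a call such as pcapy.open_live('eth0', 65535, 1, 100) asks for whole packets rather than truncated ones. For fragmentation, an IPv4 packet belongs to a fragmented datagram when the "more fragments" (MF) flag is set or the fragment offset is non-zero; both live in the 16-bit flags/fragment-offset word of the IPv4 header, which dpkt's IP object exposes as ip.off. The stdlib-only sketch below checks that word; the function names are hypothetical, and the constants are defined locally rather than imported from any library.

```python
import struct

IP_MF = 0x2000       # "more fragments" flag in the IPv4 flags/offset word
IP_OFFMASK = 0x1FFF  # fragment offset bits (in units of 8 bytes)


def is_ip_fragment(off_field):
    """True if an IPv4 packet is part of a fragmented datagram.

    off_field is the 16-bit flags + fragment-offset word (bytes 6-7 of the
    IPv4 header). A packet is a fragment if MF is set or its offset is
    non-zero. Only the first fragment carries the TCP header, so such
    packets need IP reassembly before their TCP payload can be trusted.
    """
    return bool(off_field & IP_MF) or (off_field & IP_OFFMASK) != 0


def off_field_from_header(ip_header):
    """Extract the flags/offset word from a raw IPv4 header (network order)."""
    (off_field,) = struct.unpack("!H", ip_header[6:8])
    return off_field
```

With dpkt one would call is_ip_fragment(ip.off) right after the IP/TCP checks in handle_packet() and either skip or buffer such packets; a packet with only the "don't fragment" flag (0x4000) is not reported as a fragment.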
