Capture TCP Packets with Python
Problem description
I am trying to capture an HTTP download with Python using dpkt and pcap. The code looks like this:
    import dpkt
    import pcap

    ...

    def handle_packet(pkt):
        """Dispatch one captured Ethernet frame."""
        eth = dpkt.ethernet.Ethernet(pkt)
        # Ignore non-IP and non-TCP packets
        if eth.type != dpkt.ethernet.ETH_TYPE_IP:
            return
        ip = eth.data
        if ip.p != dpkt.ip.IP_PROTO_TCP:
            return
        tcp = ip.data
        data = tcp.data
        # Current connection, identified by its 4-tuple
        c = (ip.src, ip.dst, tcp.sport, tcp.dport)
        # Handle only new HTTP responses and TCP packets
        # of existing connections.
        if c in conn:
            handle_tcp_packet(c, tcp)
        elif data[:4] == b'HTTP':  # bytes literal: tcp.data is a byte string
            handle_http_response(c, tcp)

    ...

    # Capture loop: runs after handle_packet() has been defined
    pc = pcap.pcap(iface)
    for ts, pkt in pc:
        handle_packet(pkt)
In handle_http_response() and handle_tcp_packet() I read the data of the TCP packets (tcp.data) and write it to a file. However, I noticed that I often get packets with the same TCP sequence number (tcp.seq) on the same connection, and they appear to contain the same data. Moreover, it seems that not all packets are captured: for example, if I sum up the packet sizes, the result is lower than the value listed in the HTTP header (content-length). In Wireshark, however, I can see all packets.
Does anyone have an idea why I get those duplicate packets, and how I can capture every packet belonging to the HTTP response?
Here you can find the complete code: pastebin.com. When running, it prints something like this to stdout:
Waiting for HTTP-Audio-responses ...
...
New TCP-Packet, len=1440, tcp-payload=5107680, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=1440, tcp-payload=5109120, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=1440, tcp-payload=5110560, con-len=5197150 , dups=57 , dup-bytes=82080
----------> FIN <----------
New TCP-Packet, len=1937, tcp-payload=5112497, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=0, tcp-payload=5112497, con-len=5197150 , dups=57 , dup-bytes=82080
As you can see, the TCP payload plus the duplicate bytes received (5112497 + 82080 = 5194577) is lower than the file size of the download (5197150). Moreover, you can see that I receive 57 duplicate packets (same SEQ and same TCP data) and that packets are still received after the packet with the FIN flag.
So does anyone have an idea how I can capture all packets belonging to the connection? Wireshark sees all packets, and I think it uses libpcap too.
I don't even know whether I am doing something wrong or the pcap library is.
OK, it seems that my code is correct: in Wireshark I saved the captured packets and used the capture file in my code (pcap.pcap('/home/path/filename') instead of pcap.pcap('eth0')). My code read all packets perfectly (in multiple tests)! Since Wireshark uses libpcap too (afaik), I think the problem is the pypcap library, which does not deliver all packets to me.
Any idea how I can test that?
I already compiled pypcap myself (trunk), but that didn't change anything -.-
OK, I changed my code to work with pcapy instead of pypcap and have the same problem: when reading the packets from a previously captured file (created with Wireshark) everything is fine, but when I capture the packets directly from eth0 I miss some packets.
Interesting: when running both programs (the one using pypcap and the one using pcapy) in parallel, they capture different packets; e.g., one program receives one packet more.
But I still have no idea why -.- I thought Wireshark uses the same base library (libpcap).
Please help :)
Recommended answer
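Here are some things to check:
- Make sure you have a large snaplen. For pcapy, you can set this in open_live (second parameter).
- Make sure you handle fragmented packets. This is not done automatically; you need to check the details yourself.
- Check the statistics. Unfortunately, I don't think these are exposed through the pcapy interface, but you may not be handling all packets, and if you are too slow you will not know what you missed (although you can get the same information by tracking the length/position of the TCP stream). libpcap itself does expose these statistics, so you could add a function for them.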
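
Two of these checks can be sketched in plain Python without any capture library. The following is only a minimal sketch under stated assumptions, not the code from the question: StreamTracker and is_ip_fragment are hypothetical helper names, the tracker follows a single direction of one connection, and it ignores 32-bit sequence wraparound and late packets that fill an earlier gap (those would be counted as duplicates here).

```python
# Hypothetical helpers for illustration; not part of dpkt, pypcap or pcapy.

IP_MF = 0x2000       # "more fragments" flag in the IPv4 flags/offset field
IP_OFFMASK = 0x1FFF  # low 13 bits: fragment offset


def is_ip_fragment(flags_and_offset):
    """Return True if the 16-bit IPv4 flags/offset field marks a fragment."""
    return bool(flags_and_offset & (IP_MF | IP_OFFMASK))


class StreamTracker:
    """Follow one direction of a TCP connection by sequence number.

    Simplifications (assumptions): no sequence wraparound handling, and a
    late packet that fills an earlier gap is reported as a duplicate.
    """

    def __init__(self, first_seq):
        self.next_seq = first_seq  # sequence number of the next expected byte
        self.received = 0          # payload bytes seen for the first time
        self.dup_bytes = 0         # payload bytes seen more than once
        self.gaps = []             # (start_seq, length) of bytes never seen

    def feed(self, seq, payload_len):
        """Record one segment and classify it."""
        if payload_len == 0:
            return "empty"
        end = seq + payload_len
        if end <= self.next_seq:
            # Everything in this segment was already seen: a retransmission.
            self.dup_bytes += payload_len
            return "duplicate"
        status = "new"
        if seq > self.next_seq:
            # Bytes between next_seq and seq never arrived at the sniffer.
            self.gaps.append((self.next_seq, seq - self.next_seq))
            status = "gap"
        elif seq < self.next_seq:
            # The segment partly repeats data that was already seen.
            self.dup_bytes += self.next_seq - seq
            status = "overlap"
        self.received += end - max(seq, self.next_seq)
        self.next_seq = end
        return status


if __name__ == "__main__":
    tracker = StreamTracker(first_seq=1000)
    for seq, length in [(1000, 1440), (1000, 1440), (3880, 1440)]:
        print(seq, length, tracker.feed(seq, length))
    print("gaps:", tracker.gaps, "dup bytes:", tracker.dup_bytes)
```

In the handle_packet() function from the question, one tracker per connection key c could be fed with tcp.seq and len(tcp.data). A non-empty gaps list would then point at packets the sniffer never saw, while a growing dup_bytes would point at ordinary TCP retransmissions rather than a capture bug.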