Decompressing a gzipped payload of a packet with Python


Question

I am currently working on a program that takes a .pcap file and uses the scapy package to separate all of the packets by IP. I want to decompress the gzip-compressed payloads using the gzip package. I can tell that a payload is gzipped because it contains

Content-Encoding: gzip

I am trying to use

fileStream = StringIO.StringIO(payload)
gzipper = gzip.GzipFile(fileobj=fileStream)
data = gzipper.read()

to decompress the payload, where

payload = str(pkt[TCP].payload)

When I try this, I get the error

IOError: Not a gzipped file

When I print the first payload, I get

HTTP/1.1 200 OK
Cache-Control: private, max-age=0
Content-Type: text/html; charset=utf-8
P3P: CP="NON UNI COM NAV STA LOC CURa DEVa PSAa PSDa OUR IND"
Vary: Accept-Encoding
Content-Encoding: gzip
Date: Sat, 30 Mar 2013 19:23:33 GMT
Content-Length: 15534
Connection: keep-alive
Set-Cookie: _FS=NU=1; domain=.bing.com; path=/
Set-Cookie: _SS=SID=F2652FD33DC443498CE043186458C3FC&C=20.0; domain=.bing.com; path=/
Set-Cookie: MUID=2961778241736E4F314E732240626EBE; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
Set-Cookie: MUIDB=2961778241736E4F314E732240626EBE; expires=Mon, 30-Mar-2015 19:23:33 GMT; path=/
Set-Cookie: OrigMUID=2961778241736E4F314E732240626EBE%2c532012b954b64747ae9b83e7ede66522; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
Set-Cookie: SRCHD=D=2758763&MS=2758763&AF=NOFORM; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
Set-Cookie: SRCHUID=V=2&GUID=02F43275DC7F435BB3DF3FD32E181F4D; expires=Mon, 30-Mar-2015 19:23:33 GMT; path=/
Set-Cookie: SRCHUSR=AUTOREDIR=0&GEOVAR=&DOB=20130330; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/

?}k{?H????+0?#!?,_???$?:?7vf?w?Hb???ƊG???9???/9U?\$;3{9g?ycAӗ???????W{?o?~?FZ?e ]>??<??n????׻?????????d?t??a?3?
?2?p??eBI?e??????ܒ?P??-?Q?-L?????ǼR?³?ׯ??%'
?2Kf?7???c?Y?I?1+c??,ae]?????<{?=ƞ,?^?J?ď???y??6O?_?z????_?ޞ~?_?????Bo%]???_?????W=?

For additional context, this packet was isolated because it contains Content-Encoding: gzip. It comes from a sample .pcap file provided by a project.

Recommended answer

To decode a gzipped HTTP response, you only need to decompress the response body, not the headers.

The payload in your case is the entire TCP payload, i.e. the entire HTTP message including headers and body.
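To see why gzip rejects the full payload, it may help to look at the first bytes. A gzip stream starts with the magic bytes 0x1f 0x8b, whereas the payload starts with the status line. The following is a minimal Python 3 sketch (the original code is Python 2); the payload here is a made-up stand-in built for illustration, not the captured packet.

```python
import gzip

# Hypothetical stand-in for a captured TCP payload:
# status line, one header, blank line, then a gzipped body.
body = gzip.compress(b"<!DOCTYPE html>\n<html>")
payload = b"HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n" + body

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

payload_is_gzip = payload.startswith(GZIP_MAGIC)  # starts with b"HTTP/1.1"
body_is_gzip = body.startswith(GZIP_MAGIC)

try:
    gzip.decompress(payload)
    failed = False
except OSError as exc:  # gzip.BadGzipFile is a subclass of OSError
    failed = True
    print("decompression failed:", exc)

print(payload_is_gzip, body_is_gzip, failed)
```

Only the part after the blank line passes the magic-byte check, which is why the headers have to be stripped first.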

HTTP messages (requests and responses) use the generic message format of RFC 822, the same format that e-mail messages (RFC 2822) are based on.

The structure of an RFC 822 message is very simple:

  • Zero or more header lines (key/value pairs separated by :), each terminated by CRLF
  • An empty line (CRLF: carriage return, line feed, i.e. '\r\n')
  • The message body
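Given that structure, a hand-rolled split is possible by cutting at the first empty line. The sketch below is Python 3 and purely illustrative: split_http_message is a hypothetical helper name, and it skips details such as folded headers and keeps only the last value of a repeated header (e.g. Set-Cookie).

```python
def split_http_message(payload: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, separator, body = payload.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line found; headers may be incomplete")

    # The first line is the status/request line, not a header.
    start_line, _, header_block = head.partition(b"\r\n")

    headers = {}
    for line in header_block.split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        # Header names are case-insensitive, so normalize to lowercase.
        # A repeated header overwrites the earlier value in this sketch.
        headers[name.strip().lower().decode("latin-1")] = value.strip().decode("latin-1")
    return start_line, headers, body


example = b"HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\nContent-Length: 5\r\n\r\nhello"
start_line, headers, body = split_http_message(example)
print(start_line, headers, body)
```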

You could now parse this message yourself to isolate the body, but I would recommend using the tools Python already provides. The httplib module (Python 2.x) includes the HTTPMessage class, which httplib uses internally to parse HTTP responses. It's not meant to be used directly, but in this case I would probably still use it, since it handles some HTTP-specific details for you.

Here's how you can use it to separate the body from the headers:

>>> import StringIO
>>> from httplib import HTTPMessage
>>>
>>> f = open('gzipped_response.payload', 'rb')  # binary mode keeps the gzip bytes intact
>>>
>>> # Or, if you already have the payload in memory as a string:
... # f = StringIO.StringIO(payload)
...
>>> status_line = f.readline()
>>> msg = HTTPMessage(f, 0)
>>> body = msg.fp.read()

The HTTPMessage class works in a similar way to rfc822.Message:

  • First, you need to read (or discard) the status line (HTTP/1.1 200 OK), because it is not part of the RFC 822 message and is not a header.
  • Then you instantiate HTTPMessage with a handle to an open file and the seekable argument set to 0. The file pointer is stored as msg.fp; once the headers have been parsed, it is positioned at the start of the body.
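For readers on Python 3, where httplib and StringIO.StringIO no longer exist under those names, roughly the same steps can be sketched with http.client.parse_headers and io.BytesIO. This is a port written under that assumption, not the code from the original answer; decode_http_response is a hypothetical helper name.

```python
import gzip
import io
from http.client import parse_headers


def decode_http_response(payload: bytes) -> bytes:
    """Return the response body, gunzipped if Content-Encoding says gzip."""
    stream = io.BytesIO(payload)

    # Read (and discard) the status line; it is not a header.
    stream.readline()

    # parse_headers consumes header lines up to and including the blank line,
    # leaving the stream positioned at the start of the body.
    headers = parse_headers(stream)
    body = stream.read()

    if headers.get("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return body


# Made-up payload for illustration only.
sample = (
    b"HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
    + gzip.compress(b"<!DOCTYPE html>\n<html>")
)
print(decode_http_response(sample)[:25])
```

With scapy on Python 3, the payload would typically be obtained as bytes (for example via bytes(pkt[TCP].payload)) rather than str; that detail is an assumption and not part of the original question.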

After that, your code for decompressing the gzipped body just works:

>>> import gzip, StringIO
>>> body_stream = StringIO.StringIO(body)
>>> gzipper = gzip.GzipFile(fileobj=body_stream)
>>> data = gzipper.read()
>>>
>>> print data[:25]
<!DOCTYPE html>
<html>
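One caveat: the headers in the question report Content-Length: 15534, which is larger than a typical single TCP segment, so a single packet's payload may hold only the first part of the gzipped body. In that case the segments of the stream would need to be concatenated first. As a hedged Python 3 sketch, zlib.decompressobj can decode whatever prefix is available without raising on a truncated stream; decompress_available is a hypothetical helper name.

```python
import gzip
import zlib


def decompress_available(gzip_bytes: bytes):
    """Decompress as much of a possibly truncated gzip stream as possible.

    Returns (data, complete), where complete is True only if the end of
    the gzip stream was reached.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    data = decompressor.decompress(gzip_bytes)
    return data, decompressor.eof


original = b"<!DOCTYPE html>\n<html>" * 200
full = gzip.compress(original)

data, complete = decompress_available(full)
partial, partial_complete = decompress_available(full[: len(full) // 2])
print(len(data), complete, len(partial), partial_complete)
```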
