How to capture network traffic using Python

Question

I am using Python and attempting to capture the HTTP(S) traffic between my computer and a site, which would include all incoming and outgoing requests and responses, such as images and external calls.

I have attempted to find the network traffic within my hit_site function, but I'm not finding the information.

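import requests
from bs4 import BeautifulSoup

def hit_site(url):
    print(url)
    r = requests.get(url)
    print(r.status_code)
    print(r.encoding)
    print(r.request.headers)  # headers requests sent for this single request
    print(r.headers)          # headers the server sent back
    # Only this one request/response is visible here; requests does not
    # fetch the images, scripts, etc. that the returned page references.
    for line in r.text.splitlines():
        print(line)
    soup = BeautifulSoup(r.text, "html.parser")  # explicit parser avoids a bs4 warning
    return soup

hit_site("http://www.google.com")  # call only after the function is defined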
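hit_site only ever sees one request and one response: requests downloads the HTML but, unlike a browser, does not parse it or fetch the images, stylesheets and scripts it references, so those rows never appear. A minimal standard-library sketch of which URLs a browser would go on to request is below; the names `SubresourceParser` and `subresource_urls` are hypothetical, and it deliberately ignores CSS `url(...)` references, `srcset`, and anything loaded by JavaScript at runtime, so it will list fewer entries than Fiddler does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubresourceParser(HTMLParser):
    """Collect absolute URLs of images, scripts and stylesheets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value); value is None for bare attributes.
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.urls.append(urljoin(self.base_url, attrs["href"]))


def subresource_urls(html, base_url):
    """Return the sub-resource URLs referenced by html, resolved against base_url."""
    parser = SubresourceParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

For example, `subresource_urls(requests.get(url).text, url)` lists the extra URLs; each of them is a separate request that hit_site never makes.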

An example of the type of information that I would like to capture is the following (I used Fiddler2 to get this information; all of this and more came from visiting groupon.com):

| # | Result | Protocol | Host | URL | Body | Caching | Content-Type | Process | Comments | Custom |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 200 | HTTP | www.groupon.com | / | 23,236 | private, max-age=0, no-cache, no-store, must-revalidate | text/html; charset=utf-8 | chrome:6080 | | |
| 7 | 200 | HTTP | www.groupon.com | /homepage-assets/styles-6fca4e9f48.css | 6,766 | public, max-age=31369910 | text/css; charset=UTF-8 | chrome:6080 | | |
| 8 | 200 | HTTP | Tunnel to | img.grouponcdn.com:443 | 0 | | | chrome:6080 | | |
| 9 | 200 | HTTP | img.grouponcdn.com | /deal/gsPCLbbqioFVfvjT3qbBZo/The-Omni-Mount-Washington-Resort_01-960x582/v1/c550x332.jpg | 94,555 | public, max-age=315279127; Expires: Fri, 18 Oct 2024 22:20:20 GMT | image/jpeg | chrome:6080 | | |
| 10 | 200 | HTTP | img.grouponcdn.com | /deal/d5YmjhxUBi2mgfCMoriV/pE-700x420/v1/c220x134.jpg | 17,832 | public, max-age=298601213; Expires: Mon, 08 Apr 2024 21:35:06 GMT | image/jpeg | chrome:6080 | | |
| 11 | 200 | HTTP | www.groupon.com | /homepage-assets/main-fcfaf867e3.js | 9,604 | public, max-age=31369913 | application/javascript | chrome:6080 | | |
| 12 | 200 | HTTP | www.groupon.com | /homepage-assets/locale.js?locale=en_US&country=US | 1,507 | public, max-age=994 | application/javascript | chrome:6080 | | |
| 13 | 200 | HTTP | www.groupon.com | /tracky | 3 | | application/octet-stream | chrome:6080 | | |
| 14 | 200 | HTTP | www.groupon.com | /cart/widget?consumerId=b577c9c2-4f07-11e4-8305-0025906127fe | 17 | private, max-age=0, no-cache, no-store, must-revalidate | application/json; charset=utf-8 | chrome:6080 | | |
| 15 | 200 | HTTP | www.googletagmanager.com | /gtm.js?id=GTM-B76Z | 39,061 | private, max-age=911; Expires: Wed, 22 Oct 2014 20:48:14 GMT | text/javascript; charset=UTF-8 | chrome:6080 | | |
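Most of these columns come straight from each response: the status code, the request URL, the body length, and the `Cache-Control`, `Expires` and `Content-Type` headers. For requests that your own Python code makes (not the browser's), a row with roughly the same columns can be built from those fields. The helper `fiddler_row` below is hypothetical and is only a sketch; it does not reproduce Fiddler's formatting exactly (for example, it reports `HTTPS` for https URLs).

```python
from urllib.parse import urlsplit


def fiddler_row(number, status, url, headers, body):
    """Return [#, Result, Protocol, Host, URL, Body, Caching, Content-Type]."""
    # HTTP header names are case-insensitive, so normalise them first.
    lower = {name.lower(): value for name, value in headers.items()}
    parts = urlsplit(url)
    path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    caching = lower.get("cache-control", "")
    if "expires" in lower:
        # Join with "; " and skip an empty Cache-Control value.
        caching = "; ".join(filter(None, [caching, "Expires: " + lower["expires"]]))
    return [
        str(number),
        str(status),
        parts.scheme.upper(),
        parts.hostname or "",
        path,
        f"{len(body):,}",  # body size in bytes with thousands separators
        caching,
        lower.get("content-type", ""),
    ]
```

With requests, such a function could be called from a response hook (`hooks={"response": ...}`), which receives each `Response` object, passing `r.status_code`, `r.url`, `r.headers` and `r.content`. A hook that returns None leaves the response unchanged.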

I would greatly appreciate any ideas on how to capture network traffic using Python.

Answer

dpkt is an extensive Python library for parsing network packets, including TCP traffic, and it supports decoding the packets involved in the SSL/TLS handshake. Another Python tool for reading and decoding capture files is pypcapfile.
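dpkt does this parsing for you (`dpkt.pcap.Reader`, `dpkt.ethernet.Ethernet`, `dpkt.http.Request`). To show what that involves without a third-party dependency, here is a standard-library-only sketch that reads a classic libpcap capture file (for example one written by `tcpdump -w`) and lists plaintext HTTP requests. It is not dpkt's API; the names `iter_pcap` and `http_requests` are hypothetical, and it handles only Ethernet, IPv4 and TCP, does not reassemble TCP streams (a request split across segments is missed), and sees nothing useful in HTTPS traffic because the payload is encrypted.

```python
import socket
import struct

# Request methods recognised at the start of a TCP payload.
HTTP_METHODS = {b"GET", b"POST", b"PUT", b"DELETE", b"HEAD", b"OPTIONS", b"PATCH"}
LITTLE = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")  # micro-/nanosecond magic, little-endian
BIG = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")     # micro-/nanosecond magic, big-endian


def iter_pcap(data):
    """Yield (timestamp, frame_bytes) for each record of a classic libpcap file."""
    magic = data[:4]
    if magic in LITTLE:
        endian = "<"
    elif magic in BIG:
        endian = ">"
    else:
        raise ValueError("not a classic libpcap file")
    divisor = 1e9 if magic in (LITTLE[1], BIG[1]) else 1e6
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError("this sketch only handles Ethernet captures (linktype 1)")
    offset = 24  # size of the global header
    while offset + 16 <= len(data):
        sec, frac, incl_len, _orig_len = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield sec + frac / divisor, data[offset:offset + incl_len]
        offset += incl_len


def http_requests(data):
    """Return (src_ip, dst_ip, dst_port, method, host, path) for each HTTP request seen."""
    found = []
    for _ts, frame in iter_pcap(data):
        # Ethernet: 14-byte header; EtherType 0x0800 is IPv4 (VLAN/IPv6 skipped).
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip = frame[14:]
        if ip[9] != 6:  # IP protocol 6 is TCP
            continue
        ihl = (ip[0] & 0x0F) * 4
        (total_len,) = struct.unpack("!H", ip[2:4])
        tcp = ip[ihl:total_len]  # drop any Ethernet padding after the IP packet
        if len(tcp) < 20:
            continue
        (dst_port,) = struct.unpack("!H", tcp[2:4])
        payload = tcp[(tcp[12] >> 4) * 4:]  # skip the TCP header, including options
        head = payload.split(b"\r\n\r\n", 1)[0].split(b"\r\n")
        request_line = head[0].split(b" ")
        if len(request_line) < 3 or request_line[0] not in HTTP_METHODS:
            continue
        host = ""
        for line in head[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"host":
                host = value.strip().decode("latin-1")
        found.append((socket.inet_ntoa(ip[12:16]), socket.inet_ntoa(ip[16:20]),
                      dst_port, request_line[0].decode(), host,
                      request_line[1].decode("latin-1")))
    return found
```

Usage, with a hypothetical file name: `with open("capture.pcap", "rb") as f: print(http_requests(f.read()))`.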

Note that decoding SSL/TLS traffic, including the application data, requires knowing the private keys (and with ephemeral Diffie-Hellman key exchange, the per-session keys, since the server's private key alone is not enough). This is a problem for a third-party server you don't control, such as Google, and working around it takes significant effort. One approach is to set up a proxy with a known private key that acts as a man-in-the-middle, and to install its self-signed CA certificate into your local trust store so the browser accepts it.
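From the Python side, the proxy approach amounts to two settings: send requests through the intercepting proxy, and trust the proxy's CA certificate instead of the real server's chain. A minimal standard-library sketch follows. The helper name `make_proxied_opener`, the proxy address `http://127.0.0.1:8888`, and any CA file path are assumptions; Fiddler, which the question already uses, is one such proxy when its HTTPS decryption is enabled.

```python
import ssl
import urllib.request


def make_proxied_opener(proxy_url, ca_file=None):
    """Return a urllib opener that sends HTTP and HTTPS traffic through proxy_url."""
    # The intercepting proxy re-signs HTTPS traffic with its own CA, so the
    # client must trust that CA (ca_file); None falls back to the system store.
    context = ssl.create_default_context(cafile=ca_file)
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPSHandler(context=context),
    )
```

For example, `make_proxied_opener("http://127.0.0.1:8888", "proxy-ca.pem").open("https://www.google.com")` (hypothetical address and file) lets the proxy record the decrypted request and response. With requests, the `proxies=` and `verify=` arguments play the same roles. This only covers traffic your own code sends; a browser has to be configured to use the proxy separately.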