如何使用python捕获网络流量 [英] How to capture the network traffic using python
问题描述
我正在使用python并尝试抓取计算机与站点之间的HTTP通信,其中包括所有传入和传出的请求,响应(例如图像和外部呼叫等).
I am using python and attempting to scrape the HTTP(s) traffic between my computer and a site, which would include all incoming and outgoing requests,responses, such as images and external calls, etc.
我试图在我的hit_site
函数中查找网络流量,但是找不到该信息.
I have attempted to find the network traffic within my hit_site
function, but I'm not finding the information.
hit_site("http://www.google.com")
def hit_site(url):
print url
r = requests.get(url,stream = True)
print r.headers
print r.encoding
print r.status_code
print r.json()
print requests.get(url,stream=True)
print r.request.headers
print r.response.headers
for line in r.iter_lines():
print line
data = r.text
soup = BeautifulSoup(data)
return soup
以下是我要捕获的信息类型的示例(我使用fiddler2来获取此信息.所有这些以及更多的信息都来自于访问groupon.com):
An example of the type of information that I would like to capture is the following (I used fiddler2 to get this information. All of this and more came from visiting groupon.com):
# Result Protocol Host URL Body Caching Content-Type Process Comments Custom
6 200 HTTP www.groupon.com / 23,236 private, max-age=0, no-cache, no-store, must-revalidate text/html; charset=utf-8 chrome:6080
7 200 HTTP www.groupon.com /homepage-assets/styles-6fca4e9f48.css 6,766 public, max-age=31369910 text/css; charset=UTF-8 chrome:6080
8 200 HTTP Tunnel to img.grouponcdn.com:443 0 chrome:6080
9 200 HTTP img.grouponcdn.com /deal/gsPCLbbqioFVfvjT3qbBZo/The-Omni-Mount-Washington-Resort_01-960x582/v1/c550x332.jpg 94,555 public, max-age=315279127; Expires: Fri, 18 Oct 2024 22:20:20 GMT image/jpeg chrome:6080
10 200 HTTP img.grouponcdn.com /deal/d5YmjhxUBi2mgfCMoriV/pE-700x420/v1/c220x134.jpg 17,832 public, max-age=298601213; Expires: Mon, 08 Apr 2024 21:35:06 GMT image/jpeg chrome:6080
11 200 HTTP www.groupon.com /homepage-assets/main-fcfaf867e3.js 9,604 public, max-age=31369913 application/javascript chrome:6080
12 200 HTTP www.groupon.com /homepage-assets/locale.js?locale=en_US&country=US 1,507 public, max-age=994 application/javascript chrome:6080
13 200 HTTP www.groupon.com /tracky 3 application/octet-stream chrome:6080
14 200 HTTP www.groupon.com /cart/widget?consumerId=b577c9c2-4f07-11e4-8305-0025906127fe 17 private, max-age=0, no-cache, no-store, must-revalidate application/json; charset=utf-8 chrome:6080
15 200 HTTP www.googletagmanager.com /gtm.js?id=GTM-B76Z 39,061 private, max-age=911; Expires: Wed, 22 Oct 2014 20:48:14 GMT text/javascript; charset=UTF-8 chrome:6080
我非常感谢关于如何使用python捕获网络流量的任何想法.
推荐答案
dpkt 是一个广泛的工具(用Python编写),用于解析TCP流量,该工具 pypcapfile .
dpkt is an extensive tool (written in Python) for parsing TCP traffic, which includes support for decoding packets involved in the SSL handshake. Another tool for running and decoding captures from Python is pypcapfile.
请注意,要解码SSL流量包括数据,需要知道私钥.对于您无法控制的第三方服务器(例如Google)而言,这有些问题,并且需要付出很大的努力才能解决该问题.一种这样的方法是设置一个具有已知私钥的代理来播放中间人(并将自签名的CA安装到本地商店中以强制浏览器接受它).
Note that for decoding SSL traffic including data, private keys need to be known. This is somewhat problematic for a third-party server you don't control such as Google, and significant effort is required to work around it. One such approach is to set up a proxy with a known private key to play man-in-the-middle (and install a self-signed CA into your local store to force the browser to accept it).
这篇关于如何使用python捕获网络流量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!