Network capturing with Selenium/PhantomJS

Question

I want to capture the traffic to the sites I browse using Selenium with Python. Since the traffic will be HTTPS, using a proxy won't get me far.

My idea was to run PhantomJS through Selenium and have PhantomJS execute a script (not on the page using webdriver.execute_script(), but in PhantomJS itself). I was thinking of the netlog.js script (from here: https://github.com/ariya/phantomjs/blob/master/examples/netlog.js).

Since it works like this on the command line:

phantomjs --cookies-file=/tmp/foo netlog.js https://google.com

there must be a similar way to do this with Selenium?

Thanks in advance.

Update:

Solved it with browsermob-proxy.

pip3 install browsermob-proxy

Python 3 code:

from selenium import webdriver
from browsermobproxy import Server

server = Server(<path to browsermob-proxy>)
server.start()
proxy = server.create_proxy({'captureHeaders': True, 'captureContent': True, 'captureBinaryContent': True})

service_args = ["--proxy=%s" % proxy.proxy, '--ignore-ssl-errors=yes']
driver = webdriver.PhantomJS(service_args=service_args)

proxy.new_har()
driver.get('https://google.com')
print(proxy.har)  # this is the archive
# for example:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]

Recommended answer

I am using a proxy for this:

from selenium import webdriver
from browsermobproxy import Server

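# environment.b_mob_proxy_path comes from the answerer's own configuration;
# it should hold the path to the browsermob-proxy executable
server = Server(environment.b_mob_proxy_path)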
server.start()
proxy = server.create_proxy()
# PhantomJS takes its proxy via --proxy (--proxy-server is the Chrome flag)
service_args = ["--proxy=%s" % proxy.proxy]
driver = webdriver.PhantomJS(service_args=service_args)

proxy.new_har()
driver.get('url_to_open')
print(proxy.har)  # this is the archive
# for example:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]

The HAR (HTTP Archive format) also contains a lot of other information about the requests and responses, which is very useful to me.
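To illustrate the kind of extra information a HAR carries, here is a rough sketch that pulls a few fields out of each entry. The field names (method, url, status, content.mimeType, time) follow the HAR 1.2 format; summarize_har and sample_har are hypothetical and used only for illustration. With a real capture you would pass proxy.har instead of the sample.

```python
from collections import Counter


def summarize_har(har):
    """Return one small dict per captured request/response pair.

    Hypothetical helper; expects a HAR-style dictionary.
    """
    rows = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        response = entry["response"]
        rows.append({
            "method": request["method"],
            "url": request["url"],
            "status": response["status"],
            # content and mimeType may be missing in a partial capture
            "mime_type": response.get("content", {}).get("mimeType", ""),
            # total elapsed time of the request in milliseconds
            "time_ms": entry.get("time"),
        })
    return rows


# Made-up sample data standing in for proxy.har
sample_har = {
    "log": {
        "entries": [
            {
                "time": 120,
                "request": {"method": "GET", "url": "https://example.com/"},
                "response": {"status": 200, "content": {"mimeType": "text/html"}},
            },
            {
                "time": 45,
                "request": {"method": "GET", "url": "https://example.com/missing.png"},
                "response": {"status": 404, "content": {"mimeType": "text/plain"}},
            },
        ]
    }
}

rows = summarize_har(sample_har)
status_counts = Counter(row["status"] for row in rows)
failed_urls = [row["url"] for row in rows if row["status"] >= 400]
```

Here status_counts tallies responses per HTTP status code and failed_urls lists the requests that returned an error status.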

Installing on Linux:

pip install browsermob-proxy
