Pickling Selenium Webdriver Objects

Question

I want to serialize and store a selenium webdriver object so then I could use it later elsewhere in my code. I'm trying to use pickle to do this. If there is another way to save the state of a webdriver object, so I can bring it up again later, that'd be great (I can't just reload the url, since the websites I am looking at are javascript-heavy and the current page depends on what I've clicked on so far).

Currently, I have code like this.

import pickle
from selenium import webdriver

d = webdriver.PhantomJS()
d.get(url)
d.find_element_by_xpath(xpath).click()
p = pickle.dumps(d, pickle.HIGHEST_PROTOCOL)
# Stuff happens here.
new_driver = pickle.loads(p)
print new_driver.page_source.encode('utf-8', 'ignore')

When I run this, I get the following error (the error occurs when I print, not before):

    return self.driver.page_source.encode('utf-8', 'ignore')
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 436, in page_source
    return self.execute(Command.GET_PAGE_SOURCE)['value']
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 163, in execute
    response = self.command_executor.execute(driver_command, params)
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
    return self._request(url, method=command_info[0], data=data)
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 396, in _request
    response = opener.open(request)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>

Is it possible to serialize my webdriver objects? If not, what are my alternatives?

Update:

Upon further inspection, even if I do something like d.get(url) again instead of printing the page source, it gives me the same error. Does something happen to the webdriver object when it is pickled/unpickled?
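
On the update's question of what happens during pickling: for an ordinary Python class, pickle saves the instance's attributes, and unpickling rebuilds a new object from them without calling __init__ again. Any setup done in the constructor (for a local driver, such as launching a helper process) is therefore not repeated. A minimal sketch using a hypothetical Tracked class, not Selenium code:

```python
import pickle


class Tracked:
    """Hypothetical class that counts how often __init__ runs."""

    init_calls = 0

    def __init__(self, url):
        Tracked.init_calls += 1
        self.url = url  # plain data: stored in the pickle


original = Tracked('http://example.com')
restored = pickle.loads(pickle.dumps(original, pickle.HIGHEST_PROTOCOL))

print(restored.url)          # http://example.com
print(Tracked.init_calls)    # 1: unpickling did not call __init__ again
print(restored is original)  # False: a new, separate object
```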

Recommended answer

I was able to pickle a selenium.webdriver.Remote object. Neither dill nor pickle worked for me to serialize a selenium.webdriver.Chrome object, where Python itself creates and runs the browser process. However, both worked if I (1) ran the standalone Java Selenium 2 server, (2) in one process, created a selenium.webdriver.Remote connected to that server and pickled (or dilled) it to a file, and (3) in another process, unpickled the Remote instance and used it.

This made it possible to exit the Python process, then reconnect to the still-running WebDriver browser and issue new commands (possibly from a different Python script). If the Selenium-controlled browser is closed, a new instance has to be created from scratch.

server.py:

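import pickle
import selenium.webdriver

# URL of the standalone Selenium server, which must already be running
EXECUTOR = 'http://127.0.0.1:4444/wd/hub'
FILENAME = '/tmp/pickle'

# Request a Chrome session from the server (Selenium 3 style API)
opt = selenium.webdriver.chrome.options.Options()
capabilities = opt.to_capabilities()
driver = selenium.webdriver.Remote(command_executor=EXECUTOR, desired_capabilities=capabilities)

# Save the Remote driver to a file; the browser keeps running inside the server
with open(FILENAME, 'wb') as fp:
    pickle.dump(driver, fp)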

client.py:

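import pickle

FILENAME = '/tmp/pickle'

# Load the Remote driver saved by server.py; it still refers to the same browser session
with open(FILENAME, 'rb') as fp:
    driver = pickle.load(fp)

driver.get('http://www.google.com')
el = driver.find_element_by_id('lst-ib')  # Selenium 3 API
print(el)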
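
A rough way to see why the Remote approach can work: the browser keeps running inside the standalone server, so the client side only needs enough information to address it, such as the server URL and a session id. The sketch below uses a hypothetical SessionHandle class (not part of Selenium, with a made-up session id) to show that such plain data survives a pickle round trip through a file, mirroring what server.py and client.py do with the real driver.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass


@dataclass
class SessionHandle:
    """Hypothetical stand-in, not Selenium's class: a server address plus a session id."""

    executor_url: str
    session_id: str


# First step (like server.py): create the handle and save it to a file
handle = SessionHandle('http://127.0.0.1:4444/wd/hub', 'abc123')  # made-up session id
path = os.path.join(tempfile.mkdtemp(), 'session.pickle')
with open(path, 'wb') as fp:
    pickle.dump(handle, fp)

# Second step (like client.py): load it back
with open(path, 'rb') as fp:
    restored = pickle.load(fp)

print(restored == handle)  # True: same address and session id
```

Because nothing in the handle is a live process or socket, pickle can store it; whether the session is still usable depends only on the server and browser still running.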

Note (2020-08-08): Pickling Selenium this way stopped working with the latest Selenium (4.x), because pickle fails on an internal socket object. One workaround, which still works for me, is to pin the older version by adding 'selenium==3.141.0' to install_requires in setup.py.
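
The socket failure mentioned in the note is general Python behaviour: socket objects refuse to be pickled, so any object holding one as an attribute cannot be pickled as-is. For classes you control, a common Python pattern is to define __getstate__ and __setstate__ so the socket is left out of the pickle and reopened on load. The sketch applies it to a hypothetical ConnectionHolder class; it is not a patch for Selenium's own classes, which would still need the version pin.

```python
import pickle
import socket


class ConnectionHolder:
    """Hypothetical client object (not Selenium code) that keeps a live socket."""

    def __init__(self, url):
        self.url = url
        self.sock = socket.socket()  # OS-level resource: pickle refuses to copy it

    def __getstate__(self):
        # Pickle only the plain data and leave the socket out
        state = self.__dict__.copy()
        del state['sock']
        return state

    def __setstate__(self, state):
        # Restore the plain data, then open a fresh socket in the new object
        self.__dict__.update(state)
        self.sock = socket.socket()


# A bare socket cannot be pickled
raw = socket.socket()
try:
    pickle.dumps(raw)
    raw_pickled = True
except TypeError:
    raw_pickled = False
finally:
    raw.close()

conn = ConnectionHolder('http://127.0.0.1:4444/wd/hub')
restored = pickle.loads(pickle.dumps(conn))

print(raw_pickled)                 # False
print(restored.url)                # http://127.0.0.1:4444/wd/hub
print(restored.sock is conn.sock)  # False: a new socket was created

conn.sock.close()
restored.sock.close()
```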
