Python Selenium Failing When Some Threads Create Webdriver
Question
I have a thread which takes a URL, requests it in Selenium and parses the data.
Most of the time this thread works fine. But sometimes it seems to hang while creating the webdriver, and I can't seem to catch the failure with exception handling.
This is the start of the thread:
def GetLink(eachlink):
    trry = 0  # up to 10 attempts at getting the data
    while trry < 10:
        print "Scraping: ", eachlink
        try:
            Numbergrab = []
            Namegrab = []
            Positiongrab = []
            # Pick a random proxy and user agent for this attempt
            nextproxy = random.choice(ProxyList)
            nextuseragent = random.choice(UseragentsList)
            # Build the flag as one string (the original comma made a tuple)
            proxywrite = '--proxy=' + nextproxy
            service_args = [
                proxywrite,
                '--proxy-type=http',
                '--ignore-ssl-errors=true',
            ]
            dcap = dict(DesiredCapabilities.PHANTOMJS)
            dcap["phantomjs.page.settings.userAgent"] = nextuseragent
            # Raw string so the backslash in the Windows path is kept literally
            pDriver = webdriver.PhantomJS(r'C:\phantomjs.exe', desired_capabilities=dcap, service_args=service_args)
            pDriver.set_window_size(1024, 768)  # optional
            pDriver.set_page_load_timeout(20)
            print "Requesting link: ", eachlink
            pDriver.get(eachlink)
            try:
                WebDriverWait(pDriver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@class='seat-setting']")))
            except:
                time.sleep(10)
That's only a snippet, but it's the important part, because when it works the rest of the thread continues fine.
But when something stalls, one of the threads prints "Scraping: link" to the console but never prints "Requesting link: link".
This means the thread is stalling while actually setting up the webdriver. As far as I've ever seen this is thread safe, and I've tried using lock.acquire and giving each thread a random .exe out of a batch of 20, with the same results.
Sometimes the threads work perfectly, then out of nowhere one stops without being able to make the request.
Update:
Sometimes when I close the console it tells me there was a socket.error. You can see the start of the try in the snippet above; at the end of it I have this:
        except:
            trry += 1
            e = sys.exc_info()[0]
            print "Problem scraping link: ", e
But it'll happily sit there for hours saying nothing until I physically close the console. Then it pops up with socket.error and the printed "Scraping: link" message for the thread which died.
That actually suggests it's failing before even starting the while loop, but trry is set to 0 at the start of that thread and isn't referenced anywhere else. Plus, there would be no socket.error without a Selenium webdriver, so the hang must be holding back the earlier message as well.
Update #2:
It looks like it's happy to run for hours when running a single thread of the exact same code.
But a thread lock didn't make a difference.
I'm a little stumped. Going to try a subprocess instead of a thread to see what that does.
Update #3:
Threading isn't stable for long, but subprocessing is. OK Python.
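For readers who want to try the subprocess route from this update, here is a minimal sketch, not the asker's actual code. It assumes each scrape can be launched as its own command (for example a hypothetical per-link scraper script) and uses only the standard library. The function name run_scrape_in_subprocess and its parameters are illustrative. It is written for Python 3.7+, unlike the Python 2 snippet above.

```python
import subprocess


def run_scrape_in_subprocess(command, timeout_s):
    """Run one scrape as a child process with a hard time limit.

    command   -- argument list for the child process (hypothetical scraper command)
    timeout_s -- seconds to wait before giving up on the child

    Returns (finished_ok, stdout_text). finished_ok is False if the child
    exceeded the time limit or exited with a non-zero status.
    """
    try:
        done = subprocess.run(
            command,
            capture_output=True,  # collect stdout/stderr instead of printing them
            text=True,            # decode output as text
            timeout=timeout_s,    # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return done.returncode == 0, done.stdout
```

Because a hung child is killed when the timeout expires, a stalled webdriver start-up costs one attempt rather than blocking the whole run. The caller can simply launch the command again, up to whatever attempt limit it chooses.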
Recommended answer
I've encountered this with both multithreading and multiprocessing, and when using Firefox, Chrome, or PhantomJS. For whatever reason, the call to instantiate the browser (e.g. driver = webdriver.Chrome()) never returns.
Most of my scripts are relatively short-lived with few threads/processes, so the problem isn't often seen. I have a few scripts, however, that run for several hours and create and destroy several hundred browser objects, and I'm guaranteed to experience the hang a few times per run.
My solution is to put the browser instantiation into its own function/method, and then decorate the function/method with one of the many timeout and retry decorators available from PyPI:
(untested)
from retrying import retry
from selenium import webdriver
from timeoutcontext import timeout, TimeoutException

def retry_if_timeoutexception(exception):
    return isinstance(exception, TimeoutException)

@retry(retry_on_exception=retry_if_timeoutexception, stop_max_attempt_number=3)
@timeout(30)  # Allow the function 30 seconds to create and return the object
def get_browser():
    return webdriver.Chrome()
https://pypi.python.org/pypi/retrying
https://pypi.python.org/pypi/timeoutcontext
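One caveat: if the timeout decorator you pick is signal-based (as many are), it will generally only work in the main thread, which does not help when the browser is created inside worker threads as in the question. The following is a minimal standard-library sketch of the same timeout-and-retry idea that can be called from any thread. It is an illustration, not part of the original answer: the names CreationTimeout, call_with_timeout and create_with_retry are hypothetical, and the factory argument stands in for something like lambda: webdriver.Chrome().

```python
import queue
import threading


class CreationTimeout(Exception):
    """Raised when the factory did not return within the allowed time."""


def call_with_timeout(factory, timeout_s):
    """Run factory() in a daemon thread and wait at most timeout_s seconds."""
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put((True, factory()))
        except Exception as exc:  # hand the error back to the waiting caller
            results.put((False, exc))

    # Daemon thread: a hung factory call cannot keep the interpreter alive.
    threading.Thread(target=worker, daemon=True).start()
    try:
        ok, value = results.get(timeout=timeout_s)
    except queue.Empty:
        raise CreationTimeout("no result after %s seconds" % timeout_s)
    if not ok:
        raise value
    return value


def create_with_retry(factory, timeout_s=30, attempts=3):
    """Call factory with a time limit, retrying only when it times out."""
    last_error = None
    for _ in range(attempts):
        try:
            return call_with_timeout(factory, timeout_s)
        except CreationTimeout as exc:
            last_error = exc
    raise last_error
```

With Selenium installed, usage would look like driver = create_with_retry(lambda: webdriver.Chrome()). Note the trade-off: a call that hangs is abandoned rather than killed, so the stuck thread and any browser process it started may linger until the program exits.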