Python Selenium Failing When Some Threads Create Webdriver

Problem description

I have a thread which takes a URL, requests it with Selenium, and parses the data.

Most of the time this thread works fine. But sometimes it seems to hang while creating the webdriver, and I can't seem to handle the failure with an exception.

Here is the start of the thread:

import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# ProxyList and UseragentsList are defined elsewhere in the script.

def GetLink(eachlink):

    trry = 0  # 10 attempts at getting the data

    while trry < 10:

        print "Scraping:  ", eachlink
        try:
            Numbergrab = []
            Namegrab = []
            Positiongrab = []

            nextproxy = random.choice(ProxyList)
            nextuseragent = random.choice(UseragentsList)
            proxywrite = '--proxy=' + nextproxy  # string concatenation, not a tuple
            service_args = [
                proxywrite,
                '--proxy-type=http',
                '--ignore-ssl-errors=true',
            ]

            dcap = dict(DesiredCapabilities.PHANTOMJS)
            dcap["phantomjs.page.settings.userAgent"] = nextuseragent
            pDriver = webdriver.PhantomJS(r'C:\phantomjs.exe',
                                          desired_capabilities=dcap,
                                          service_args=service_args)
            pDriver.set_window_size(1024, 768)  # optional
            pDriver.set_page_load_timeout(20)

            print "Requesting link: ", eachlink
            pDriver.get(eachlink)
            try:
                WebDriverWait(pDriver, 10).until(
                    EC.presence_of_element_located((By.XPATH, "//div[@class='seat-setting']")))
            except:
                time.sleep(10)

That's only a snippet, but it's the important part, because when it's working it continues fine.

But when something stalls, one of the threads prints "Scraping: link" to the console but never prints "Requesting link: link".

That means the thread is stalling while actually setting up the webdriver. As far as I've ever seen this is thread safe, and I've tried using lock.acquire and giving each thread a random .exe out of a batch of 20, with the same results.

Sometimes the threads work perfectly, then out of nowhere one stops without ever making the request.

Update:

Sometimes when I close the console it tells me there was a socket.error. You can see the start of the try block in that snippet; at the end I have this:

        except:
            trry += 1
            e = sys.exc_info()[0]
            print "Problem scraping link: ", e

But it will happily sit there for hours saying nothing until I physically close the console. Then it pops up with the socket.error and the printed "Scraping: link" message for the thread that died.

That actually suggests it's failing before even starting the while loop, but trry is set to 0 at the start of that thread and isn't referenced anywhere else. Plus, there would be no socket.error if it hadn't created a Selenium webdriver, so whatever hangs must be blocking the earlier message as well.

Update #2:

It looks like it's happy to run for hours when running a single thread of the exact same code.

But a thread lock didn't make a difference.

Little stumped. Going to try a subprocess instead of a thread to see what that does.

Update #3:

Threading isn't stable over long runs, but subprocessing is. OK, Python.
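For reference, a minimal sketch of that workaround using the standard multiprocessing module, handling one link at a time for simplicity; GetLink is the worker function from the snippet above and LinkList is an assumed list of URLs:

import multiprocessing

if __name__ == '__main__':
    for eachlink in LinkList:  # LinkList is an assumed list of URLs
        p = multiprocessing.Process(target=GetLink, args=(eachlink,))
        p.start()
        p.join(300)        # wait at most five minutes for this link
        if p.is_alive():   # the driver hung: kill the whole process
            p.terminate()
            p.join()

Because each link runs in its own process, a hung webdriver start-up can be killed from the outside instead of freezing a thread forever.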

Recommended answer

I've encountered this with both multithreading and multiprocessing, and when using Firefox, Chrome, or PhantomJS. For whatever reason, the call to instantiate the browser (e.g. driver = webdriver.Chrome()) never returns.

Most of my scripts are relatively short lived with few threads/processes, so the problem isn't often seen. I have a few scripts, however, that will run for several hours and create and destroy several hundred browser objects, and I'm guaranteed to experience the hang a few times a run.

My solution is to put the browser instantiation into its own function/method, and then decorate the function/method with one of the many timeout and retry decorators available from PyPI:

(untested)

from retrying import retry
from selenium import webdriver
from timeoutcontext import timeout, TimeoutException


def retry_if_timeoutexception(exception):
    return isinstance(exception, TimeoutException)


@retry(retry_on_exception=retry_if_timeoutexception, stop_max_attempt_number=3)
@timeout(30)  # Allow the function 30 seconds to create and return the object
def get_browser():
    return webdriver.Chrome()

https://pypi.python.org/pypi/retrying

https://pypi.python.org/pypi/timeoutcontext
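A hedged usage sketch of how the decorated factory might slot into the question's worker loop (untested; get_browser is the function defined above, the rest of the names come from the question and the parsing step is elided):

def GetLink(eachlink):
    for attempt in range(10):
        pDriver = None
        try:
            pDriver = get_browser()   # bounded to 30 s per try, retried up to 3 times
            pDriver.set_page_load_timeout(20)
            pDriver.get(eachlink)
            # ... parse the page here ...
            break
        except Exception as e:
            print "Problem scraping link: ", e
        finally:
            if pDriver is not None:
                pDriver.quit()        # always release the browser process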
