Python Selenium Failing When Some Threads Create Webdriver

Problem description

I have a thread which takes a URL, requests it with Selenium, and parses the data.

Most of the time this thread works fine. But sometimes it seems to hang while creating the webdriver, and I can't seem to handle the failure with an exception.

Here is the start of the thread:

import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# ProxyList and UseragentsList are defined elsewhere in the script.

def GetLink(eachlink):

    trry = 0  # 10 attempts at getting the data

    while trry < 10:

        print "Scraping:  ", eachlink
        try:
            Numbergrab = []
            Namegrab = []
            Positiongrab = []

            nextproxy = random.choice(ProxyList)
            nextuseragent = random.choice(UseragentsList)
            proxywrite = '--proxy=' + nextproxy  # string concatenation, not a tuple
            service_args = [
                proxywrite,
                '--proxy-type=http',
                '--ignore-ssl-errors=true',
            ]

            dcap = dict(DesiredCapabilities.PHANTOMJS)
            dcap["phantomjs.page.settings.userAgent"] = nextuseragent
            pDriver = webdriver.PhantomJS(r'C:\phantomjs.exe',
                                          desired_capabilities=dcap,
                                          service_args=service_args)
            pDriver.set_window_size(1024, 768)  # optional
            pDriver.set_page_load_timeout(20)

            print "Requesting link: ", eachlink
            pDriver.get(eachlink)
            try:
                WebDriverWait(pDriver, 10).until(
                    EC.presence_of_element_located((By.XPATH, "//div[@class='seat-setting']")))
            except:
                time.sleep(10)

That's only a snippet, but it's the important part, because when it's working it continues fine.

But when something stalls, one of the threads prints "Scraping: link" to the console but never prints "Requesting link: link".

That means the thread is stalling while actually setting up the webdriver. As far as I've ever seen this is thread safe, and I've tried using lock.acquire and giving each thread a random .exe out of a batch of 20, with the same results.

Sometimes the threads work perfectly, then out of nowhere one stops without ever making the request.

Update:

Sometimes when I close the console it tells me there was a socket.error. You can see the start of the try block in that snippet; at the end I have this:

        except:
            trry += 1
            e = sys.exc_info()[0]
            print "Problem scraping link: ", e

But it will happily sit there for hours saying nothing until I physically close the console. Then it pops up with the socket.error and the printed "Scraping: link" message for the thread that died.

That actually suggests it's failing before even starting the while loop, but trry is set to 0 at the start of that thread and isn't referenced anywhere else. Plus, there would be no socket.error if it hadn't created a Selenium webdriver, so whatever hangs must be blocking the earlier message as well.

Update #2:

It looks like it's happy to run for hours when running a single thread of the exact same code.

But a thread lock didn't make a difference.

Little stumped. Going to try a subprocess instead of a thread to see what that does.

Update #3:

Threading isn't stable over long runs, but subprocessing is. OK, Python.
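For reference, a minimal sketch of that workaround using the standard multiprocessing module, handling one link at a time for simplicity; GetLink is the worker function from the snippet above and LinkList is an assumed list of URLs:

import multiprocessing

if __name__ == '__main__':
    for eachlink in LinkList:  # LinkList is an assumed list of URLs
        p = multiprocessing.Process(target=GetLink, args=(eachlink,))
        p.start()
        p.join(300)        # wait at most five minutes for this link
        if p.is_alive():   # the driver hung: kill the whole process
            p.terminate()
            p.join()

Because each link runs in its own process, a hung webdriver start-up can be killed from the outside instead of freezing a thread forever.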

Recommended answer

I've encountered this with both multithreading and multiprocessing, and when using Firefox, Chrome, or PhantomJS. For whatever reason, the call to instantiate the browser (e.g. driver = webdriver.Chrome()) never returns.

Most of my scripts are relatively short lived with few threads/processes, so the problem isn't often seen. I have a few scripts, however, that will run for several hours and create and destroy several hundred browser objects, and I'm guaranteed to experience the hang a few times a run.

My solution is to put the browser instantiation into its own function/method, and then decorate the function/method with one of the many timeout and retry decorators available from PyPI:

(untested)

from retrying import retry
from selenium import webdriver
from timeoutcontext import timeout, TimeoutException


def retry_if_timeoutexception(exception):
    return isinstance(exception, TimeoutException)


@retry(retry_on_exception=retry_if_timeoutexception, stop_max_attempt_number=3)
@timeout(30)  # Allow the function 30 seconds to create and return the object
def get_browser():
    return webdriver.Chrome()

https://pypi.python.org/pypi/retrying

https://pypi.python.org/pypi/timeoutcontext
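A hedged usage sketch of how the decorated factory might slot into the question's worker loop (untested; get_browser is the function defined above, the rest of the names come from the question and the parsing step is elided):

def GetLink(eachlink):
    for attempt in range(10):
        pDriver = None
        try:
            pDriver = get_browser()   # bounded to 30 s per try, retried up to 3 times
            pDriver.set_page_load_timeout(20)
            pDriver.get(eachlink)
            # ... parse the page here ...
            break
        except Exception as e:
            print "Problem scraping link: ", e
        finally:
            if pDriver is not None:
                pDriver.quit()        # always release the browser process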
