WebDriverException: Message: Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" while saving a large HTML file using Selenium Python

Question

I'm scrolling through the Google Play Store reviews for an app, which is specified by the URL of its app page. Selenium finds the reviews and scrolls down to load all of them. The scrolling part works: without the headless option I can watch Selenium reach the end of the page. What's not working is saving the HTML content for further analysis.

Based on other answers, I tried two different methods for getting the source code:

innerHTML = DRIVER.execute_script("return document.body.innerHTML")

innerHTML = DRIVER.page_source

Both result in the same error message and exception.

My code for scrolling through the page and loading all reviews:

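import logging
import time

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Logger setup is assumed; the original logging configuration is not shown.
logger = logging.getLogger(__name__)

# START_URLS (the list of app page URLs) is defined elsewhere in the script.
SCROLL_PAUSE_TIME = 5  # seconds to wait for new content after each scroll
options = Options()
options.headless = True
FP = webdriver.FirefoxProfile()
FP.set_preference("intl.accept_languages", "de")  # request the German page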

for url in START_URLS:

    try:
        DRIVER = webdriver.Firefox(options=options, firefox_profile=FP)
        DRIVER.get(url)
        time.sleep(SCROLL_PAUSE_TIME)
        app_name = DRIVER.find_element_by_xpath('//h1[@itemprop="name"]').get_attribute('innerText')
        all_reviews_button = DRIVER.find_element_by_xpath('//span[text()="Alle Bewertungen lesen"]')
        all_reviews_button.click()
        time.sleep(SCROLL_PAUSE_TIME)
        last_height = DRIVER.execute_script("return document.body.scrollHeight")
        while True:
            DRIVER.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            try:
                DRIVER.find_element_by_xpath('//span[text()="Mehr anzeigen"]').click()
            except Exception:
                pass  # the "Mehr anzeigen" button is not always present
            time.sleep(SCROLL_PAUSE_TIME)
            new_height = DRIVER.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                logger.info('Durchlauf erfolgreich')
                innerHTML = DRIVER.execute_script("return document.body.innerHTML")
                with open(app_name + '.html', 'w', encoding='utf-8') as out:
                    out.write(innerHTML)  # was `html`, which is never defined
                break
            last_height = new_height

    except Exception as e:
        logger.error('Exception occurred', exc_info=True)
    finally:
        DRIVER.quit()

The log file, showing that the infinite scroll reached the end of the page but the file could not be saved:

10.09.19 16:12:00 - INFO - Durchlauf erfolgreich
10.09.19 16:12:13 - ERROR - Exception occurred
Traceback (most recent call last):
  File "scraper.py", line 57, in <module>
    innerHTML = DRIVER.execute_script("return document.body.innerHTML")
  File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: [Exception... "Failure"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: chrome://marionette/content/proxy.js :: sendReply_ :: line 275"  data: no]

Last part of geckodriver.log:

...
1568124670155   Marionette  WARN    TimedPromise timed out after 500 ms: stacktrace:
bail@chrome://marionette/content/sync.js:223:64
1568124693017   Marionette  WARN    TimedPromise timed out after 500 ms: stacktrace:
bail@chrome://marionette/content/sync.js:223:64
1568124734637   Marionette  INFO    Stopped listening on port 57015
[Parent 14684, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
[Child 10464, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
[Parent 14684, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
JavaScript error: resource:///modules/sessionstore/SessionStore.jsm, line 1639: TypeError: subject.QueryInterface is not a function
A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[Child 2508, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
[Child]

I'd like to save the page as a file and, in the next step, parse the HTML to extract the reviews. However, the saving part does not work with a large page. If I exit the while loop after, say, 100 steps and save the page, it works fine.
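Since saving reportedly works when the loop stops after about 100 steps, one way to reproduce that is to cap the number of scrolls. The following is only a minimal sketch of such a capped loop, not the asker's code: the function name `scroll_until_stable` and the `max_steps` parameter are made up for illustration, and the click on the "Mehr anzeigen" button is left out for brevity.

```python
import time


def scroll_until_stable(driver, pause=5.0, max_steps=None):
    """Scroll to the bottom repeatedly.

    Stops when the page height no longer grows or after `max_steps`
    scrolls (hypothetical parameter). Returns the number of scrolls.
    """
    height_js = "return document.body.scrollHeight"
    last_height = driver.execute_script(height_js)
    steps = 0
    while max_steps is None or steps < max_steps:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        steps += 1
        time.sleep(pause)  # give the page time to load more reviews
        new_height = driver.execute_script(height_js)
        if new_height == last_height:
            break  # nothing new was loaded, so the end was reached
        last_height = new_height
    return steps
```

Calling `scroll_until_stable(DRIVER, SCROLL_PAUSE_TIME, max_steps=100)` before reading the page source would mirror the "exit after 100 steps" experiment described above. It limits how many reviews are loaded, so it is a way to narrow down the size at which saving starts to fail rather than a fix.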

Recommended answer

NS_ERROR_FAILURE (0x80004005)

This is the most generic of the error codes. It is returned for any failure to which a more specific error code does not apply.

However, this error message...

selenium.common.exceptions.WebDriverException: Message: [Exception... "Failure"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: chrome://marionette/content/proxy.js :: sendReply_ :: line 275"  data: no]

...implies that Marionette threw an error while attempting to read/store/copy the page source.

The relevant HTML DOM would have helped us debug the issue more effectively. However, the problem appears to be that the page source is extremely large and exceeds the maximum size Marionette can handle. You are possibly dealing with a much bigger string than usual.

A quick diagnostic step is to print the page source directly, instead of assigning it to a variable first, to find out where the actual issue lies:

print(DRIVER.execute_script("return document.body.innerHTML"))

print(DRIVER.page_source)
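If the size of the returned string is indeed the problem, one thing worth trying is to transfer the HTML in several smaller pieces instead of one huge reply. The sketch below is an assumption-laden illustration, not a confirmed fix: the helper `fetch_body_html_in_chunks`, the `window.__savedHtml` property, and the default chunk size are all made up here. The only Selenium call it uses is `execute_script`. The HTML is percent-encoded in the page with `encodeURIComponent` so that every slice is plain ASCII and no multi-byte character can be cut in half.

```python
from urllib.parse import unquote

# Cache a percent-encoded (ASCII-only) copy of the body HTML on `window`
# and return its length. `__savedHtml` is a made-up property name.
STORE_JS = (
    "window.__savedHtml = encodeURIComponent(document.body.innerHTML);"
    "return window.__savedHtml.length;"
)
# Return one slice of the cached string; arguments[0] and arguments[1]
# are the start and end offsets passed from Python.
SLICE_JS = "return window.__savedHtml.substring(arguments[0], arguments[1]);"
# Drop the cached copy again to free memory in the page.
CLEAR_JS = "delete window.__savedHtml;"


def fetch_body_html_in_chunks(driver, chunk_size=1_000_000):
    """Return document.body.innerHTML, fetched in slices of `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = driver.execute_script(STORE_JS)
    parts = []
    try:
        for start in range(0, total, chunk_size):
            parts.append(driver.execute_script(SLICE_JS, start, start + chunk_size))
    finally:
        driver.execute_script(CLEAR_JS)
    # Join first, then decode, so a slice boundary inside "%XX" is harmless.
    return unquote("".join(parts))
```

In the question's loop, `innerHTML = fetch_body_html_in_chunks(DRIVER)` would take the place of the single `execute_script` call. Note that the encoded copy needs additional memory inside the browser, so this cannot help if the content process itself runs out of memory. It only avoids sending the whole document in one Marionette reply.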


Reference

You can find a couple of relevant discussions in:

  • WebDriver:TakeScreenshot generates error when web page has a big height
  • WebDriver:TakeScreenshot fails in canvas "scale()" for huge web pages
  • Exception NS_ERROR_FAILURE in ctx.scale() if width or height is greater than 32767
  • event.synthesizeMouseAtPoint() should only call nsIDOMWindowUtils.sendMouseEvent() if there is a valid window handle
