WebDriverException: Message: Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" while saving a large HTML file using Selenium Python
Problem description
I'm scraping the Google Play Store reviews for an app, which is specified by the URL of its app page. Selenium finds the reviews and scrolls down to load all of them. The scrolling part works: without the headless option I can watch Selenium reach the end of the page. What's not working is saving the HTML content for further analysis.
Based on other answers, I tried different methods for saving the source code.
innerHTML = DRIVER.execute_script("return document.body.innerHTML")
or
innerHTML = DRIVER.page_source
Both result in the same error message and exception.
My code for scrolling through the page and loading all reviews:
import time

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# logger and START_URLS are defined elsewhere in the script.
SCROLL_PAUSE_TIME = 5

options = Options()
options.headless = True
FP = webdriver.FirefoxProfile()
FP.set_preference("intl.accept_languages", "de")

for url in START_URLS:
    try:
        DRIVER = webdriver.Firefox(options=options, firefox_profile=FP)
        DRIVER.get(url)
        time.sleep(SCROLL_PAUSE_TIME)
        app_name = DRIVER.find_element_by_xpath('//h1[@itemprop="name"]').get_attribute('innerText')
        # Open the full review list ("Alle Bewertungen lesen" = "Read all reviews").
        all_reviews_button = DRIVER.find_element_by_xpath('//span[text()="Alle Bewertungen lesen"]')
        all_reviews_button.click()
        time.sleep(SCROLL_PAUSE_TIME)
        last_height = DRIVER.execute_script("return document.body.scrollHeight")
        while True:
            DRIVER.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            try:
                # Click "Mehr anzeigen" ("Show more") whenever it is present.
                DRIVER.find_element_by_xpath('//span[text()="Mehr anzeigen"]').click()
            except Exception:
                pass
            time.sleep(SCROLL_PAUSE_TIME)
            new_height = DRIVER.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                # The page height stopped growing: all reviews are loaded.
                logger.info('Durchlauf erfolgreich')
                innerHTML = DRIVER.execute_script("return document.body.innerHTML")
                with open(app_name + '.html', 'w', encoding='utf-8') as out:
                    out.write(innerHTML)  # was out.write(html), an undefined name
                break
            last_height = new_height
    except Exception as e:
        logger.error('Exception occurred', exc_info=True)
    finally:
        DRIVER.quit()
The log file, showing that the infinite scroll reached the end of the page but the file could not be saved:
10.09.19 16:12:00 - INFO - Durchlauf erfolgreich
10.09.19 16:12:13 - ERROR - Exception occurred
Traceback (most recent call last):
File "scraper.py", line 57, in <module>
innerHTML = DRIVER.execute_script("return document.body.innerHTML")
File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 636, in execute_script
'args': converted_args})['value']
File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: [Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://marionette/content/proxy.js :: sendReply_ :: line 275" data: no]
The last part of geckodriver.log:
...
1568124670155 Marionette WARN TimedPromise timed out after 500 ms: stacktrace:
bail@chrome://marionette/content/sync.js:223:64
1568124693017 Marionette WARN TimedPromise timed out after 500 ms: stacktrace:
bail@chrome://marionette/content/sync.js:223:64
1568124734637 Marionette INFO Stopped listening on port 57015
[Parent 14684, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
[Child 10464, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
[Parent 14684, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
JavaScript error: resource:///modules/sessionstore/SessionStore.jsm, line 1639: TypeError: subject.QueryInterface is not a function
A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[Child 2508, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
[Child]
I'd like to save the page as a file and, in the next step, parse the HTML to extract the reviews. However, the saving part does not work with a large page. If I exit the while loop after, say, 100 steps and save the page, it works fine.
Recommended answer
NS_ERROR_FAILURE (0x80004005)
This is the most generic of the error codes: it is raised for any failure to which no more specific error code applies.
However, this error message...
selenium.common.exceptions.WebDriverException: Message: [Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://marionette/content/proxy.js :: sendReply_ :: line 275" data: no]
...implies that Marionette threw an error while attempting to read, store, or copy the page source.
The relevant HTML DOM would have helped us debug the issue better. However, the problem appears to be that the page source is immensely large and exceeds the maximum size Marionette can handle. The string you're dealing with is possibly just too big.
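To check whether size really is the trigger, one option is to ask the browser only for the length of the serialized DOM, so that a couple of integers cross the Marionette connection instead of the whole string. This is a minimal sketch, assuming `driver` is the running Firefox session from the question; the helper name `dom_size_report` is hypothetical and not part of Selenium.

```python
def dom_size_report(driver):
    """Return rough DOM size figures without transferring the HTML itself."""
    # Length of the serialized body in UTF-16 code units (a single integer).
    chars = driver.execute_script("return document.body.innerHTML.length")
    # Number of elements currently in the document (a single integer).
    nodes = driver.execute_script(
        "return document.getElementsByTagName('*').length"
    )
    return {"html_chars": chars, "element_count": nodes}
```

Calling `dom_size_report(DRIVER)` on each pass of the scroll loop would show how large the page has grown by the time saving starts to fail. If even this call raises the same exception, the failure likely happens while the browser serializes the DOM rather than while the result is sent back.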
A quick way to narrow this down is to avoid assigning the page source to a variable and to print it directly instead, to find out where the actual issue lies.
print(DRIVER.execute_script("return document.body.innerHTML"))
or
print(DRIVER.page_source)
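If the size of a single reply does turn out to be the problem, one thing to try is pulling the HTML out of the browser in slices and appending each slice to the file. The sketch below is an untested assumption rather than a confirmed fix: `save_html_in_chunks`, the `window.__cachedHtml` property, and the default chunk size are all made up for illustration. Only `execute_script` is an actual Selenium call.

```python
# Run once: serialize the DOM a single time and cache the string on window.
# `__cachedHtml` is a hypothetical property name used only in this sketch.
CACHE_JS = (
    "window.__cachedHtml = document.body.innerHTML;"
    "return window.__cachedHtml.length;"
)

# Run per chunk: arguments[0] is the start offset, arguments[1] the chunk size.
# If the slice would end on a high surrogate, it is extended by one code unit
# so that a surrogate pair (for example an emoji) is never split.
SLICE_JS = """
var s = window.__cachedHtml, start = arguments[0];
var end = Math.min(start + arguments[1], s.length);
var c = s.charCodeAt(end - 1);
if (end < s.length && c >= 0xD800 && c <= 0xDBFF) { end += 1; }
return [s.substring(start, end), end];
"""


def save_html_in_chunks(driver, path, chunk_size=1000000):
    """Write document.body.innerHTML to `path` in slices of `chunk_size`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = driver.execute_script(CACHE_JS)
    offset = 0
    with open(path, "w", encoding="utf-8") as out:
        while offset < total:
            # The script returns the slice and the offset where it ended.
            chunk, offset = driver.execute_script(SLICE_JS, offset, chunk_size)
            out.write(chunk)
    return total
```

In the question's loop this would replace the `execute_script("return document.body.innerHTML")` call and the `with open(...)` block, for example `save_html_in_chunks(DRIVER, app_name + '.html')`. The browser still has to serialize the whole DOM once, so this only helps if the failure is in sending the reply and not in building the string.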
Reference
You can find a couple of relevant discussions in:
- WebDriver:TakeScreenshot generates error when web page has a big height
- WebDriver:TakeScreenshot fails in canvas "scale()" for huge web pages
- Exception NS_ERROR_FAILURE in ctx.scale() if width or height is greater than 32767
- event.synthesizeMouseAtPoint() should only call nsIDOMWindowUtils.sendMouseEvent() if there is a valid window handle