StaleElementException when iterating with Python

Problem description

I'm trying to create a basic web scraper for Amazon results. As I'm iterating through results, I sometimes get to page 5 (sometimes only page 2) of the results and then a StaleElementException is thrown. When I look at the browser after the exception is thrown, I can see that the driver/page did not scroll down to where the page numbers are (bottom bar).

My code:

driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')

for page in range(1,last_page_number +1):

    driver.implicitly_wait(10)

    bottom_bar = driver.find_element_by_class_name('pagnCur')
    driver.execute_script("arguments[0].scrollIntoView(true);", bottom_bar)

    current_page_number = int(driver.find_element_by_class_name('pagnCur').text)

    if page == current_page_number:
        next_page = driver.find_element_by_xpath('//div[@id="pagn"]/span[@class="pagnLink"]/a[text()="{0}"]'.format(current_page_number+1))
        next_page.click()
        print('page #',page,': going to next page')
    else:
        print('page #: ', page,'error')

I've looked at this question, and I'm guessing that a similar fix can be applied, but I'm not sure how to find something on the page that disappears. Also, based on how quickly the print statements are occurring, I can see that the implicitly_wait(10) isn't actually waiting a full 10 seconds.
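On the timing observation: an implicit wait is an upper bound on how long an element lookup keeps polling, not a fixed pause, so a lookup that succeeds immediately returns immediately. The sketch below imitates that behaviour in plain Python. `wait_until` is a hypothetical helper written for illustration, not Selenium's implementation.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Success ends the wait early; the timeout is only a ceiling.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll_interval)


start = time.monotonic()
value = wait_until(lambda: "found", timeout=10.0)
elapsed = time.monotonic() - start
print(value, "returned after", round(elapsed, 3), "seconds")
```

Because the condition is already true on the first poll, the call returns almost at once even though the timeout is 10 seconds. The same applies when `find_element` locates an element that still belongs to the page being replaced.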

The exception is pointing to the line that starts with "driver.execute_script". This is the exception:

StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

Sometimes I'll get a ValueError:

ValueError: invalid literal for int() with base 10: ''

So these errors/exceptions lead me to believe that there is something going on with waiting for the page to refresh completely.
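The ValueError is consistent with that reading: `int('')` fails with exactly this message, so the `.text` of the page marker was presumably read while it was still empty. A minimal sketch of a tolerant parser follows. `parse_page_number` is a hypothetical helper, not part of Selenium, and the caller is assumed to retry or wait when it returns `None`.

```python
def parse_page_number(text):
    """Return the page number as an int, or None if `text` is not a number yet."""
    try:
        return int(text.strip())
    except ValueError:
        # Empty or non-numeric text, e.g. the marker has not been rendered yet.
        return None


print(parse_page_number(" 5 "))
print(parse_page_number(""))
```

Returning `None` instead of raising lets the loop decide whether to wait and read the marker again.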

Recommended answer

This error message...

StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

...implies that the previously obtained reference to the element is now stale: the element it pointed to is no longer attached to the DOM of the page.

The common reasons behind this issue are:

  • The element has changed position within the HTML.
  • The element is no longer attached to the DOM tree.
  • The webpage that the element was part of has been refreshed.
  • The previous instance of the element has been refreshed by JavaScript or an Ajax call.

Keeping your approach of scrolling with scrollIntoView() and printing a couple of helpful debug messages, I have made some minor adjustments, adding WebDriverWait. You can use the following solution:

  • Code Block:

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
# Selenium 3 style constructor; adjust the chromedriver path for your machine.
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush")
current_page_number = "?"  # defined up front so the except branch can always print it
while True:
    try:
        # Locate the current-page marker afresh on every iteration instead of reusing an old reference.
        current_page_number_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur")))
        driver.execute_script("arguments[0].scrollIntoView(true);", current_page_number_element)
        current_page_number = current_page_number_element.get_attribute("innerHTML")
        # Wait until the "next" arrow is clickable before clicking it.
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.pagnNextArrow"))).click()
        print("page # {} : going to next page".format(current_page_number))
    except WebDriverException:
        # Any Selenium failure here (typically a timeout on the last page) ends the loop.
        print("page # {} : error, no more pages".format(current_page_number))
        break
driver.quit()

  • Console Output:

    page # 1 : going to next page
    page # 2 : going to next page
    page # 3 : going to next page
    page # 4 : going to next page
    page # 5 : going to next page
    page # 6 : going to next page
    page # 7 : going to next page
    page # 8 : going to next page
    page # 9 : going to next page
    page # 10 : going to next page
    page # 11 : going to next page
    page # 12 : going to next page
    page # 13 : going to next page
    page # 14 : going to next page
    page # 15 : going to next page
    page # 16 : going to next page
    page # 17 : going to next page
    page # 18 : going to next page
    page # 19 : going to next page
    page # 20 : error, no more pages
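
A possible refinement, offered as a sketch rather than as part of the answer: after clicking the "next" arrow, wait explicitly until the old page marker has gone stale before looking anything up again. That removes the window in which a lookup can return an element from the page that is about to be replaced. `go_to_next_page` is a hypothetical helper; the selectors are the ones used in the answer, and `EC.staleness_of` is Selenium's built-in condition for this.

```python
def go_to_next_page(driver, timeout=20):
    """Click the "next" arrow and block until the old page has been replaced.

    Returns the page number that was showing before the click.
    """
    # Imported inside the function so that defining this helper does not
    # itself require Selenium to be installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # Assumed selectors, copied from the answer's code.
    marker = wait.until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur"))
    )
    page_number = marker.text.strip()
    wait.until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "span.pagnNextArrow"))
    ).click()
    # staleness_of becomes true once `marker` is detached from the DOM,
    # which signals that the previous page is gone.
    wait.until(EC.staleness_of(marker))
    return page_number
```

Called in a loop, this raises `TimeoutException` on the last page, where the "next" arrow never becomes clickable, so the caller can catch that exception to stop.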
    
