StaleElementException when iterating with Python


Question

I'm trying to create a basic web scraper for Amazon results. As I iterate through the results, I sometimes get to page 5 (sometimes only page 2) and then a StaleElementException is thrown. When I look at the browser after the exception is thrown, I can see that the driver/page did not scroll down to where the page numbers are (the bottom bar).

My code:

driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')

for page in range(1,last_page_number +1):

    driver.implicitly_wait(10)

    bottom_bar = driver.find_element_by_class_name('pagnCur')
    driver.execute_script("arguments[0].scrollIntoView(true);", bottom_bar)

    current_page_number = int(driver.find_element_by_class_name('pagnCur').text)

    if page == current_page_number:
        next_page = driver.find_element_by_xpath('//div[@id="pagn"]/span[@class="pagnLink"]/a[text()="{0}"]'.format(current_page_number+1))
        next_page.click()
        print('page #',page,': going to next page')
    else:
        print('page #: ', page,'error')

I've looked at this question, and I'm guessing that a similar fix can be applied, but I'm not sure how to find something on the page that disappears. Also, judging by how quickly the print statements appear, implicitly_wait(10) isn't actually waiting a full 10 seconds.

The exception points to the line that starts with "driver.execute_script". This is the exception:

StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

Sometimes I'll get a ValueError:

ValueError: invalid literal for int() with base 10: ''

These errors/exceptions lead me to believe that the problem has to do with not waiting for the page to refresh completely.

Recommended answer

This error message...

StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

...implies that the previously obtained reference to the element is now stale and the element it pointed to is no longer present in the DOM of the page.

The common reasons behind this issue are:

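  • The element has changed its position within the HTML.
  • The element is no longer attached to the DOM tree.
  • The webpage on which the element resided has been refreshed.
  • The previous instance of the element has been refreshed by a JavaScript or Ajax call.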
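All of these causes have the same remedy: look the element up again instead of reusing the old reference. A minimal sketch of that idea follows. The helper name `with_fresh_element` and the fake locator are hypothetical, and the exception class is a local stand-in so the sketch runs without a browser; with Selenium you would import `StaleElementReferenceException` from `selenium.common.exceptions` instead.

```python
class StaleElementReferenceException(Exception):
    """Stand-in for selenium.common.exceptions.StaleElementReferenceException."""


def with_fresh_element(locate, action, attempts=3):
    """Run action(locate()), locating the element again if it went stale.

    locate  -- zero-argument callable that finds the element anew
    action  -- callable that uses the element (read text, click, ...)
    """
    for attempt in range(1, attempts + 1):
        try:
            # locate() is called on every attempt, so no old reference is reused
            return action(locate())
        except StaleElementReferenceException:
            if attempt == attempts:
                raise  # still stale after the last attempt: give up


# Demo with fakes: the first two lookups hand back a "stale" handle.
calls = {"n": 0}


def fake_locate():
    calls["n"] += 1
    return calls["n"]


def fake_read(handle):
    if handle < 3:
        raise StaleElementReferenceException("stale")
    return "page 3"


result = with_fresh_element(fake_locate, fake_read)
print(result)
```

With a real driver, `locate` could be `lambda: driver.find_element_by_class_name('pagnCur')` and `action` could be `lambda el: el.text`, reusing the calls from the question.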

Preserving your concept of scrolling through scrollIntoView() and printing a couple of helpful debug messages, I have made some minor adjustments that introduce WebDriverWait, and you can use the following solution:

  • Code Block:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush")
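current_page_number = None  # defined up front so the except branch can always print it
while True:
    try:
        # Locate the current-page indicator anew on every pass instead of reusing an old reference
        current_page_number_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur")))
        driver.execute_script("arguments[0].scrollIntoView(true);", current_page_number_element)
        current_page_number = current_page_number_element.get_attribute("innerHTML")
        # Wait until the "next" arrow is clickable, then click it
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.pagnNextArrow"))).click()
        print("page # {} : going to next page".format(current_page_number))
    except Exception:
        # Any failure (typically a timeout once the "next" arrow is gone) ends the loop
        print("page # {} : error, no more pages".format(current_page_number))
        break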
driver.quit()

  • Console Output:

    page # 1 : going to next page
    page # 2 : going to next page
    page # 3 : going to next page
    page # 4 : going to next page
    page # 5 : going to next page
    page # 6 : going to next page
    page # 7 : going to next page
    page # 8 : going to next page
    page # 9 : going to next page
    page # 10 : going to next page
    page # 11 : going to next page
    page # 12 : going to next page
    page # 13 : going to next page
    page # 14 : going to next page
    page # 15 : going to next page
    page # 16 : going to next page
    page # 17 : going to next page
    page # 18 : going to next page
    page # 19 : going to next page
    page # 20 : error, no more pages
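On the two side observations in the question: a wait timeout is an upper bound, not a fixed pause, so both implicitly_wait(10) and WebDriverWait return as soon as the condition is met. The ValueError is consistent with reading the page indicator while its text was still empty. The sketch below is not Selenium's implementation; it is a plain-Python illustration of the polling idea, with hypothetical names (`wait_until`, `parse_page_number`) and a fake text source standing in for the page.

```python
import time


def parse_page_number(text):
    """Return the page number as an int, or None if the text is not a number yet."""
    try:
        return int(text.strip())
    except ValueError:
        return None  # e.g. '' while the page is still refreshing


def wait_until(condition, timeout=10.0, poll=0.1):
    """Poll condition() until it returns a truthy value or the timeout expires.

    Returns as soon as the condition holds, so the full timeout is only
    spent in the failure case. Page numbers start at 1, so a truthy check
    is enough here.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)


# Fake source: the indicator is empty on the first two reads, then shows "3".
reads = iter(["", "", "3"])


def read_indicator():
    return parse_page_number(next(reads))


start = time.monotonic()
page = wait_until(read_indicator, timeout=10.0, poll=0.05)
elapsed = time.monotonic() - start
print("page", page, "ready well before the 10 s timeout:", elapsed < 10.0)
```

In the real scraper the condition would read the indicator from the driver on each poll, for example by locating `span.pagnCur` again and passing its text to `parse_page_number`.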
    
