StaleElementException when iterating with Python
Problem description
I'm trying to create a basic web scraper for Amazon results. As I'm iterating through results, I sometimes get to page 5 (sometimes only page 2) of the results and then a StaleElementException is thrown. When I look at the browser after the exception is thrown, I can see that the driver/page did not scroll down to where the page numbers are (bottom bar).
My code:
```python
driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')
for page in range(1, last_page_number + 1):
    driver.implicitly_wait(10)
    bottom_bar = driver.find_element_by_class_name('pagnCur')
    driver.execute_script("arguments[0].scrollIntoView(true);", bottom_bar)
    current_page_number = int(driver.find_element_by_class_name('pagnCur').text)
    if page == current_page_number:
        next_page = driver.find_element_by_xpath('//div[@id="pagn"]/span[@class="pagnLink"]/a[text()="{0}"]'.format(current_page_number+1))
        next_page.click()
        print('page #', page, ': going to next page')
    else:
        print('page #: ', page, 'error')
```
I've looked at this question, and I'm guessing that a similar fix can be applied, but I'm not sure how to find something on the page that disappears. Also, based on how quickly the print statements are occurring, I can see that the implicitly_wait(10) isn't actually waiting a full 10 seconds.
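This is what an implicit wait is expected to do. It only makes `find_element` keep polling while *no* matching element exists, and it returns as soon as one is found. Right after `next_page.click()`, the previous page's `span.pagnCur` is typically still in the DOM. The lookup therefore succeeds at once, and that reference goes stale moments later when the results are re-rendered.

Below is a toy, pure-Python model of that rule. `implicit_find` and the lambda lookups are hypothetical stand-ins, not Selenium code:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def implicit_find(lookup: Callable[[], Optional[T]], timeout: float, poll: float = 0.05) -> T:
    """Toy model of an implicit wait: keep polling `lookup` only while it finds nothing.

    Returns the first non-None result, or raises LookupError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = lookup()
        if found is not None:
            # Returned immediately, even if it is a leftover element from the old page.
            return found
        if time.monotonic() >= deadline:
            raise LookupError("no such element")
        time.sleep(poll)


# The old page's element is still present, so the "10 second" wait returns at once.
start = time.monotonic()
element = implicit_find(lambda: "old span.pagnCur", timeout=10)
elapsed = time.monotonic() - start
print(element, "found after", round(elapsed, 3), "s")
```

In Selenium the implicit wait is a session-wide setting. Calling `driver.implicitly_wait(10)` once after creating the driver is enough; repeating it inside the loop changes nothing.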
The exception is pointing to the line that starts with "driver.execute_script". This is the exception:
StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
Sometimes I'll get a ValueError:
ValueError: invalid literal for int() with base 10: ''
So these errors/exceptions lead me to believe that there is something going on with waiting for the page to refresh completely.
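The `ValueError` is raised by `int('')`. Selenium's `WebElement.text` returns only the element's rendered, visible text. If `span.pagnCur` is read while the result list is being redrawn, an empty string is one plausible outcome.

A small guard makes that case explicit instead of crashing. `parse_page_number` below is a hypothetical helper, not part of Selenium:

```python
from typing import Optional


def parse_page_number(text: str) -> Optional[int]:
    """Return the page number shown in `text`, or None if it is empty or not a number."""
    cleaned = text.strip()
    if not cleaned.isdecimal():
        # e.g. '' while the pagination bar is still being re-rendered
        return None
    return int(cleaned)


print(parse_page_number("5"))      # 5
print(parse_page_number(" 12\n"))  # 12
print(parse_page_number(""))       # None
```

In the scraping loop, a `None` result would be a signal to wait and read the element again, rather than a value to pass to `int()`.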
Recommended answer
This error message...
StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
...implies that the previously obtained reference to the element is now stale: the element it refers to is no longer attached to the page's DOM.
The common reasons behind this issue are:
- The element has changed position within the HTML.
- The element is no longer attached to the DOM tree.
- The webpage the element was part of has been refreshed.
- The previous instance of the element has been refreshed (replaced) by JavaScript or an AJAX call.
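Whatever the cause, the practical consequence is the same: a reference obtained before the page changed cannot be reused, and the element has to be located again. A common pattern is to wrap the lookup *and* the action together in a small retry helper.

The sketch below is generic, and the name `retry_on` and its parameters are hypothetical. With Selenium, you would pass `StaleElementReferenceException` from `selenium.common.exceptions` as `exc_type`, together with a function that calls `driver.find_element(...)` afresh on each attempt:

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def retry_on(exc_type: Type[BaseException], action: Callable[[], T],
             attempts: int = 3, delay: float = 0.5) -> T:
    """Call `action`, retrying up to `attempts` times when it raises `exc_type`.

    `action` must re-locate the element itself, so every retry uses a fresh reference.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exc_type:
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay)
    raise AssertionError("unreachable")


# Assumed Selenium usage (not executed here):
# retry_on(StaleElementReferenceException,
#          lambda: driver.execute_script("arguments[0].scrollIntoView(true);",
#                                        driver.find_element(By.CLASS_NAME, "pagnCur")))

class FakeStale(Exception):
    """Stand-in for selenium.common.exceptions.StaleElementReferenceException."""


calls = []


def scroll_to_page_bar() -> str:
    # Fails twice (as if the DOM was re-rendered), then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise FakeStale("element reference is stale")
    return "scrolled"


print(retry_on(FakeStale, scroll_to_page_bar, attempts=5, delay=0.01))  # scrolled
print(len(calls))  # 3
```

Catching only the stale-reference exception keeps genuine failures, such as a missing element, visible instead of silently retried.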
Keeping your approach of scrolling with scrollIntoView() and printing a couple of helpful debug messages, I have made a few minor adjustments that introduce WebDriverWait. You can use the following solution:
Code Block:
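```python
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
# Selenium 4 style: `chrome_options=` and `executable_path=` were removed; pass `options` and a Service
driver = webdriver.Chrome(options=options, service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'))
driver.get("https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush")
current_page_number = None  # defined up front so the except branch can always print it
while True:
    try:
        # Re-locate the current page marker on every iteration instead of reusing an old reference
        current_page_number_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur")))
        driver.execute_script("arguments[0].scrollIntoView(true);", current_page_number_element)
        current_page_number = current_page_number_element.get_attribute("innerHTML")
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.pagnNextArrow"))).click()
        # Wait until the old marker is detached, i.e. the next page has replaced it
        WebDriverWait(driver, 10).until(EC.staleness_of(current_page_number_element))
        print("page # {} : going to next page".format(current_page_number))
    except TimeoutException:
        # No clickable "next" arrow (or no page change) within the timeout: assume the last page
        print("page # {} : error, no more pages".format(current_page_number))
        break
driver.quit()
```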
Console Output:
```
page # 1 : going to next page
page # 2 : going to next page
page # 3 : going to next page
page # 4 : going to next page
page # 5 : going to next page
page # 6 : going to next page
page # 7 : going to next page
page # 8 : going to next page
page # 9 : going to next page
page # 10 : going to next page
page # 11 : going to next page
page # 12 : going to next page
page # 13 : going to next page
page # 14 : going to next page
page # 15 : going to next page
page # 16 : going to next page
page # 17 : going to next page
page # 18 : going to next page
page # 19 : going to next page
page # 20 : error, no more pages
```
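The solution relies on `WebDriverWait(driver, timeout).until(condition)`. This calls `condition(driver)` repeatedly (every 0.5 s by default) until it returns a truthy value, then returns that value. If the timeout expires first, it raises `TimeoutException`.

After a click that re-renders the results, it can help to wait in two steps. First wait for the *old* element to go stale (Selenium's `EC.staleness_of`), then wait for the new one. Otherwise the first wait can be satisfied by the old page's element. In Selenium terms:

- `WebDriverWait(driver, 10).until(EC.staleness_of(old_element))`
- followed by `WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur")))`

The toy model below illustrates this. `wait_until` and `FakePage` are hypothetical stand-ins, not Selenium code:

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float, poll: float = 0.5) -> T:
    """Toy model of WebDriverWait.until: poll `condition()` until it returns something truthy."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)


class FakePage:
    """Hypothetical page: after a click, the old pagination bar survives a few more reads."""

    def __init__(self, reads_before_reload: int) -> None:
        self.generation = 1  # incremented whenever the results are re-rendered
        self._pending_reads: Optional[int] = None  # reads left before the re-render lands
        self._reads_before_reload = reads_before_reload

    def click_next(self) -> None:
        self._pending_reads = self._reads_before_reload

    def current_bar(self) -> Tuple[str, int]:
        """Identify the pagination element that is in the DOM right now."""
        if self._pending_reads is not None:
            if self._pending_reads == 0:
                self.generation += 1  # the asynchronous re-render happens now
                self._pending_reads = None
            else:
                self._pending_reads -= 1
        return ("span.pagnCur", self.generation)


page = FakePage(reads_before_reload=3)
old_bar = page.current_bar()
page.click_next()

# Naive: the old bar is still in the DOM, so the wait is satisfied immediately by it.
naive = wait_until(page.current_bar, timeout=5, poll=0.01)

# Better: first wait until the old reference is gone (like EC.staleness_of) ...
wait_until(lambda: page.current_bar() != old_bar, timeout=5, poll=0.01)
# ... and only then locate the new element.
new_bar = page.current_bar()
print(naive == old_bar, new_bar == old_bar)  # True False
```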