Getting data from an HTML table with Selenium (Python): submitting changes breaks the loop

Question

I want to scrape data from an HTML table for different combinations of drop-down values by looping over those combinations. After a combination is chosen, the changes need to be submitted. Submitting, however, refreshes the page, which causes an error.

This is what I've done so far:

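from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time

# Assumed: the original snippet never creates the driver; Firefox is a guess.
browser = webdriver.Firefox()
browser.get('https://daten.ktbl.de/feldarbeit/entry.html')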

# Selecting the constant values of some of the drop downs:
fertilizer = Select(browser.find_element_by_name("hgId"))
fertilizer.select_by_value("2") 
fertilizer = Select(browser.find_element_by_name("gId"))
fertilizer.select_by_value("193") 
fertilizer = Select(browser.find_element_by_name("avId"))
fertilizer.select_by_value("383")  
fertilizer = Select(browser.find_element_by_name("hofID"))
fertilizer.select_by_value("2") 

# Looping over different combinations of plot size and amount of fertilizer:
size = Select(browser.find_element_by_name("flaecheID"))
for size_values in size.options:
    size.select_by_value(size_values.get_attribute("value"))
    time.sleep(1)

    amount= Select(browser.find_element_by_name("mengeID"))
    for amount_values in amount.options:
        amount.select_by_value(amount_values.get_attribute("value"))
        time.sleep(1)

        #Refreshing the page after the two variable values are chosen:
        button = browser.find_element_by_xpath("//*[@type='submit']")
        button.click()
        time.sleep(5)

This leads to the error:

selenium.common.exceptions.StaleElementReferenceException: Message: The element reference of <option> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed.

Obviously, the issue is that I did indeed refresh the document.

After the changes are submitted and the page has loaded the results, I want to retrieve them with:

import pandas as pd

html_source = browser.page_source
# Keep only the tables whose text contains "Dieselbedarf"
df_list = pd.read_html(html_source, match="Dieselbedarf")

(Shout-out to @bink1time who answered this part of my question here).

How can I update the page without breaking the loop?

Thanks a lot for your help!

Recommended answer

To avoid a StaleElementReferenceException, always look an element up again right before you interact with it. In your particular case, you searched for size and amount once, found them, and stored them in variables. When the page refreshed, their internal element IDs (UUIDs) changed, so the old ones you had stored were no longer attached to the DOM. When you then tried to interact with them, Selenium could not find them in the DOM and threw this exception.
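
The mechanism can be illustrated without a browser. The following is only a toy model, not Selenium itself: `FakePage`, `FakeOption`, and `StaleError` are hypothetical names invented for this sketch, and a "generation" counter stands in for the page reload. It shows why references captured before a submit fail, while looking elements up again by index keeps working.

```python
class StaleError(Exception):
    """Stand-in for Selenium's StaleElementReferenceException (hypothetical)."""


class FakeOption:
    def __init__(self, page, value):
        self._page = page
        self._generation = page.generation  # page load that created this element
        self._value = value

    @property
    def value(self):
        # An element is only usable while the page load that created it is current.
        if self._generation != self._page.generation:
            raise StaleError("element belongs to an earlier page load")
        return self._value


class FakePage:
    def __init__(self, values):
        self._values = list(values)
        self.generation = 0

    def find_options(self):
        # Every lookup returns fresh element objects for the current page load.
        return [FakeOption(self, v) for v in self._values]

    def submit(self):
        # Submitting reloads the page, which invalidates all earlier elements.
        self.generation += 1


def loop_with_stored_references(page):
    seen = []
    for option in page.find_options():  # references captured once, before any reload
        seen.append(option.value)
        page.submit()
    return seen


def loop_with_fresh_lookups(page):
    seen = []
    for i in range(len(page.find_options())):
        option = page.find_options()[i]  # looked up again after each reload
        seen.append(option.value)
        page.submit()
    return seen
```

In this model, `loop_with_stored_references` raises `StaleError` on its second iteration, because the first `submit()` has already invalidated the remaining options, whereas `loop_with_fresh_lookups` visits every value.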

I modified your code so that it always looks up the size and amount elements again before interacting with them:

# Looping over different combinations of plot size and amount of fertilizer:
size = Select(browser.find_element_by_name("flaecheID"))
for i in range(len(size.options)):
    # Search and save new select element
    size = Select(browser.find_element_by_name("flaecheID"))
    size.select_by_value(size.options[i].get_attribute("value"))
    time.sleep(1)

    amount = Select(browser.find_element_by_name("mengeID"))
    for j in range(len(amount.options)):
        # Search and save new select element
        amount = Select(browser.find_element_by_name("mengeID"))
        amount.select_by_value(amount.options[j].get_attribute("value"))
        time.sleep(1)

        #Refreshing the page after the two variable values are chosen:
        button = browser.find_element_by_xpath("//*[@type='submit']")
        button.click()
        time.sleep(5)
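
A possible variation on the loop above, offered as a sketch rather than as part of the original answer: copy the option values out of the page as plain strings first, since strings cannot go stale, and then iterate over every pair. The function `scrape_combinations` and its three callable parameters are hypothetical names introduced here so that the loop logic stays independent of the browser. It also assumes that the list of amounts does not depend on which size is selected, which the page may or may not guarantee.

```python
from itertools import product


def scrape_combinations(read_values, select_and_submit, read_result):
    """Visit every (size, amount) pair and collect one result per pair.

    read_values(name)          -> list of option values (plain strings)
    select_and_submit(choices) -> sets the dropdowns, then submits the form
    read_result()              -> whatever should be stored for the current page
    """
    # Plain strings are copied out of the page, so a reload cannot invalidate them.
    sizes = read_values("flaecheID")
    amounts = read_values("mengeID")

    results = {}
    for size, amount in product(sizes, amounts):
        # The callable is expected to locate the dropdowns again on every call.
        select_and_submit({"flaecheID": size, "mengeID": amount})
        results[(size, amount)] = read_result()
    return results
```

With Selenium, `read_values` could return `[o.get_attribute("value") for o in Select(browser.find_element(By.NAME, name)).options]`, `select_and_submit` could build a new `Select` for each name, call `select_by_value`, and click the submit button, and `read_result` could wrap the `pd.read_html` call from the question. Note that `find_element(By.NAME, ...)` is the Selenium 4 spelling; the `find_element_by_name` helper used above was removed in Selenium 4.3.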

Try this? It worked for me. I hope it helps.
