Alternative to pandas.read_html where url is not unique?

Problem description

I want to access data from an HTML table in the "ERGEBNIS" section using Python 3.7. The problem is that the results for each combination of the drop-down values are only shown after clicking Submit. This does not change the URL, however, so I have no idea how to access the results table after updating the input values of the drop-downs.

Here is what I've done so far:


from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import time

# Any WebDriver works here; Firefox is only an example.
browser = webdriver.Firefox()
browser.get('https://daten.ktbl.de/feldarbeit/entry.html')

# Fix the values of the drop-down fields (field name -> option value):
selections = {
    "hgId": "2",
    "gId": "193",
    "avId": "383",
    "hofID": "2",
    "flaecheID": "5",
    "mengeID": "60",
}
for name, value in selections.items():
    # Look each element up fresh, in case selecting a value
    # re-renders the dependent drop-downs.
    Select(browser.find_element(By.NAME, name)).select_by_value(value)

# Submit the changes to show the results for this combination of values
button = browser.find_element(By.XPATH, "//*[@type='submit']")
button.click()

Submitting the changes does not change the URL, however, so I don't know how to access the results table (here "ERGEBNIS").

Otherwise, my approach would have been to use pd.read_html, roughly like this:

...

url = browser.current_url
time.sleep(1)
df_list = pd.read_html(url, match = "Dieselbedarf")

But since the URL isn't unique for each result, this doesn't work. BeautifulSoup would have the same issue, or at least I don't understand how to use it without a unique URL.

Any ideas how else I can access the HTML table?

The answer by @bink1time solved my problem: instead of going through the URL, access the table via the raw HTML string:

html_source = browser.page_source
df_list = pd.read_html(html_source, match = "Dieselbedarf")

Recommended answer

You can probably just get the HTML source:

html_source = browser.page_source

According to the docs (https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.read_html.html), read_html takes a URL, a file-like object, or a raw string containing HTML. In this case you pass the raw string.
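A minimal, self-contained sketch of the raw-string case, with a small made-up HTML snippet standing in for browser.page_source (the table contents are invented for illustration). It wraps the string in io.StringIO, since pandas 2.1 deprecated passing literal HTML to read_html; a file-like object is accepted by older versions too. read_html also needs a parser backend installed, such as lxml, or BeautifulSoup with html5lib.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for browser.page_source: two tables, only one of
# which contains the text "Dieselbedarf" (values are invented).
html_source = """
<html><body>
  <table><tr><th>Feld</th></tr><tr><td>A</td></tr></table>
  <table>
    <tr><th>Kennzahl</th><th>Wert</th></tr>
    <tr><td>Dieselbedarf</td><td>12.5</td></tr>
  </table>
</body></html>
"""

# match= keeps only the tables whose text matches the given string/regex.
# StringIO turns the literal HTML into a file-like object, which avoids the
# FutureWarning that newer pandas versions emit for raw HTML strings.
df_list = pd.read_html(StringIO(html_source), match="Dieselbedarf")

print(len(df_list))  # number of matching tables
print(df_list[0])
```

With the question's browser, the same call becomes pd.read_html(StringIO(browser.page_source), match="Dieselbedarf"), so df_list holds only the table(s) mentioning Dieselbedarf.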

html_source = browser.page_source
df_list = pd.read_html(html_source, match = "Dieselbedarf")
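The question also mentions BeautifulSoup. The same idea applies there: BeautifulSoup parses whatever string it is given, so browser.page_source can be handed to it directly, with no URL involved. A minimal sketch, assuming the beautifulsoup4 package is installed and using Python's built-in "html.parser"; the HTML snippet is a made-up stand-in for the real page source.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for browser.page_source (values are invented).
html_source = """
<table><tr><td>Feld</td><td>A</td></tr></table>
<table>
  <tr><th>Kennzahl</th><th>Wert</th></tr>
  <tr><td>Dieselbedarf</td><td>12.5</td></tr>
</table>
"""

soup = BeautifulSoup(html_source, "html.parser")

# Pick the first table whose text mentions "Dieselbedarf"
# (next() raises StopIteration if no such table exists).
table = next(t for t in soup.find_all("table") if "Dieselbedarf" in t.get_text())

# Turn each row into a list of stripped cell strings (header and data cells).
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]
print(rows)
```

For tabular data, read_html is usually the shorter route; BeautifulSoup is more useful when the content is not a clean table.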

Just a note: you don't need the time.sleep(1) call from the question.
