pandas read_html - no tables found
Problem description
I am trying to read a table of data from WU.com, but I get an error saying that no tables were found. (I am also a first-timer at web scraping.) There is another, very similar Stack Overflow question about a WU table of data, but its solution is a little too complicated for me.
import pandas as pd
df_list = pd.read_html('https://www.wunderground.com/history/daily/us/wi/milwaukee/KMKE/date/2013-6-26')
print(df_list)
On the webpage of historical data for Milwaukee, the table I am trying to retrieve into pandas is the daily observations table.
Any tips would help, thank you.
Recommended answer
The page is dynamic: the table is built by JavaScript after the page loads, so you need to render the page first. Use something like Selenium to render it, then pull the table with pandas .read_html():
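from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Assumes Selenium 4+, where the driver path is passed through a Service object
driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
driver.get("https://www.wunderground.com/history/daily/us/wi/milwaukee/KMKE/date/2013-6-26")
html = driver.page_source  # HTML as the browser currently has it, after scripts have run
# Newer pandas versions deprecate passing a literal HTML string, so wrap it in StringIO
tables = pd.read_html(StringIO(html))
data = tables[1]  # the second table on the page holds the daily observations
driver.quit()  # end the session and close the browser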
Output:
print(data)
        Time  Temperature  ...  Precip Accum      Condition
0    6:52 PM         68 F  ...        0.0 in  Mostly Cloudy
1    7:52 PM         69 F  ...        0.0 in  Mostly Cloudy
2    8:52 PM         70 F  ...        0.0 in  Mostly Cloudy
3    9:52 PM         67 F  ...        0.0 in         Cloudy
4   10:52 PM         65 F  ...        0.0 in  Partly Cloudy
5   11:42 PM         66 F  ...        0.0 in  Mostly Cloudy
6   11:52 PM         68 F  ...        0.0 in  Mostly Cloudy
7   12:08 AM         68 F  ...        0.0 in         Cloudy
8   12:52 AM         68 F  ...        0.0 in  Mostly Cloudy
9    1:52 AM         70 F  ...        0.0 in         Cloudy
10   2:13 AM         70 F  ...        0.0 in         Cloudy
11   2:52 AM         71 F  ...        0.0 in         Cloudy
12   3:52 AM         70 F  ...        0.0 in  Mostly Cloudy
13   4:19 AM         70 F  ...        0.0 in         Cloudy
14   4:29 AM         70 F  ...        0.0 in         Cloudy
15   4:52 AM         70 F  ...        0.0 in         Cloudy
16   5:25 AM         70 F  ...        0.0 in  Mostly Cloudy
17   5:52 AM         71 F  ...        0.0 in         Cloudy
18   6:52 AM         73 F  ...        0.0 in         Cloudy
19   7:52 AM         74 F  ...        0.0 in         Cloudy
20   8:52 AM         73 F  ...        0.0 in         Cloudy
21   9:52 AM         71 F  ...        0.0 in         Cloudy
22  10:52 AM         71 F  ...        0.0 in         Cloudy
23  11:52 AM         70 F  ...        0.0 in         Cloudy
24  12:52 PM         72 F  ...        0.0 in  Mostly Cloudy
25   1:52 PM         70 F  ...        0.0 in  Mostly Cloudy
26   2:52 PM         71 F  ...        0.0 in  Mostly Cloudy
27   3:52 PM         71 F  ...        0.0 in  Partly Cloudy
28   4:52 PM         68 F  ...        0.0 in  Mostly Cloudy
29   5:52 PM         66 F  ...        0.0 in  Mostly Cloudy

[30 rows x 11 columns]
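Because the table only exists once the page's scripts have run, it can help to check how many table elements a given piece of HTML contains before handing it to read_html. The following is a minimal sketch using only the standard library, not part of the original answer. The names count_tables and wait_for_tables are hypothetical helpers, and the two HTML snippets are made-up stand-ins for the raw response and the rendered page.

```python
import time
from html.parser import HTMLParser


class _TableCounter(HTMLParser):
    """Counts opening <table> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook
        if tag == "table":
            self.count += 1


def count_tables(html):
    """Return the number of <table> elements found in the HTML string."""
    parser = _TableCounter()
    parser.feed(html)
    return parser.count


def wait_for_tables(get_html, minimum=1, timeout=10.0, interval=0.5):
    """Poll get_html() until its result contains at least `minimum` tables.

    Returns the HTML once enough tables are present, otherwise raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()
        if count_tables(html) >= minimum:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"fewer than {minimum} <table> elements after {timeout}s")
        time.sleep(interval)


# Hypothetical snippets standing in for the raw response and the rendered page
raw_html = "<html><body><app-root></app-root></body></html>"
rendered_html = "<html><body><table><tr><td>68 F</td></tr></table></body></html>"

print(count_tables(raw_html))       # 0
print(count_tables(rendered_html))  # 1
```

With Selenium, one possible use is html = wait_for_tables(lambda: driver.page_source, minimum=2), since the answer reads tables[1]. Whether the page needs such a wait depends on how quickly it renders, so treat this as an optional safeguard.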