Merging DataFrames that were obtained via web scraping
Problem description
I have code that scrapes tables from a website and reads each one into a pandas DataFrame. However, because of how the website is designed, this has to be done in a for loop, so every table ends up assigned to the same variable name, i.e. df.
Code
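import io

import bs4
import pandas

# driver is assumed to be a Selenium WebDriver that has already loaded the page
soup = bs4.BeautifulSoup(driver.page_source, "html.parser")
for thead in soup.select(".data-point-container table thead"):
    tbody = thead.find_next_sibling("tbody")
    # Rebuild a standalone <table> from the header and its matching body
    table = "<table>%s</table>" % (str(thead) + str(tbody))
    # Wrap in StringIO: recent pandas versions deprecate passing literal HTML to read_html
    df = pandas.read_html(io.StringIO(table))[0]
    print(df)
    print('-------------')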
Output
Table1 FY2012 FY2013 FY2014 FY2015 Last 12 Months
0 item1 value1 value2 value3 value4 value5
1 item2 value1 value2 value3 value4 value5
2 item3 value1 value2 value3 value4 value5
3 item4 value1 value2 value3 value4 value5
4 item5 value1 value2 value3 value4 value5
5 item6 value1 value2 value3 value4 value5
-------------
Table2 FY2012 FY2013 FY2014 FY2015 Last 12 Months
0 item1 value1 value2 value3 value4 value5
1 item2 value1 value2 value3 value4 value5
2 item3 value1 value2 value3 value4 value5
3 item4 value1 value2 value3 value4 value5
-------------
Table3 FY2012 FY2013 FY2014 FY2015 Last 12 Months
0 item1 value1 value2 value3 value4 value5
1 item2 value1 value2 value3 value4 value5
2 item3 value1 value2 value3 value4 value5
3 item4 value1 value2 value3 value4 value5
4 item5 value1 value2 value3 value4 value5
5 item6 value1 value2 value3 value4 value5
-------------
Table4 FY2012 FY2013 FY2014 FY2015 Last 12 Months
0 item1 value1 value2 value3 value4 value5
1 item2 value1 value2 value3 value4 value5
2 item3 value1 value2 value3 value4 value5
3 item4 value1 value2 value3 value4 value5
4 item5 value1 value2 value3 value4 value5
5 item6 value1 value2 value3 value4 value5
6 item7 value1 value2 value3 value4 value5
7 item8 value1 value2 value3 value4 value5
Is there a way to concatenate or merge all of these DataFrames into a single DataFrame?
Recommended answer
If all you need to do is combine a number of DataFrames, you can simply collect them in a list and then concatenate them with pd.concat.
Something like this should work:
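import pandas as pd

dataframes = []
for thead in soup.select(...):
    # your scraper logic here (build `table` as in the question)
    df = pd.read_html(...)[0]  # read_html returns a list of DataFrames
    dataframes.append(df)

# Stack all tables; ignore_index renumbers the rows 0..n-1
combined = pd.concat(dataframes, ignore_index=True)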
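One caveat: in the question's output the first column header differs per table (Table1, Table2, ...), so a plain pd.concat treats those as separate columns and fills the gaps with NaN. A minimal sketch, assuming you want one shared label column plus a column recording which table each row came from. The function combine_tables, the column names item and table, and the two small frames t1 and t2 are hypothetical stand-ins, not part of the original code:

```python
import pandas as pd


def combine_tables(frames, label_col="item"):
    """Stack tables whose first column header differs (Table1, Table2, ...).

    Each frame's first column is renamed to a shared name (label_col), and a
    "table" column records the header that column originally had.
    """
    tagged = []
    for df in frames:
        original = df.columns[0]
        tagged.append(
            df.rename(columns={original: label_col}).assign(table=original)
        )
    combined = pd.concat(tagged, ignore_index=True)
    # Put the source-table column first for readability
    return combined[["table"] + [c for c in combined.columns if c != "table"]]


# Hypothetical stand-ins for two scraped tables, shaped like the question's output
t1 = pd.DataFrame({"Table1": ["item1", "item2"], "FY2015": [1, 2]})
t2 = pd.DataFrame({"Table2": ["item1"], "FY2015": [3]})

combined = combine_tables([t1, t2])
print(combined)
```

With the real scraper, pass the dataframes list from the loop above in place of t1 and t2. Keeping the original header in its own column means no information is lost when the per-table names are unified.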
Does that help?