Merging DataFrames obtained via web scraping

Problem description

I have code that scrapes tables from a website and reads each one into a pandas DataFrame. Because of how the website is designed, this is done in a for loop, so every table ends up assigned to the same variable name, df.

Code

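import io
import bs4
import pandas

# driver is assumed to be a browser driver (for example Selenium) that has already loaded the page
soup = bs4.BeautifulSoup(driver.page_source, "html.parser")
for thead in soup.select(".data-point-container table thead"):
    # the table body is the <tbody> sibling that follows the header
    tbody = thead.find_next_sibling("tbody")

    table = "<table>%s</table>" % (str(thead) + str(tbody))

    # read_html returns a list of DataFrames; take the first one
    df = pandas.read_html(io.StringIO(table))[0]

    print(df)
    print('-------------')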

Result

     Table1   FY2012   FY2013   FY2014   FY2015   Last 12 Months
0    item1    value1   value2   value3   value4   value5
1    item2    value1   value2   value3   value4   value5
2    item3    value1   value2   value3   value4   value5
3    item4    value1   value2   value3   value4   value5
4    item5    value1   value2   value3   value4   value5
5    item6    value1   value2   value3   value4   value5
-------------

     Table2   FY2012   FY2013   FY2014   FY2015   Last 12 Months
0    item1    value1   value2   value3   value4   value5
1    item2    value1   value2   value3   value4   value5
2    item3    value1   value2   value3   value4   value5
3    item4    value1   value2   value3   value4   value5
-------------

     Table3   FY2012   FY2013   FY2014   FY2015   Last 12 Months
0    item1    value1   value2   value3   value4   value5
1    item2    value1   value2   value3   value4   value5
2    item3    value1   value2   value3   value4   value5
3    item4    value1   value2   value3   value4   value5
4    item5    value1   value2   value3   value4   value5
5    item6    value1   value2   value3   value4   value5
-------------

     Table4   FY2012   FY2013   FY2014   FY2015   Last 12 Months
0    item1    value1   value2   value3   value4   value5
1    item2    value1   value2   value3   value4   value5
2    item3    value1   value2   value3   value4   value5
3    item4    value1   value2   value3   value4   value5
4    item5    value1   value2   value3   value4   value5
5    item6    value1   value2   value3   value4   value5
6    item7    value1   value2   value3   value4   value5
7    item8    value1   value2   value3   value4   value5

Is there a way to concatenate or merge all of these DataFrames into a single DataFrame?

Recommended answer

If all you need to do is combine a number of DataFrames, you can collect them in a list and then concatenate them with pandas.concat (commonly written pd.concat).

Something like this should work:

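dataframes = []

for thead in soup.select(...):

    # your scraper logic here

    # read_html returns a list of DataFrames, so take the first one
    df = pandas.read_html(...)[0]
    dataframes.append(df)

# concat returns a new DataFrame, so keep the result
merged = pandas.concat(dataframes)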

Does that help?
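One detail worth checking when applying this to the tables shown in the question: the first column has a different header in each table (Table1, Table2, and so on), and pandas.concat aligns on column names, so a plain concatenation would leave those as separate, partly empty columns. The following is a minimal sketch of one way to handle that, not necessarily what the answerer had in mind. The two small DataFrames are hypothetical stand-ins for the scraped tables, and the helper normalise and the column names "table" and "item" are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for two of the tables scraped in the loop.
table1 = pd.DataFrame({
    "Table1": ["item1", "item2"],
    "FY2012": ["value1", "value1"],
    "FY2013": ["value2", "value2"],
})
table2 = pd.DataFrame({
    "Table2": ["item1", "item2", "item3"],
    "FY2012": ["value1", "value1", "value1"],
    "FY2013": ["value2", "value2", "value2"],
})


def normalise(df):
    """Rename the first column to 'item' and keep its old header in a 'table' column."""
    first = df.columns[0]
    out = df.rename(columns={first: "item"})
    out.insert(0, "table", first)  # the scalar is broadcast to every row
    return out


# Plain concat: 'Table1' and 'Table2' stay separate columns padded with NaN.
naive = pd.concat([table1, table2], ignore_index=True)

# Concat after normalising: one shared 'item' column plus the source table name.
merged = pd.concat([normalise(table1), normalise(table2)], ignore_index=True)

print(naive.columns.tolist())
print(merged)
```

Passing ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeating each table's own row numbers. In the scraping loop, the same idea would mean appending normalise(df) to the list rather than df.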
