Merging multiple pandas datasets with non-unique index

Question

I have several similarly structured pandas dataframes stored in a dictionary. I access a dataframe in the following way.

ex_dict['df1']
date        df1price1   df1price2
10-20-2015     100         150
10-21-2015      90         100

I want to merge all of these dataframes into one dataframe by date. The dates are overlapping, but not all dataframes include all dates.

I need to go from this:

df1
date        df1price1   df1price2
10-20-2015     100         150
10-21-2015      90         100
10-22-2015     100         140

df2
date        df2price1   df2price2
10-20-2015     110         140
10-21-2015      90         110
10-23-2015     110         120

df3
date        df3price1   df3price2
10-20-2015     100         150
10-22-2015      90         100
10-23-2015      80         130

to this:

df_all
date        df1price1   df1price2 ... df3price1   df3price2
10-20-2015     100         150    ...    100         150
10-21-2015      90         100    ...    NaN         NaN
10-22-2015     100         140    ...     90         100
10-23-2015     NaN         NaN    ...     80         130

I've tried lots of things, but I can't get it to work, short of repeatedly merging two at a time to create a new dataframe and then merging again onto that. The number of dataframes I need to merge varies between 4 and 10, so I need a way to do this automatically (which is why I thought passing a dict might work).

Any help on this would be incredibly appreciated.

Recommended answer

You can use a concat followed by a groupby('date') to flatten the result.

In [22]: pd.concat([df1,df2,df3]).groupby('date').max()
Out[22]:
            df1price1  df1price2  df2price1  df2price2  df3price1  df3price2
date
10-20-2015        100        150        110        140        100        150
10-21-2015         90        100         90        110        NaN        NaN
10-22-2015        100        140        NaN        NaN         90        100
10-23-2015        NaN        NaN        110        120         80        130
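
Since the question keeps the frames in a dictionary, the same idea can be applied to all of them at once. This is a minimal sketch, assuming the dictionary is called ex_dict as in the question and that every frame has a date column. The keys and sample data are just the three frames from the question, typed in for illustration.

```python
import pandas as pd

# Illustrative stand-in for the question's dictionary of dataframes.
ex_dict = {
    "df1": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                         "df1price1": [100, 90, 100],
                         "df1price2": [150, 100, 140]}),
    "df2": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-23-2015"],
                         "df2price1": [110, 90, 110],
                         "df2price2": [140, 110, 120]}),
    "df3": pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                         "df3price1": [100, 90, 80],
                         "df3price2": [150, 100, 130]}),
}

# Stack every frame vertically; columns a frame lacks are filled with NaN.
stacked = pd.concat(list(ex_dict.values()), ignore_index=True)

# Collapse the rows that share a date. max() skips NaN, so each cell
# keeps the single value that exists for that date and column.
df_all = stacked.groupby("date").max()
print(df_all)
```

This works for any number of frames in the dictionary. Note that max() is only there to collapse the rows: if one frame listed the same date twice, it would silently keep the larger value.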

As BrenBarn points out in the comments, you can use concat(axis=1) if you set the join column as the index of your dataframes:

df1.index = df1.date
df2.index = df2.date
df3.index = df3.date

In [44]: pd.concat([df1,df2,df3],axis=1)
Out[44]:
                  date  df1price1  df1price2        date  df2price1  \
10-20-2015  10-20-2015        100        150  10-20-2015        110
10-21-2015  10-21-2015         90        100  10-21-2015         90
10-22-2015  10-22-2015        100        140         NaN        NaN
10-23-2015         NaN        NaN        NaN  10-23-2015        110

            df2price2        date  df3price1  df3price2
10-20-2015        140  10-20-2015        100        150
10-21-2015        110         NaN        NaN        NaN
10-22-2015        NaN  10-22-2015         90        100
10-23-2015        120  10-23-2015         80        130
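
The repeated date columns in the output above appear because assigning df.index = df.date leaves the date column in place. One way to avoid that is set_index, which moves the column into the index. A minimal sketch, again assuming a dictionary named ex_dict holding the question's frames (the sample data is illustrative) and assuming each date appears at most once within a frame; as far as I know, concat(axis=1) cannot align frames whose index has repeated labels.

```python
import pandas as pd

# Illustrative stand-in for the question's dictionary of dataframes.
ex_dict = {
    "df1": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                         "df1price1": [100, 90, 100],
                         "df1price2": [150, 100, 140]}),
    "df2": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-23-2015"],
                         "df2price1": [110, 90, 110],
                         "df2price2": [140, 110, 120]}),
    "df3": pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                         "df3price1": [100, 90, 80],
                         "df3price2": [150, 100, 130]}),
}

# set_index moves "date" out of the columns, so it is not repeated per frame.
indexed = [df.set_index("date") for df in ex_dict.values()]

# axis=1 places the frames side by side, aligned on the index (outer join
# by default); sort_index() makes the row order deterministic.
df_all = pd.concat(indexed, axis=1).sort_index()
print(df_all)
```

The result has one row per date and only the six price columns, with NaN wherever a frame has no entry for that date.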
