Merging multiple pandas datasets with non-unique index
Question
I have several similarly structured pandas dataframes stored in a dictionary. I access a dataframe in the following way.
ex_dict[df1]
date df1price1 df1price2
10-20-2015 100 150
10-21-2015 90 100
I want to merge all of these dataframes into one dataframe by date. The dates are overlapping, but not all dataframes include all dates.
I need to go from this:
df1
date df1price1 df1price2
10-20-2015 100 150
10-21-2015 90 100
10-22-2015 100 140
df2
date df2price1 df2price2
10-20-2015 110 140
10-21-2015 90 110
10-23-2015 110 120
df3
date df3price1 df3price2
10-20-2015 100 150
10-22-2015 90 100
10-23-2015 80 130
to this:
df_all
date df1price1 df1price2 ... df3price1 df3price2
10-20-2015 100 150 ... 100 150
10-21-2015 90 100 ... NaN NaN
10-22-2015 100 140 ... 90 100
10-23-2015 NaN NaN ... 80 130
I've tried lots of things, but I can't get it to work, short of repeatedly merging two at a time to create a new dataframe and then re-merging onto that. The number of dataframes I need to merge varies between 4 and 10, so I need a way to do this automatically (hence why I thought passing a dict might work).
Any help on this would be incredibly appreciated.
Recommended answer
You can use a concat followed by a groupby('date') to flatten the result.
In [22]: pd.concat([df1,df2,df3]).groupby('date').max()
Out[22]:
df1price1 df1price2 df2price1 df2price2 df3price1 df3price2
date
10-20-2015 100 150 110 140 100 150
10-21-2015 90 100 90 110 NaN NaN
10-22-2015 100 140 NaN NaN 90 100
10-23-2015 NaN NaN 110 120 80 130
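Since the frames in the question live in a dictionary, the same concat + groupby idea can be applied to however many frames the dict holds. Below is a minimal sketch, assuming the dict is called ex_dict (as in the question) and that every frame has a 'date' column; the sample values are copied from the question, and the variable names stacked and df_all are only illustrative.

```python
import pandas as pd

# Sample frames copied from the question; the dict name ex_dict follows the question.
ex_dict = {
    "df1": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                         "df1price1": [100, 90, 100],
                         "df1price2": [150, 100, 140]}),
    "df2": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-23-2015"],
                         "df2price1": [110, 90, 110],
                         "df2price2": [140, 110, 120]}),
    "df3": pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                         "df3price1": [100, 90, 80],
                         "df3price2": [150, 100, 130]}),
}

# Stack every frame row-wise; columns a frame does not have are filled with NaN.
stacked = pd.concat(list(ex_dict.values()), ignore_index=True)

# Collapse the rows that share a date. max() skips NaN, so each column keeps
# the single real value it has for that date (if any).
df_all = stacked.groupby("date").max()

print(df_all)
```

Note that max() is only one way to collapse the rows. If a single frame could contain the same date more than once, max() would silently keep the larger value; groupby('date').first() would keep the first non-null value instead.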
As BrenBarn points out in the comments, you can use concat(axis=1) if you set the join column as the index of your dataframes:
df1.index = df1.date
df2.index = df2.date
df3.index = df3.date
In [44]: pd.concat([df1,df2,df3],axis=1)
Out[44]:
date df1price1 df1price2 date df2price1 \
10-20-2015 10-20-2015 100 150 10-20-2015 110
10-21-2015 10-21-2015 90 100 10-21-2015 90
10-22-2015 10-22-2015 100 140 NaN NaN
10-23-2015 NaN NaN NaN 10-23-2015 110
df2price2 date df3price1 df3price2
10-20-2015 140 10-20-2015 100 150
10-21-2015 110 NaN NaN NaN
10-22-2015 NaN 10-22-2015 90 100
10-23-2015 120 10-23-2015 80 130
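The repeated date columns in the output above appear because assigning df.index = df.date keeps date as a regular column too. A possible variation, sketched here under the assumption that each date occurs at most once within any single frame, is to move the column into the index with set_index('date') and loop over the dict. The two small frames and the names indexed and df_all are illustrative only.

```python
import pandas as pd

# Two small frames with values from the question; any number of frames works the same way.
ex_dict = {
    "df1": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                         "df1price1": [100, 90, 100],
                         "df1price2": [150, 100, 140]}),
    "df3": pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                         "df3price1": [100, 90, 80],
                         "df3price2": [150, 100, 130]}),
}

# set_index moves 'date' out of the columns, so it is not repeated per frame.
indexed = [df.set_index("date") for df in ex_dict.values()]

# Column-wise concat aligns rows on the index (outer join by default);
# sort_index() makes the row order explicit.
df_all = pd.concat(indexed, axis=1).sort_index()

print(df_all)
```

If a frame does contain duplicate dates, column-wise concat cannot align the rows unambiguously, and the concat + groupby approach is the safer choice.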