pandas 将多个数据框与TimeStamp索引对齐 [英] Pandas aligning multiple dataframes with TimeStamp index
问题描述
在过去的几天里,这一直是我一生的祸根.我有许多Pandas数据框,其中包含不规则频率的时间序列数据.我尝试将这些对齐到单个数据帧中.
This has been the bane of my life for the past couple of days. I have numerous Pandas Dataframes that contain time series data with irregular frequencies. I try to align these into a single dataframe.
下面是一些代码,带有代表性的数据帧df1
,df2
和df3
(我实际上有n = 5,并且希望能对所有n>2
都适用的解决方案):
Below is some code, with representative dataframes, df1
, df2
, and df3
( I actually have n=5, and would appreciate a solution that would work for all n>2
):
# df1, df2, df3 are given at the bottom
import pandas as pd
import datetime
# I can align df1 to df2 easily
df1aligned, df2aligned = df1.align(df2)
# And then concatenate into a single dataframe
combined_1_n_2 = pd.concat([df1aligned, df2aligned], axis =1 )
# Since I don't know any better, I then try to align df3 to combined_1_n_2 manually:
combined_1_n_2.align(df3)
error: Reindexing only valid with uniquely valued Index objects
我知道为什么会出现此错误,因此我摆脱了combined_1_n_2
中的重复索引,然后重试:
I have an idea why I get this error, so I get rid of the duplicate indices in combined_1_n_2
and try again:
combined_1_n_2 = combined_1_n_2.groupby(combined_1_n_2.index).first()
combined_1_n_2.align(df3) # But stll get the same error
error: Reindexing only valid with uniquely valued Index objects
为什么会出现此错误?即使这样做有效,也完全是手动和丑陋的.如何对齐> 2个时间序列并将其合并到单个数据框中?
Why am I getting this error? Even if this worked, it is completely manual and ugly. How can I align >2 time series and combine them in a single dataframe?
数据:
df1 = pd.DataFrame( {'price' : [62.1250,62.2500,62.2375,61.9250,61.9125 ]},
index = [pd.DatetimeIndex([datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f')])[0]
for s in ['2008-06-01 06:03:59.614000', '2008-06-01 06:03:59.692000',
'2008-06-01 06:15:42.004000', '2008-06-01 06:15:42.083000','2008-06-01 06:17:01.654000' ] ])
df2 = pd.DataFrame({'price': [241.0625, 241.5000, 241.3750, 241.2500, 241.3750 ]},
index = [pd.DatetimeIndex([datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f')])[0]
for s in ['2008-06-01 06:13:34.524000', '2008-06-01 06:13:34.602000',
'2008-06-01 06:15:05.399000', '2008-06-01 06:15:05.399000','2008-06-01 06:15:42.082000' ] ])
df3 = pd.DataFrame({'price': [67.656, 67.875, 67.8125, 67.75, 67.6875 ]},
index = [pd.DatetimeIndex([datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f')])[0]
for s in ['2008-06-01 06:03:52.281000', '2008-06-01 06:03:52.359000',
'2008-06-01 06:13:34.848000', '2008-06-01 06:13:34.926000','2008-06-01 06:15:05.321000' ] ])
推荐答案
您的特定错误是由于combined_1_n_2
的列名具有重复项(两个列都将命名为"price").您可以重命名列,第二个对齐将起作用.
Your specific error is due the column names of combined_1_n_2
having duplicates (both columns will be named 'price'). You could rename the columns and the second align would work.
另一种方法是链接join
运算符,该运算符将合并索引上的帧,如下所示.
One alternative way would be to chain the join
operator, which merges frames on the index, as below.
In [23]: df1.join(df2, how='outer', rsuffix='_1').join(df3, how='outer', rsuffix='_2')
Out[23]:
price price_1 price_2
2008-06-01 06:03:52.281000 NaN NaN 67.6560
2008-06-01 06:03:52.359000 NaN NaN 67.8750
2008-06-01 06:03:59.614000 62.1250 NaN NaN
2008-06-01 06:03:59.692000 62.2500 NaN NaN
2008-06-01 06:13:34.524000 NaN 241.0625 NaN
2008-06-01 06:13:34.602000 NaN 241.5000 NaN
2008-06-01 06:13:34.848000 NaN NaN 67.8125
2008-06-01 06:13:34.926000 NaN NaN 67.7500
2008-06-01 06:15:05.321000 NaN NaN 67.6875
2008-06-01 06:15:05.399000 NaN 241.3750 NaN
2008-06-01 06:15:05.399000 NaN 241.2500 NaN
2008-06-01 06:15:42.004000 62.2375 NaN NaN
2008-06-01 06:15:42.082000 NaN 241.3750 NaN
2008-06-01 06:15:42.083000 61.9250 NaN NaN
2008-06-01 06:17:01.654000 61.9125 NaN NaN
这篇关于 pandas 将多个数据框与TimeStamp索引对齐的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!