pandas 将多个数据框与TimeStamp索引对齐 [英] Pandas aligning multiple dataframes with TimeStamp index

查看:80
本文介绍了 pandas 将多个数据框与TimeStamp索引对齐的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在过去的几天里,这一直是我一生的祸根.我有许多Pandas数据框,其中包含不规则频率的时间序列数据.我尝试将这些对齐到单个数据帧中.

This has been the bane of my life for the past couple of days. I have numerous Pandas Dataframes that contain time series data with irregular frequencies. I try to align these into a single dataframe.

下面是一些代码,带有代表性的数据帧df1df2df3(我实际上有n = 5,并且希望能对所有n>2都适用的解决方案):

Below is some code, with representative dataframes, df1, df2, and df3 ( I actually have n=5, and would appreciate a solution that would work for all n>2):

# df1, df2, df3 are given at the bottom
import pandas as pd
import datetime

# I can align df1 to df2 easily
df1aligned, df2aligned = df1.align(df2)
# And then concatenate into a single dataframe
combined_1_n_2 = pd.concat([df1aligned, df2aligned], axis =1 )
# Since I don't know any better, I then try to align df3 to combined_1_n_2  manually:
combined_1_n_2.align(df3)
error: Reindexing only valid with uniquely valued Index objects

我知道为什么会出现此错误,因此我摆脱了combined_1_n_2中的重复索引,然后重试:

I have an idea why I get this error, so I get rid of the duplicate indices in combined_1_n_2 and try again:

combined_1_n_2 = combined_1_n_2.groupby(combined_1_n_2.index).first()
combined_1_n_2.align(df3) # But stll get the same error
error: Reindexing only valid with uniquely valued Index objects

为什么会出现此错误?即使这样做有效,也完全是手动和丑陋的.如何对齐> 2个时间序列并将其合并到单个数据框中?

Why am I getting this error? Even if this worked, it is completely manual and ugly. How can I align >2 time series and combine them in a single dataframe?

数据:

df1 = pd.DataFrame( {'price' : [62.1250,62.2500,62.2375,61.9250,61.9125 ]}, 
                     index = [pd.DatetimeIndex([datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f')])[0] 
                     for s in ['2008-06-01 06:03:59.614000', '2008-06-01 06:03:59.692000', 
                     '2008-06-01 06:15:42.004000', '2008-06-01 06:15:42.083000','2008-06-01 06:17:01.654000' ] ])   

df2 = pd.DataFrame({'price': [241.0625, 241.5000, 241.3750, 241.2500, 241.3750 ]},
                    index = [pd.DatetimeIndex([datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f')])[0] 
                     for s in ['2008-06-01 06:13:34.524000', '2008-06-01 06:13:34.602000', 
                     '2008-06-01 06:15:05.399000', '2008-06-01 06:15:05.399000','2008-06-01 06:15:42.082000' ] ])   

df3 = pd.DataFrame({'price': [67.656, 67.875, 67.8125, 67.75, 67.6875 ]},
                    index = [pd.DatetimeIndex([datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f')])[0] 
                     for s in ['2008-06-01 06:03:52.281000', '2008-06-01 06:03:52.359000', 
                     '2008-06-01 06:13:34.848000', '2008-06-01 06:13:34.926000','2008-06-01 06:15:05.321000' ] ])   

推荐答案

您的特定错误是由于combined_1_n_2的列名具有重复项(两个列都将命名为"price").您可以重命名列,第二个对齐将起作用.

Your specific error is due the column names of combined_1_n_2 having duplicates (both columns will be named 'price'). You could rename the columns and the second align would work.

另一种方法是链接join运算符,该运算符将合并索引上的帧,如下所示.

One alternative way would be to chain the join operator, which merges frames on the index, as below.

In [23]: df1.join(df2, how='outer', rsuffix='_1').join(df3, how='outer', rsuffix='_2')
Out[23]: 
                              price   price_1  price_2
2008-06-01 06:03:52.281000      NaN       NaN  67.6560
2008-06-01 06:03:52.359000      NaN       NaN  67.8750
2008-06-01 06:03:59.614000  62.1250       NaN      NaN
2008-06-01 06:03:59.692000  62.2500       NaN      NaN
2008-06-01 06:13:34.524000      NaN  241.0625      NaN
2008-06-01 06:13:34.602000      NaN  241.5000      NaN
2008-06-01 06:13:34.848000      NaN       NaN  67.8125
2008-06-01 06:13:34.926000      NaN       NaN  67.7500
2008-06-01 06:15:05.321000      NaN       NaN  67.6875
2008-06-01 06:15:05.399000      NaN  241.3750      NaN
2008-06-01 06:15:05.399000      NaN  241.2500      NaN
2008-06-01 06:15:42.004000  62.2375       NaN      NaN
2008-06-01 06:15:42.082000      NaN  241.3750      NaN
2008-06-01 06:15:42.083000  61.9250       NaN      NaN
2008-06-01 06:17:01.654000  61.9125       NaN      NaN

这篇关于 pandas 将多个数据框与TimeStamp索引对齐的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆