Merge two dataframes on closest matching datetime index

Problem description

I have two data frames whose DateTime indexes are close to each other and sometimes match exactly. The goal is to merge them using the first frame's index as the reference, appending each row of the second frame to the closest matching timestamp (within 1 minute) in the first.

My code and output:

import pandas as pd

masterdf = pd.DataFrame({"AA":[77.368607,77.491655,77.425134,76.490991]})
masterdf.index = ['2019-10-01 07:52:07','2019-10-01 07:53:01','2019-10-01 07:53:54','2019-10-01 07:54:47']
masterdf.index.name = 'datetime'

slavedf = pd.DataFrame({"BB":[50,60,70,80]})
slavedf.index = ['2019-10-01 07:53:00','2019-10-01 07:53:54','2019-10-01 10:54:47','2019-10-01 10:00:00']
slavedf.index.name = 'datetime'

maindf = masterdf.merge(slavedf,left_index=True,right_index=True)

Current output:

masterdf = 
                            AA
datetime                      
2019-10-01 07:52:07  77.368607
2019-10-01 07:53:01  77.491655
2019-10-01 07:53:54  77.425134
2019-10-01 07:54:47  76.490991

slavedf = 
                     BB
datetime               
2019-10-01 07:53:00  50
2019-10-01 07:53:54  60
2019-10-01 10:54:47  70
2019-10-01 10:00:00  80

maindf = 
datetime                   AA         BB
2019-10-01 07:53:54    77.425134      60
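Why only one row comes back: merge with left_index=True and right_index=True is an inner join (the default how="inner") on exactly equal index labels, and here the labels are plain strings. Only a timestamp present verbatim in both frames survives. The short check below, built from the question's own data, makes that visible; the variable name common is just for illustration.

```python
import pandas as pd

masterdf = pd.DataFrame(
    {"AA": [77.368607, 77.491655, 77.425134, 76.490991]},
    index=pd.Index(["2019-10-01 07:52:07", "2019-10-01 07:53:01",
                    "2019-10-01 07:53:54", "2019-10-01 07:54:47"], name="datetime"),
)
slavedf = pd.DataFrame(
    {"BB": [50, 60, 70, 80]},
    index=pd.Index(["2019-10-01 07:53:00", "2019-10-01 07:53:54",
                    "2019-10-01 10:54:47", "2019-10-01 10:00:00"], name="datetime"),
)

# Labels that appear, character for character, in both indexes
common = masterdf.index.intersection(slavedf.index)
print(list(common))

# The index-on-index merge keeps exactly those labels and nothing "close"
maindf = masterdf.merge(slavedf, left_index=True, right_index=True)
print(maindf)
```

An exact join has no notion of "within 1 minute", so a nearest-match join is needed instead.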

Expected output:

maindf = 
datetime                   AA          BB
2019-10-01 07:53:01    77.491655       50
2019-10-01 07:53:54    77.425134       60

How can I achieve this?

Recommended answer

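The logic here uses merge_asof, with one adjustment. merge_asof returns one row per row of the first dataframe, so the same row of the second dataframe can be matched several times. To fix that, we carry two extra columns from the second frame: its own timestamp (datetime2), used to measure how far each match is, and its row number (key), used to drop the duplicates so that each row of the second frame keeps only its closest match.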
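To see the duplication described above, here is merge_asof alone on the question's data, a sketch with the index already turned into a sorted datetime column (merge_asof requires both frames to be sorted on the on key). With direction='nearest' and a 60-second tolerance, 07:52:07 and 07:53:01 both pick 07:53:00, and 07:53:54 and 07:54:47 both pick 07:53:54, so each of those two slave values appears twice.

```python
import pandas as pd

# The question's data, parsed to datetimes and sorted:
# merge_asof requires both frames to be sorted on the "on" column.
master = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-10-01 07:52:07", "2019-10-01 07:53:01",
                                "2019-10-01 07:53:54", "2019-10-01 07:54:47"]),
    "AA": [77.368607, 77.491655, 77.425134, 76.490991],
})
slave = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-10-01 07:53:00", "2019-10-01 07:53:54",
                                "2019-10-01 10:00:00", "2019-10-01 10:54:47"]),
    "BB": [50, 60, 80, 70],
})

# One result row per master row; the nearest slave row within 60 s is attached,
# whether it lies before or after the master timestamp.
raw = pd.merge_asof(master, slave, on="datetime",
                    tolerance=pd.Timedelta("60s"), direction="nearest")
print(raw)

# Each matched slave value shows up once per master row that picked it.
print(raw["BB"].value_counts())
```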

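# merge_asof needs real datetimes, sorted, in a regular column
masterdf.index=pd.to_datetime(masterdf.index)
masterdf=masterdf.sort_index().reset_index()
slavedf.index=pd.to_datetime(slavedf.index)
slavedf=slavedf.sort_index().reset_index()
# keep the slave's own timestamp (to measure the match distance)
# and its row number (to spot a slave row matched more than once)
slavedf['datetime2']=slavedf['datetime']
slavedf['key']=slavedf.index
# nearest slave row within 60 seconds, in either direction
newdf=pd.merge_asof(masterdf,slavedf,on='datetime',tolerance=pd.Timedelta('60s'),direction='nearest')
newdf['diff']=(newdf.datetime-newdf.datetime2).abs()
# for each slave row keep only the closest master row
newdf=newdf.sort_values('diff').drop_duplicates('key')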
newdf
Out[35]: 
             datetime         AA  BB           datetime2     diff
2 2019-10-01 07:53:54  77.425134  60 2019-10-01 07:53:54 00:00:00
1 2019-10-01 07:53:01  77.491655  50 2019-10-01 07:53:00 00:00:01
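The same steps can be wrapped into a reusable helper. This is a minimal sketch, not part of the original answer: the name merge_nearest_unique and the underscore helper columns are hypothetical. It adds one step the answer does not need for this data: rows of the first frame with no partner inside the tolerance get a NaN key, and since drop_duplicates treats NaN keys as equal, one such row could otherwise survive as a spurious match, so those rows are dropped first. The result is returned in time order with the helper columns removed.

```python
import pandas as pd


def merge_nearest_unique(left, right, tolerance="60s"):
    """Hypothetical helper: attach each right row to at most one left row,
    the closest left timestamp within `tolerance`.

    Both frames are assumed to have an index convertible to datetimes.
    """
    l = left.copy()
    r = right.copy()
    l.index = pd.to_datetime(l.index)
    r.index = pd.to_datetime(r.index)
    # merge_asof needs a sorted datetime column to join on
    l = l.sort_index().rename_axis("datetime").reset_index()
    r = r.sort_index().rename_axis("datetime").reset_index()
    # Remember which right row was used and its own timestamp
    r["_key"] = r.index
    r["_rtime"] = r["datetime"]

    out = pd.merge_asof(l, r, on="datetime",
                        tolerance=pd.Timedelta(tolerance), direction="nearest")
    # Left rows without a partner inside the tolerance have a NaN key
    out = out.dropna(subset=["_key"])
    out["_diff"] = (out["datetime"] - out["_rtime"]).abs()
    # Keep only the closest left row for each right row
    out = out.sort_values("_diff").drop_duplicates("_key")
    return (out.drop(columns=["_key", "_rtime", "_diff"])
               .sort_values("datetime")
               .set_index("datetime"))


masterdf = pd.DataFrame(
    {"AA": [77.368607, 77.491655, 77.425134, 76.490991]},
    index=pd.Index(["2019-10-01 07:52:07", "2019-10-01 07:53:01",
                    "2019-10-01 07:53:54", "2019-10-01 07:54:47"], name="datetime"),
)
slavedf = pd.DataFrame(
    {"BB": [50, 60, 70, 80]},
    index=pd.Index(["2019-10-01 07:53:00", "2019-10-01 07:53:54",
                    "2019-10-01 10:54:47", "2019-10-01 10:00:00"], name="datetime"),
)
print(merge_nearest_unique(masterdf, slavedf))
```

Because merge_asof already yields at most one slave row per master row, deduplicating on the slave key makes the pairing one-to-one.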
