Merge dataframe on closest date

Question

I have data from several experiments, indexed by a subject ID and a date. I'd like to join the data together, but the subjects may undergo the experiments on different days. Here is an example of what I mean. Shown below are the results from two different experiments:

SubjectID  Date        ScoreA
1          2016-09-20      10
1          2016-09-21      12
1          2016-12-01      11

SubjectID  Date        ScoreB
1          2016-09-20      1
1          2016-09-24      5
1          2016-11-28      3
1          2016-12-11      9

I would like to join each row to the row with the closest available date. So ideally, my desired output is:

SubjectID   Date1         Date2        ScoreA ScoreB
1            2016-09-20    2016-09-20    10      1
1            2016-09-21    2016-09-24    12      5
1            2016-12-01    2016-11-28    11      3

Note that "closest date" means closest in absolute value. How can I achieve something like this?

Recommended answer

I don't know whether default pandas functionality can do what you want, but it is straightforward to do with a custom aggregation function:

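def pick_closest(g):
    # idxmin returns the index label of the smallest absolute date gap;
    # argmin returns a position in current pandas, which .loc cannot use here
    closest_date_loc = (g.Date1 - g.Date2).abs().idxmin()
    return g.loc[closest_date_loc, ['ScoreA','Date2','ScoreB']]

# Merging on SubjectID alone pairs every Date1 with every Date2 of a subject
merged = df1.merge(df2, on='SubjectID', suffixes=['1', '2'])
df3  = merged.groupby(['SubjectID','Date1'], as_index=False).apply(pick_closest).reset_index()
df3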

   SubjectID      Date1  ScoreA      Date2  ScoreB
0          1 2016-09-20      10 2016-09-20       1
1          1 2016-09-21      12 2016-09-20       1
2          1 2016-12-01      11 2016-11-28       3
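
The snippet above assumes that df1 and df2 already exist with Date as a datetime column. As a rough, self-contained sketch of the same idea (pair every Date1 with every Date2, then keep the pair with the smallest absolute gap), the version below rebuilds the example frames from the question and uses a helper column plus idxmin instead of groupby.apply. The column name Gap is only an illustrative choice, not something from the original answer.

```python
import pandas as pd

# Example frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "SubjectID": [1, 1, 1],
    "Date": pd.to_datetime(["2016-09-20", "2016-09-21", "2016-12-01"]),
    "ScoreA": [10, 12, 11],
})
df2 = pd.DataFrame({
    "SubjectID": [1, 1, 1, 1],
    "Date": pd.to_datetime(["2016-09-20", "2016-09-24", "2016-11-28", "2016-12-11"]),
    "ScoreB": [1, 5, 3, 9],
})

# Pair every Date1 with every Date2 of the same subject.
merged = df1.merge(df2, on="SubjectID", suffixes=("1", "2"))

# "Gap" is a hypothetical helper column: the absolute difference of each pair.
merged["Gap"] = (merged["Date1"] - merged["Date2"]).abs()

# idxmin gives the index label of the smallest gap per SubjectID/Date1 group,
# which is exactly what .loc expects.
closest = merged.groupby(["SubjectID", "Date1"])["Gap"].idxmin()
df3 = merged.loc[
    closest, ["SubjectID", "Date1", "Date2", "ScoreA", "ScoreB"]
].reset_index(drop=True)
print(df3)
```

Because the merge builds every Date1/Date2 combination per subject, memory use grows with the product of the two row counts for each subject, so this is probably best suited to modestly sized data.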

In the pick_closest snippet above, the two frames are first merged on SubjectID alone, which generates every possible combination of Date1 and Date2 for each subject. The pick_closest function then selects, for each SubjectID/Date1 group, the row with the smallest absolute difference between Date1 and Date2. Note that 2016-09-21 is paired with 2016-09-20 (one day away) rather than 2016-09-24 (three days away), because the match is made on absolute difference.
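
The answer leaves open whether pandas has built-in support for this. As a hedged alternative, reasonably recent pandas versions provide pd.merge_asof, whose direction="nearest" option matches each left row to the right row with the closest key. This is a different technique from the custom aggregation above, and the sketch below assumes the example data from the question; keeping a copy of the right-hand date under the name Date2 is a choice made here so that the matched date stays visible.

```python
import pandas as pd

# Example frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "SubjectID": [1, 1, 1],
    "Date": pd.to_datetime(["2016-09-20", "2016-09-21", "2016-12-01"]),
    "ScoreA": [10, 12, 11],
})
df2 = pd.DataFrame({
    "SubjectID": [1, 1, 1, 1],
    "Date": pd.to_datetime(["2016-09-20", "2016-09-24", "2016-11-28", "2016-12-11"]),
    "ScoreB": [1, 5, 3, 9],
})

# merge_asof requires both frames to be sorted by the "on" key.
left = df1.sort_values("Date")
# Only the left-hand "Date" values survive the merge, so keep a copy of the
# right-hand date in a separate column.
right = df2.sort_values("Date").assign(Date2=lambda d: d["Date"])

nearest = pd.merge_asof(
    left, right,
    on="Date",            # key compared for closeness
    by="SubjectID",       # only match rows of the same subject
    direction="nearest",  # closest in absolute value, earlier or later
)
nearest = nearest.rename(columns={"Date": "Date1"})
nearest = nearest[["SubjectID", "Date1", "Date2", "ScoreA", "ScoreB"]]
print(nearest)
```

Unlike the merge-then-filter approach, this avoids building every Date1/Date2 combination. As with that approach, one right-hand row can be matched to several left-hand rows.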
