Merge dataframe on closest date
Question
I have some data for some experiments indexed by a subject ID and a date. I'd like to join the data together, but the subjects may undergo experiments on different days. Here is an example of what I mean. Shown below are the results from two different experiments:
SubjectID  Date        ScoreA
1          2016-09-20  10
1          2016-09-21  12
1          2016-12-01  11

SubjectID  Date        ScoreB
1          2016-09-20  1
1          2016-09-24  5
1          2016-11-28  3
1          2016-12-11  9
I would like to join each row to the row with the closest available date. So ideally, my desired output is:
SubjectID  Date1       Date2       ScoreA  ScoreB
1          2016-09-20  2016-09-20  10      1
1          2016-09-21  2016-09-24  12      5
1          2016-12-01  2016-11-28  11      3
Note that "closest date" means closest in absolute value. How can I achieve something like this?
Recommended answer
I don't know if there is a way to do what you want with default pandas functionality, but it's straightforward to do it with a custom aggregation function:
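# Assumes df1 and df2 hold the two tables above, with Date parsed as datetime64.
def pick_closest(g):
    # idxmin returns the row label of the smallest absolute gap, which is what
    # .loc expects (argmin returns a position in current pandas).
    closest_date_loc = (g.Date1 - g.Date2).abs().idxmin()
    return g.loc[closest_date_loc, ['ScoreA', 'Date2', 'ScoreB']]

# Merging on SubjectID alone pairs every Date1 with every Date2 per subject.
merged = df1.merge(df2, on='SubjectID', suffixes=['1', '2'])
df3 = merged.groupby(['SubjectID', 'Date1'], as_index=False).apply(pick_closest).reset_index()
df3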
   SubjectID  Date1       ScoreA  Date2       ScoreB
0  1          2016-09-20  10      2016-09-20  1
1  1          2016-09-21  12      2016-09-20  1
2  1          2016-12-01  11      2016-11-28  3
In this code snippet, the two frames are first merged on SubjectID, generating all possible combinations of Date1 and Date2 for each subject. Then the pick_closest function selects, for each SubjectID/Date1 group, the row with the smallest absolute difference between Date1 and Date2.
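The two steps described above can be sketched end to end as follows. This is a minimal sketch rather than the answer's exact code: it rebuilds the sample frames from the question, and it replaces the custom function with a helper column (the name `gap` is made up for illustration) plus a grouped `idxmin`, which avoids `groupby.apply`.

```python
import pandas as pd

# Sample data copied from the tables in the question
df1 = pd.DataFrame({
    "SubjectID": [1, 1, 1],
    "Date": pd.to_datetime(["2016-09-20", "2016-09-21", "2016-12-01"]),
    "ScoreA": [10, 12, 11],
})
df2 = pd.DataFrame({
    "SubjectID": [1, 1, 1, 1],
    "Date": pd.to_datetime(["2016-09-20", "2016-09-24", "2016-11-28", "2016-12-11"]),
    "ScoreB": [1, 5, 3, 9],
})

# Step 1: pair every Date1 with every Date2 of the same subject
merged = df1.merge(df2, on="SubjectID", suffixes=("1", "2"))

# Step 2: absolute gap between the two dates of each pair
# ("gap" is a hypothetical helper column, dropped again below)
merged["gap"] = (merged["Date1"] - merged["Date2"]).abs()

# Step 3: for each (SubjectID, Date1) group, find the row label with the smallest gap
closest_rows = merged.groupby(["SubjectID", "Date1"])["gap"].idxmin()
closest = merged.loc[closest_rows].drop(columns="gap").reset_index(drop=True)
print(closest)
```

Because the merge materialises every Date1/Date2 pair per subject, memory use grows with the product of the two row counts per subject, so this approach is best suited to modest amounts of data.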
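As a possible alternative to the merge-then-pick approach, reasonably recent pandas versions provide `pd.merge_asof`, which can match on the nearest key directly. The sketch below is not part of the original answer. It assumes a pandas version where `merge_asof` accepts `direction="nearest"`, and it renames the Date columns to Date1 and Date2 so both dates survive in the result.

```python
import pandas as pd

# Sample data copied from the tables in the question
df1 = pd.DataFrame({
    "SubjectID": [1, 1, 1],
    "Date": pd.to_datetime(["2016-09-20", "2016-09-21", "2016-12-01"]),
    "ScoreA": [10, 12, 11],
})
df2 = pd.DataFrame({
    "SubjectID": [1, 1, 1, 1],
    "Date": pd.to_datetime(["2016-09-20", "2016-09-24", "2016-11-28", "2016-12-11"]),
    "ScoreB": [1, 5, 3, 9],
})

# merge_asof requires both frames to be sorted by their join key
left = df1.rename(columns={"Date": "Date1"}).sort_values("Date1")
right = df2.rename(columns={"Date": "Date2"}).sort_values("Date2")

nearest = pd.merge_asof(
    left,
    right,
    left_on="Date1",
    right_on="Date2",
    by="SubjectID",        # only match rows belonging to the same subject
    direction="nearest",   # closest date in either direction
)
print(nearest)
```

Unlike the full merge, this does not build every Date1/Date2 pair, so it should scale better to larger frames. `merge_asof` also takes a `tolerance` argument (a `pd.Timedelta` for datetime keys) if matches further apart than some limit should be left unmatched.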