Merging two Dataframes in Pandas based on time-range difference


Question

I have these two dataframes, df1 and df2.

df1:

dateTime                 userId  session

2018-08-30 02:20:19      2233      1
2018-08-30 05:32:10      1933      1
2018-08-30 09:10:39      2233      2
2018-08-30 10:26:59      2233      3
2018-08-30 11:56:25      4459      1
2018-08-30 12:30:55      4459      1

df2:

clickTime                 userId  session  clickId

2018-08-30 02:21:09      2233               1987
2018-08-30 02:23:19      2233               1988
2018-08-30 02:24:00      2233               1989
2018-08-30 02:32:09      2233               1990
2018-08-30 05:33:10      1933               2009
2018-08-30 05:35:19      1933               2010
2018-08-30 05:36:59      1933               2011
2018-08-30 11:57:25      4459               3012
2018-08-30 11:58:55      4459               3013

I want to merge the two dataframes on userId and also on the time columns, keeping the rows of df2 whose clickTime falls within a range of, say, 10 minutes of dateTime in df1.

So the dataframe I want is something like this:

  dateTime               userId  session   clickTime             clickId

2018-08-30 02:20:19      2233      1       2018-08-30 02:21:09    1987
2018-08-30 02:20:19      2233      1       2018-08-30 02:23:19    1988
2018-08-30 02:20:19      2233      1       2018-08-30 02:24:00    1989
2018-08-30 02:20:19      2233      1       2018-08-30 02:32:09    1990

I want this for each user: for every userId, the resulting dataframe should look like the one above. Is that possible?

In other words, I want to merge df1 and df2 on userId, with the extra condition that clickTime in df2 should lie within a time range of, say, 10-15 minutes of the dateTime column of df1.

Recommended answer

IIUC: use pandas.merge_asof

pd.merge_asof(
    df1, df2,
    left_on='dateTime',
    right_on='clickTime',
    by='userId',
    direction='nearest'
)

             dateTime  userId  session           clickTime  clickId
0 2018-08-30 02:20:19    2233        1 2018-08-30 02:21:09     1987
1 2018-08-30 05:32:10    1933        1 2018-08-30 05:33:10     2009
2 2018-08-30 09:10:39    2233        2 2018-08-30 02:32:09     1990
3 2018-08-30 10:26:59    2233        3 2018-08-30 02:32:09     1990
4 2018-08-30 11:56:25    4459        1 2018-08-30 11:57:25     3012
5 2018-08-30 12:30:55    4459        1 2018-08-30 11:58:55     3013
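
To reproduce the output above, here is a minimal self-contained sketch. It assumes both time columns are parsed as datetimes and that the empty session column of df2 is simply left out (an assumption, so that the merged frame has a single session column). The variable name nearest is only illustrative. Note that merge_asof expects both frames to be sorted by their time key.

```python
import pandas as pd

# Sample data copied from the question. df2's empty `session` column is
# omitted here (an assumption) so the result has only one `session` column.
df1 = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2018-08-30 02:20:19", "2018-08-30 05:32:10", "2018-08-30 09:10:39",
        "2018-08-30 10:26:59", "2018-08-30 11:56:25", "2018-08-30 12:30:55",
    ]),
    "userId": [2233, 1933, 2233, 2233, 4459, 4459],
    "session": [1, 1, 2, 3, 1, 1],
})
df2 = pd.DataFrame({
    "clickTime": pd.to_datetime([
        "2018-08-30 02:21:09", "2018-08-30 02:23:19", "2018-08-30 02:24:00",
        "2018-08-30 02:32:09", "2018-08-30 05:33:10", "2018-08-30 05:35:19",
        "2018-08-30 05:36:59", "2018-08-30 11:57:25", "2018-08-30 11:58:55",
    ]),
    "userId": [2233, 2233, 2233, 2233, 1933, 1933, 1933, 4459, 4459],
    "clickId": [1987, 1988, 1989, 1990, 2009, 2010, 2011, 3012, 3013],
})

# merge_asof requires both frames to be sorted by their time key.
df1 = df1.sort_values("dateTime")
df2 = df2.sort_values("clickTime")

# For every row of df1, pick the single closest click of the same user.
nearest = pd.merge_asof(
    df1, df2,
    left_on="dateTime",
    right_on="clickTime",
    by="userId",
    direction="nearest",
)
print(nearest)
```

Without a tolerance, rows 2, 3 and 5 are still paired with a click even though the nearest one is hours (or about half an hour) away.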


You can specify a tolerance to limit how far away to look:

pd.merge_asof(
    df1, df2,
    left_on='dateTime',
    right_on='clickTime',
    by='userId',
    direction='nearest',
    tolerance=pd.Timedelta(15, unit='m')
)

             dateTime  userId  session           clickTime  clickId
0 2018-08-30 02:20:19    2233        1 2018-08-30 02:21:09   1987.0
1 2018-08-30 05:32:10    1933        1 2018-08-30 05:33:10   2009.0
2 2018-08-30 09:10:39    2233        2                 NaT      NaN
3 2018-08-30 10:26:59    2233        3                 NaT      NaN
4 2018-08-30 11:56:25    4459        1 2018-08-30 11:57:25   3012.0
5 2018-08-30 12:30:55    4459        1                 NaT      NaN
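
With df1 on the left, merge_asof keeps at most one click per df1 row, while the desired output in the question lists every click in the window. One possible way to get that shape, sketched here under assumptions, is to swap the frames: for each click, look backward for the most recent dateTime of the same user, no more than 15 minutes earlier. This assumes a click belongs to the latest session start before it. The names window, clicks and result are illustrative, and df2's empty session column is again omitted.

```python
import pandas as pd

# Sample data copied from the question (df2 without its empty `session`).
df1 = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2018-08-30 02:20:19", "2018-08-30 05:32:10", "2018-08-30 09:10:39",
        "2018-08-30 10:26:59", "2018-08-30 11:56:25", "2018-08-30 12:30:55",
    ]),
    "userId": [2233, 1933, 2233, 2233, 4459, 4459],
    "session": [1, 1, 2, 3, 1, 1],
})
df2 = pd.DataFrame({
    "clickTime": pd.to_datetime([
        "2018-08-30 02:21:09", "2018-08-30 02:23:19", "2018-08-30 02:24:00",
        "2018-08-30 02:32:09", "2018-08-30 05:33:10", "2018-08-30 05:35:19",
        "2018-08-30 05:36:59", "2018-08-30 11:57:25", "2018-08-30 11:58:55",
    ]),
    "userId": [2233, 2233, 2233, 2233, 1933, 1933, 1933, 4459, 4459],
    "clickId": [1987, 1988, 1989, 1990, 2009, 2010, 2011, 3012, 3013],
})

# Assumed window size; the question mentions 10-15 minutes.
window = pd.Timedelta(minutes=15)

# Clicks on the left: each click looks backward for the latest dateTime
# of the same user that is at most `window` before the click.
clicks = pd.merge_asof(
    df2.sort_values("clickTime"),
    df1.sort_values("dateTime"),
    left_on="clickTime",
    right_on="dateTime",
    by="userId",
    direction="backward",
    tolerance=window,
)

# Drop clicks with no matching row in df1 and order the columns
# like the desired output in the question.
result = (
    clicks.dropna(subset=["dateTime"])
    .sort_values(["dateTime", "clickTime"])
    [["dateTime", "userId", "session", "clickTime", "clickId"]]
    .reset_index(drop=True)
)
print(result)
```

With window set to 10 minutes instead, click 1990 (about 12 minutes after the 02:20:19 row) would be dropped, so the window size has to be chosen to match what "within range" should mean for the data.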
