Merging two Dataframes in Pandas based on time-range difference
Question
I have these two dataframes, df1 and df2.
df1:
dateTime userId session
2018-08-30 02:20:19 2233 1
2018-08-30 05:32:10 1933 1
2018-08-30 09:10:39 2233 2
2018-08-30 10:26:59 2233 3
2018-08-30 11:56:25 4459 1
2018-08-30 12:30:55 4459 1
df2:
clickTime userId clickId
2018-08-30 02:21:09 2233 1987
2018-08-30 02:23:19 2233 1988
2018-08-30 02:24:00 2233 1989
2018-08-30 02:32:09 2233 1990
2018-08-30 05:33:10 1933 2009
2018-08-30 05:35:19 1933 2010
2018-08-30 05:36:59 1933 2011
2018-08-30 11:57:25 4459 3012
2018-08-30 11:58:55 4459 3013
I want to merge the two dataframes on userId and also on the time columns, so that each clickTime of df2 falls within a range of, say, 10 minutes of the dateTime of df1.
The dataframe I want looks something like this:
dateTime userId session clickTime clickId
2018-08-30 02:20:19 2233 1 2018-08-30 02:21:09 1987
2018-08-30 02:20:19 2233 1 2018-08-30 02:23:19 1988
2018-08-30 02:20:19 2233 1 2018-08-30 02:24:00 1989
2018-08-30 02:20:19 2233 1 2018-08-30 02:32:09 1990
I want rows like these for every userId. Is it possible?
In other words, I want to merge df1 and df2 on userId, with the extra condition that the clickTime of df2 lies within a time range of, say, 10-15 minutes of the dateTime column of df1.
Recommended answer
IIUC: use pandas.merge_asof
pd.merge_asof(
df1, df2,
left_on='dateTime',
right_on='clickTime',
by='userId',
direction='nearest'
)
dateTime userId session clickTime clickId
0 2018-08-30 02:20:19 2233 1 2018-08-30 02:21:09 1987
1 2018-08-30 05:32:10 1933 1 2018-08-30 05:33:10 2009
2 2018-08-30 09:10:39 2233 2 2018-08-30 02:32:09 1990
3 2018-08-30 10:26:59 2233 3 2018-08-30 02:32:09 1990
4 2018-08-30 11:56:25 4459 1 2018-08-30 11:57:25 3012
5 2018-08-30 12:30:55 4459 1 2018-08-30 11:58:55 3013
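To reproduce the call above, both key columns need a datetime dtype and each frame must be sorted on its key, otherwise merge_asof raises an error. Here is a minimal sketch that rebuilds the sample data from the question, assuming df2 has no session column (the variable name `nearest` is only illustrative):

```python
import pandas as pd

# Sample data from the question; times are parsed into datetime64 values.
df1 = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2018-08-30 02:20:19", "2018-08-30 05:32:10", "2018-08-30 09:10:39",
        "2018-08-30 10:26:59", "2018-08-30 11:56:25", "2018-08-30 12:30:55",
    ]),
    "userId": [2233, 1933, 2233, 2233, 4459, 4459],
    "session": [1, 1, 2, 3, 1, 1],
})
df2 = pd.DataFrame({
    "clickTime": pd.to_datetime([
        "2018-08-30 02:21:09", "2018-08-30 02:23:19", "2018-08-30 02:24:00",
        "2018-08-30 02:32:09", "2018-08-30 05:33:10", "2018-08-30 05:35:19",
        "2018-08-30 05:36:59", "2018-08-30 11:57:25", "2018-08-30 11:58:55",
    ]),
    "userId": [2233, 2233, 2233, 2233, 1933, 1933, 1933, 4459, 4459],
    "clickId": [1987, 1988, 1989, 1990, 2009, 2010, 2011, 3012, 3013],
})

# merge_asof requires both frames to be sorted on their time keys.
nearest = pd.merge_asof(
    df1.sort_values("dateTime"),
    df2.sort_values("clickTime"),
    left_on="dateTime",
    right_on="clickTime",
    by="userId",          # only match clicks from the same user
    direction="nearest",  # closest click, earlier or later
)
print(nearest)
```

Without a tolerance, every row of df1 is matched to the nearest click of the same user, however far away it is, which is why the 09:10:39 and 10:26:59 sessions pick up a click from 02:32:09.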
You can specify a tolerance on how far away to look:
pd.merge_asof(
df1, df2,
left_on='dateTime',
right_on='clickTime',
by='userId',
direction='nearest',
tolerance=pd.Timedelta(15, unit='m')
)
dateTime userId session clickTime clickId
0 2018-08-30 02:20:19 2233 1 2018-08-30 02:21:09 1987.0
1 2018-08-30 05:32:10 1933 1 2018-08-30 05:33:10 2009.0
2 2018-08-30 09:10:39 2233 2 NaT NaN
3 2018-08-30 10:26:59 2233 3 NaT NaN
4 2018-08-30 11:56:25 4459 1 2018-08-30 11:57:25 3012.0
5 2018-08-30 12:30:55 4459 1 NaT NaN
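Note that merge_asof keeps at most one click per row of df1, whereas the desired output in the question lists several clicks for the same session start. If every click inside the window is wanted, one possible alternative is to merge on userId alone and then filter on the time difference. The sketch below is not part of the original answer; the 10-minute window, the choice to count only clicks at or after dateTime, and the names `window`, `pairs`, `delta` and `result` are all assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question (df2 assumed to have no session column).
df1 = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2018-08-30 02:20:19", "2018-08-30 05:32:10", "2018-08-30 09:10:39",
        "2018-08-30 10:26:59", "2018-08-30 11:56:25", "2018-08-30 12:30:55",
    ]),
    "userId": [2233, 1933, 2233, 2233, 4459, 4459],
    "session": [1, 1, 2, 3, 1, 1],
})
df2 = pd.DataFrame({
    "clickTime": pd.to_datetime([
        "2018-08-30 02:21:09", "2018-08-30 02:23:19", "2018-08-30 02:24:00",
        "2018-08-30 02:32:09", "2018-08-30 05:33:10", "2018-08-30 05:35:19",
        "2018-08-30 05:36:59", "2018-08-30 11:57:25", "2018-08-30 11:58:55",
    ]),
    "userId": [2233, 2233, 2233, 2233, 1933, 1933, 1933, 4459, 4459],
    "clickId": [1987, 1988, 1989, 1990, 2009, 2010, 2011, 3012, 3013],
})

window = pd.Timedelta(minutes=10)  # assumed window length

# Pair every session row with every click of the same user ...
pairs = df1.merge(df2, on="userId")

# ... then keep only clicks that happen 0 to 10 minutes after dateTime.
delta = pairs["clickTime"] - pairs["dateTime"]
in_window = (delta >= pd.Timedelta(0)) & (delta <= window)
result = pairs[in_window].reset_index(drop=True)
print(result)
```

With a 10-minute window this keeps eight rows; clickId 1990 is dropped because it comes 11 minutes 50 seconds after the 02:20:19 session start, and widening `window` to 15 minutes would include it. The intermediate `pairs` frame holds every session/click combination per user, so this approach may use a lot of memory on large inputs.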