How to join two dataframes for which column values are within a certain range?
Problem description
Given two dataframes df_1 and df_2, how can they be joined so that the datetime column timestamp of df_1 falls between start and end in df_2:
print(df_1)
             timestamp         A         B
0  2016-05-14 10:54:33  0.020228  0.026572
1  2016-05-14 10:54:34  0.057780  0.175499
2  2016-05-14 10:54:35  0.098808  0.620986
3  2016-05-14 10:54:36  0.158789  1.014819
4  2016-05-14 10:54:39  0.038129  2.384590
print(df_2)
                 start                  end  event
0  2016-05-14 10:54:31  2016-05-14 10:54:33     E1
1  2016-05-14 10:54:34  2016-05-14 10:54:37     E2
2  2016-05-14 10:54:38  2016-05-14 10:54:42     E3
The goal is to get the corresponding event for each row, where df_1.timestamp is between df_2.start and df_2.end:
             timestamp         A         B  event
0  2016-05-14 10:54:33  0.020228  0.026572     E1
1  2016-05-14 10:54:34  0.057780  0.175499     E2
2  2016-05-14 10:54:35  0.098808  0.620986     E2
3  2016-05-14 10:54:36  0.158789  1.014819     E2
4  2016-05-14 10:54:39  0.038129  2.384590     E3
Solution
One simple solution is to create an IntervalIndex from start and end with closed='both', then use get_loc to look up the event for each timestamp (this assumes all the datetime columns already have a datetime dtype):
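import pandas as pd
# Index df_2 by the closed intervals [start, end] so each timestamp can be looked up directly.
df_2.index = pd.IntervalIndex.from_arrays(df_2['start'], df_2['end'], closed='both')
df_1['event'] = df_1['timestamp'].apply(lambda x: df_2.iloc[df_2.index.get_loc(x)]['event'])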
Output:
             timestamp         A         B  event
0  2016-05-14 10:54:33  0.020228  0.026572     E1
1  2016-05-14 10:54:34  0.057780  0.175499     E2
2  2016-05-14 10:54:35  0.098808  0.620986     E2
3  2016-05-14 10:54:36  0.158789  1.014819     E2
4  2016-05-14 10:54:39  0.038129  2.384590     E3
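For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample frames from the question and performs the same interval lookup. It is a variation on the answer rather than the answer's own code: it swaps the per-row get_loc call for IntervalIndex.get_indexer, which looks up all timestamps at once and returns -1 where a timestamp falls in no interval. That swap assumes the intervals in df_2 do not overlap (true for the sample data). The names intervals, pos and event_values are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; pd.to_datetime ensures a datetime dtype.
df_1 = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-05-14 10:54:33",
        "2016-05-14 10:54:34",
        "2016-05-14 10:54:35",
        "2016-05-14 10:54:36",
        "2016-05-14 10:54:39",
    ]),
    "A": [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    "B": [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})
df_2 = pd.DataFrame({
    "start": pd.to_datetime([
        "2016-05-14 10:54:31",
        "2016-05-14 10:54:34",
        "2016-05-14 10:54:38",
    ]),
    "end": pd.to_datetime([
        "2016-05-14 10:54:33",
        "2016-05-14 10:54:37",
        "2016-05-14 10:54:42",
    ]),
    "event": ["E1", "E2", "E3"],
})

# Closed intervals [start, end], one per row of df_2 (assumed non-overlapping).
intervals = pd.IntervalIndex.from_arrays(df_2["start"], df_2["end"], closed="both")

# Position of the containing interval for every timestamp; -1 means "no match".
pos = intervals.get_indexer(df_1["timestamp"])

# Map positions back to event labels, leaving None for unmatched timestamps.
event_values = df_2["event"].to_numpy()
df_1["event"] = [event_values[i] if i >= 0 else None for i in pos]

print(df_1)
```

Unlike the get_loc version, a timestamp outside every interval does not raise a KeyError here; it simply ends up with None in the event column. If the intervals could overlap, get_indexer would not be suitable and a different approach would be needed.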