How to join two dataframes for which column values are within a certain range?


Problem description


Given two dataframes df_1 and df_2, how can I join them so that the timestamp column of df_1 falls between start and end in df_2?

print(df_1)

  timestamp              A          B
0 2016-05-14 10:54:33    0.020228   0.026572
1 2016-05-14 10:54:34    0.057780   0.175499
2 2016-05-14 10:54:35    0.098808   0.620986
3 2016-05-14 10:54:36    0.158789   1.014819
4 2016-05-14 10:54:39    0.038129   2.384590


print(df_2)

  start                end                  event    
0 2016-05-14 10:54:31  2016-05-14 10:54:33  E1
1 2016-05-14 10:54:34  2016-05-14 10:54:37  E2
2 2016-05-14 10:54:38  2016-05-14 10:54:42  E3

The goal is to get the corresponding event for each row, where df_1.timestamp is between df_2.start and df_2.end:

  timestamp              A          B          event
0 2016-05-14 10:54:33    0.020228   0.026572   E1
1 2016-05-14 10:54:34    0.057780   0.175499   E2
2 2016-05-14 10:54:35    0.098808   0.620986   E2
3 2016-05-14 10:54:36    0.158789   1.014819   E2
4 2016-05-14 10:54:39    0.038129   2.384590   E3

Solution

One simple solution is to create an IntervalIndex from start and end with closed='both', then use get_loc to look up the event for each timestamp. This assumes all the datetime columns already have a datetime (Timestamp) dtype.

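import pandas as pd
# Index df_2 by the closed intervals [start, end], then look up each timestamp's interval
df_2.index = pd.IntervalIndex.from_arrays(df_2['start'], df_2['end'], closed='both')
df_1['event'] = df_1['timestamp'].apply(lambda x: df_2.iloc[df_2.index.get_loc(x)]['event'])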

Output:

            timestamp         A         B event
0 2016-05-14 10:54:33  0.020228  0.026572    E1
1 2016-05-14 10:54:34  0.057780  0.175499    E2
2 2016-05-14 10:54:35  0.098808  0.620986    E2
3 2016-05-14 10:54:36  0.158789  1.014819    E2
4 2016-05-14 10:54:39  0.038129  2.384590    E3
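
For completeness, here is a self-contained sketch of the same interval lookup that rebuilds the sample frames from the question and uses IntervalIndex.get_indexer instead of a per-row get_loc. This is a variation, not the method from the answer above. It assumes the intervals in df_2 do not overlap, as in the sample data. The variable names intervals, positions and events are illustrative only. get_indexer returns -1 for a timestamp that falls in no interval, whereas get_loc would raise a KeyError, so unmatched rows are set to None here.

```python
import pandas as pd

# Sample data copied from the question.
df_1 = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-05-14 10:54:33", "2016-05-14 10:54:34", "2016-05-14 10:54:35",
        "2016-05-14 10:54:36", "2016-05-14 10:54:39",
    ]),
    "A": [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    "B": [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})
df_2 = pd.DataFrame({
    "start": pd.to_datetime([
        "2016-05-14 10:54:31", "2016-05-14 10:54:34", "2016-05-14 10:54:38",
    ]),
    "end": pd.to_datetime([
        "2016-05-14 10:54:33", "2016-05-14 10:54:37", "2016-05-14 10:54:42",
    ]),
    "event": ["E1", "E2", "E3"],
})

# Closed intervals [start, end]; assumed non-overlapping.
intervals = pd.IntervalIndex.from_arrays(df_2["start"], df_2["end"], closed="both")

# Position of the containing interval for every timestamp, or -1 if none.
positions = intervals.get_indexer(df_1["timestamp"])

# Map positions back to event labels; unmatched timestamps become None.
events = df_2["event"].to_numpy()
df_1["event"] = [events[p] if p != -1 else None for p in positions]

print(df_1)
```

Because the lookup is done once for the whole column rather than once per row, this form tends to be the more convenient one on larger frames, and it leaves df_2's own index untouched.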
