根据间隔加入/合并数据框 [英] Join/Merge Dataframe based on interval

查看:95
本文介绍了根据间隔加入/合并数据框的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个表:一个是计划的漏洞补丁表,另一个是实现漏洞补丁的表.如下所示:

I have two tables: one is a table of planned leak patch, and the other is a table of realization of leak patched. Like follows:

第二个表具有更多列(此处省略).这里的情况是我可以使用SQL表达式按如下方式联接这两个表:

The second table has more columns (omitted here). The case here is that i can join these two tables using SQL expression as follows:

SELECT
    PLAN.PIPE_ID,
    PLAN.INTERVAL_START,
    PLAN.INTERVAL_END,
    REAL.REAL_START,
    REAL.REAL_END,
    ...
FROM PLAN, REAL
WHERE
    PLAN.PIPE_ID=REAL.PIPE_ID
    REAL_START<INTERVAL_END AND
    REAL_END>INTERVAL_START AND
    ((REAL_START>=INTERVAL_START AND REAL_END<=INTERVAL_END) OR --CASE 1
    (REAL_START>=INTERVAL_START AND REAL_START>INTERVAL_END) OR --CASE 2
    (REAL_START<INTERVAL_START AND REAL_START<=INTERVAL_END) OR --CASE 3
    (REAL_START<INTERVAL_START AND REAL_START>INTERVAL_END)) --CASE 4

这种情况的情况是为了满足下面的描述,并确保仅在两个表之间的间隔相交时才显示数据.

whence the cases of the condition are to suffice the description below and to make sure that the data displayed are only if the interval between two tables have intersection.

如果我要使用Pandas进行相同的加入.怎么做?我是熊猫的新手,我了解到熊猫加入使用密钥.

If i were to make the same join using Pandas. How to do it? I am new to pandas, and i learned that Pandas join uses key(s).

非常感谢

推荐答案

在六种情况下,范围可以相交或不相交.您已经记录了四种情况,其他两种情况是REAL完全在PLAN之前的一种情况.可以通过说明REAL.REAL_END< PLAN.INTERVAL_START,另一个条件是在PLAN之后完成REAL的情况.可以通过指定REAL.REAL_START> PLAN.INTERVAL_END的时间来测试这种情况.

There are six cases which range can intersect or not. You have documented four cases, other two cases are one where REAL is completely before PLAN. This condition can be tested by stating where REAL.REAL_END < PLAN.INTERVAL_START, and the other condition is where REAL is complete after PLAN. This condition can be tested by stating when REAL.REAL_START > PLAN.INTERVAL_END.

因此,您可以简单地通过SQL查询来检查这两个条件(如果两个条件都不成立),则您具有相交的范围.在熊猫中,您可以使用与SQL中非常相似的以下内容,进行交叉联接并过滤结果:

So, you can simply your SQL query to check this two conditions if neither is true then you have intersecting ranges. In pandas, you can use the following, very similiar to what you are are doing in SQL, do a cross join and filter the results:

import pandas as pd
import numpy as np

plan = pd.DataFrame({'PIPE_ID':['P_01','P_01'],'INTERVAL_START':[150,175],'INTERVAL_END':[151.5,177.5]})

real  = pd.DataFrame({'PIPE_ID':['P_01','P_01'],'REAL_START':[148,174.5],'REAL_END':[150.5,178]})

df_merged = plan.merge(real, on='PIPE_ID')

df_out = df_merged[~((df_merged['REAL_END'] < df_merged['INTERVAL_START']) | (df_merged['REAL_START']>df_merged['INTERVAL_END']))] 

print(df_out)

输出:

   INTERVAL_END  INTERVAL_START PIPE_ID  REAL_END  REAL_START
0         151.5             150    P_01     150.5       148.0
3         177.5             175    P_01     178.0       174.5

这篇关于根据间隔加入/合并数据框的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆