Convert Interval Outer Join SQL in Python Pandas Dataframe


Problem description

I'm converting an Oracle SQL outer interval join to a Pandas DataFrame operation. Below is the Oracle SQL:

WITH df_interval AS
          (SELECT '1' id,
                     'AAA' interval,
                     1000 begin,
                     2000 end
              FROM DUAL
            UNION ALL
            SELECT '1' id,
                     'BBB' intrvl,
                     2100 begin,
                     3000 end
              FROM DUAL
            UNION ALL
            SELECT '2' id,
                     'CCC' intrvl,
                     3100 begin,
                     4000 end
              FROM DUAL
            UNION ALL
            SELECT '2' id,
                     'DDD' intrvl,
                     4100 begin,
                     5000 end
              FROM DUAL),
      df_point AS
          (SELECT '1' id, 'X1' point, 1100 mid FROM DUAL
            UNION ALL
            SELECT '1' id, 'X2' point, 2050 mid FROM DUAL
            UNION ALL
            SELECT '1' id, 'X3' point, 3200 mid FROM DUAL
            UNION ALL
            SELECT '2' id, 'X4' point, 4200 mid FROM DUAL
            UNION ALL
            SELECT '2' id, 'X5' point, 5500 mid FROM DUAL)
SELECT pt.id,
         point,
         mid,
         interval
  FROM df_interval it RIGHT OUTER JOIN df_point pt ON pt.id = it.id AND pt.mid BETWEEN it.begin AND it.end

I created the dataframes below, but I can't work out how to reproduce the interval-based RIGHT OUTER JOIN from the Oracle SQL above:

import pandas as pd
df_interval = pd.DataFrame({
                   'ID':['1','1','2','2'],
                   'interval': ['AAA', 'BBB', 'CCC', 'DDD'],
                   'begin': [1000,2100,3100,4100],
                   'end': [2000, 3000,4000,5000]})

df_point = pd.DataFrame({
                   'ID':['1','1','1','2','2'],
                   'point': ['X1', 'X2', 'X3', 'X4','X5'],
                   'mid': [1100,2050,3200,4200,5500]})

I expect the output to look something like this:

df_out = pd.DataFrame({
                   'ID':['1','1','1','2','2'],
                   'mid': [1100,2050,3200,4200,5500],
                   'intrvl':['AAA','','','DDD','']})

I would appreciate any help with this.

Recommended answer

I think merge_asof fits your case well. The only difference is that it needs to be run twice: once forward against end and once backward against begin. When both merges return the same interval for a point, that interval is the match; otherwise the point falls in no interval.

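# merge_asof needs both frames sorted by their merge keys ('mid', 'end', 'begin'), as they are here.
# Nearest interval (same ID) whose end is >= mid
s1 = pd.merge_asof(df_point, df_interval, by='ID', left_on='mid', right_on='end', direction='forward')
# Nearest interval (same ID) whose begin is <= mid
s2 = pd.merge_asof(df_point, df_interval, by='ID', left_on='mid', right_on='begin', direction='backward')
# Keep the interval only when both lookups agree on it
s1['interval'] = s1['interval'].where(s1['interval'] == s2['interval'])
s1 = s1.drop(columns=['end', 'begin'])
s1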
  ID point   mid interval
0  1    X1  1100      AAA
1  1    X2  2050      NaN
2  1    X3  3200      NaN
3  2    X4  4200      DDD
4  2    X5  5500      NaN
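
As a cross-check, the SQL can also be translated more literally. This is a different technique from the merge_asof answer above, not part of it: join on ID, filter the pairs with an inclusive range test (like BETWEEN), then left-join back so unmatched points keep a NaN. The sketch below is a minimal one and assumes that (ID, point) uniquely identifies each row of df_point; the names pairs, matched and result are made up for illustration.

```python
import pandas as pd

df_interval = pd.DataFrame({
    'ID': ['1', '1', '2', '2'],
    'interval': ['AAA', 'BBB', 'CCC', 'DDD'],
    'begin': [1000, 2100, 3100, 4100],
    'end': [2000, 3000, 4000, 5000]})

df_point = pd.DataFrame({
    'ID': ['1', '1', '1', '2', '2'],
    'point': ['X1', 'X2', 'X3', 'X4', 'X5'],
    'mid': [1100, 2050, 3200, 4200, 5500]})

# Equi-join on ID: each point is paired with every interval that shares its ID.
pairs = df_point.merge(df_interval, on='ID', how='inner')

# Keep only pairs where begin <= mid <= end (inclusive on both sides, like BETWEEN).
matched = pairs[pairs['mid'].between(pairs['begin'], pairs['end'])]

# Left-join back onto df_point so points without a matching interval survive with NaN.
# Assumption: (ID, point) is unique in df_point.
result = df_point.merge(
    matched[['ID', 'point', 'interval']], on=['ID', 'point'], how='left')

print(result)
```

The intermediate frame holds one row per point and interval sharing an ID, so this may get expensive when an ID has many intervals; merge_asof avoids that blow-up but relies on sorted keys and on intervals within an ID not overlapping.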
