SAS:模糊连接 [英] SAS: Fuzzy Joins

查看:18
本文介绍了SAS:模糊连接的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在SAS中运行以下SQL查询:

proc sql;
create table my_table as
select a.*, b.* 
from table_a a
inner join table_b b
on (a.date_1 between b.date_2 and b.date_3 and a.id1 = b.id1)
or a.id2 = b.id2;
quit;

我的问题:我正在尝试将";a.id1=b.id1";替换为";a.id1模糊相等a.id1";(https://www.ibm.com/docs/en/psfa/7.2.1?topic=functions-fuzzy-string-search),并使此";显式传递(https://www.lexjansen.com/mwsug/2013/RF/MWSUG-2013-RF02.pdf):

 proc sql;
connect to netezza(server = &abc database = &abc user =&abc password = &abc  bunkload = yes);
    create table my_table as
    select a.*, b.* 
    from table_a a
    inner join table_b b
    on (a.date_1 between b.date_2 and b.date_3 where le_dst(a.id1, b.id1) = 1 )
    or a.id2 = b.id2;
    quit;

但我是SAS新手,不知道如何正确操作(表格在Netezza上)。

有人能教我怎么做吗?是否有其他常见的模糊联接函数非常适合于此类问题?

谢谢!

注意1:表格如下:

> table_a

    id1 id2     date_1
1 123 A  11 2010-01-31
2 123BB  12 2010-01-31
3  12 5  14 2015-01-31
4 12--5  13 2018-01-31

> table_b

     id1 id2     date_2     date_3
1   0123 111 2009-01-31 2011-01-31
2   1233 112 2010-01-31 2010-01-31
3 125  .  14 2010-01-31 2020-01-31
4   125_ 113 2010-01-31 2020-01-31

注意2:用于为本例创建这些表的R代码(在我的原始问题中,日期出现在R中的";因数和变量类型中):

table_a = data.frame(id1 = c("123 A", "123BB", "12 5", "12--5"), id2 = c("11", "12", "14", "13"),
date_1 = c("2010-01-31","2010-01-31", "2015-01-31", "2018-01-31" ))

table_a$id1 = as.factor(table_a$id1)
table_a$id2 = as.factor(table_a$id2)
table_a$date_1 = as.factor(table_a$date_1)

table_b = data.frame(id1 = c("0123", "1233", "125  .", "125_"), id2 = c("111", "112", "14", "113"),
date_2 = c("2009-01-31","2010-01-31", "2010-01-31", "2010-01-31" ),
date_3 = c("2011-01-31","2010-01-31", "2020-01-31", "2020-01-31" ))


table_b$id1 = as.factor(table_b$id1)
table_b$id2 = as.factor(table_b$id2)
table_b$date_2 = as.factor(table_b$date_2)
table_b$date_3 = as.factor(table_b$date_3)

推荐答案

将Sql推送到远程数据库。

proc sql;
  connect to netezza .... ;
  create table sastable as 
    select * from connection to netezza
      (
select a.*, b.* 
  from table_a a
  inner join table_b b
    on (a.date_1 between b.date_2 and b.date_3)
      and (le_dst(a.id1, b.id1) = 1 or a.id2 = b.id2)
     )
  ;
quit;

这篇关于SAS:模糊连接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆