如何复制 SAS 合并 [英] How to replicate a SAS merge

查看:19
本文介绍了如何复制 SAS 合并的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两张桌子,t1 和 t2:

I have two tables, t1 and t2:

t1
  person | visit | code1 | type1
       1       1      50      50 
       1       1      50      50 
       1       2      75      50 

t2
  person | visit | code2 | type2
       1       1      50      50 
       1       1      50      50 
       1       1      50      50 

当 SAS 运行以下代码时:

When SAS runs the following code:

   DATA t3;
     MERGE t1 t2;
     BY person visit;

   RUN;

它生成以下数据集:

       person | visit | code1 | type1 | code2 | type2
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       2      75      50

我想在 SQL 中复制这个过程,我的想法是使用全外连接.除非有重复的行,否则此方法有效.当我们在上面的例子中有重复的行时,完整的外连接会生成下表:

I want to replicate this process in SQL, and my idea was to use a full-outer-join. This works unless there are duplicate rows. When we have duplicate rows like in the above example, a full outer join produces the following table:

       person | visit | code1 | type1 | code2 | type2
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       2      75      50

我想知道如何让 SQl 表与 SAS 表匹配.

I'm wondering how I would get the SQl table to match the SAS table.

推荐答案

Gordon 的回答很接近;但它错过了一点.这是它的输出:

Gordon's answer is close; but it misses one point. Here's its output:

person  visit   code1   type1   seqnum  person  visit   code2   type2   seqnum
1       1       1       1       1       1       1       1       1       1
1       1       2       2       2       1       1       2       2       2
NULL    NULL    NULL    NULL    NULL    1       1       3       3       3
1       2       1       3       1       NULL    NULL    NULL    NULL    NULL

第三行的空值不正确,而第四行的空值是正确的.

The third row's nulls are incorrect, while the fourth's are correct.

据我所知,在 SQL 中,除了将事情拆分为几个查询之外,没有真正好的方法可以做到这一点.我认为有五种可能:

As far as I know, in SQL there's not a really good way to do this other than splitting things up into a few queries. I think there are five possibilities:

  • 匹配人/访问,匹配seqnums
  • 匹配人员/访问,Left 有更多 seqnums
  • 匹配人员/访问,Right 有更多 seqnums
  • 左边有不匹配的人/访问
  • 权利有无与伦比的人/访问

我认为最后两个可能适用于一个查询,但我认为第二个和第三个必须是单独的查询.当然,您可以将所有内容结合在一起.

I think the last two might be workable into one query, but I think the second and third have to be separate queries. You can union everything together, of course.

这里有一个例子,使用一些更适合查看正在发生的事情的临时表.请注意,现在为 code1type1 填写第三行,即使它们是额外的".我只添加了五个标准中的三个 - 您在最初示例中的三个 - 但其他两个并不太难.

So here's an example, using some temporary tables that are a little more well suited to see what's going on. Note that the third row is now filled in for code1 and type1, even though those are 'extra'. I've only added three of the five criteria - the three you had in your initial example - but the other two aren't too hard.

请注意,这是在 SAS 中更快的示例 - 因为 SAS 具有逐行概念,即它能够一次执行一行.对于大型表,SQL 在这些方面往往需要更长的时间,除非可以非常整齐地对事物进行分区并拥有非常好的索引 - 即便如此,我也从未见过 SQL DBA 在其中一些方面做得与 SAS 一样好类型的问题.当然,这是你必须接受的——SQL 有它自己的优势,其中之一可能就是价格......

Note that this is an example of something far faster in SAS - because SAS has a row-wise concept, ie, it's capable of going one row at a time. SQL tends to take a lot longer at these, with large tables, unless it's possible to partition things very neatly and have very good indexes - and even then I've never seen a SQL DBA do anywhere near as well as SAS at some of these types of problems. That's something you'll have to accept of course - SQL has its own advantages, one of which being probably price...

这是我的示例代码.我确信它不是非常优雅,希望 SQL 人员之一可以改进它.这是为在 SQL Server 中工作而编写的(使用表变量),同样的事情应该在其他变体中进行一些更改(使用临时表),假设它们实现了窗口化.(SAS 当然不能做这个特殊的事情——因为即使是 FedSQL 也实现了 ANSI 1999,而不是 ANSI 2008.)这是基于 Gordon 的初始查询,然后在最后用附加位进行修改.任何想要改进这一点的人,请随时编辑和/或复制到新的/现有的答案.

Here's my example code. I'm sure it's not terribly elegant, hopefully one of the SQL folk can improve it. This is written to work in SQL Server (using table variables), same thing should work with some changes (to use temporary tables) in other variants, assuming they implement windowing. (SAS of course can't do this particular thing - as even FedSQL implements ANSI 1999, not ANSI 2008.) This is based on Gordon's initial query, then modified with the additional bits at the end. Anyone who wants to improve this please feel free to edit and/or copy to a new/existing answer any bit you wish.

declare @t1 table (person INT, visit INT, code1 INT, type1 INT);
declare @t2 table (person INT, visit INT, code2 INT, type2 INT);


insert into @t1 values (1,1,1,1)
insert into @t1 values (1,1,2,2)
insert into @t1 values (1,2,1,3)

insert into @t2 values (1,1,1,1)
insert into @t2 values (1,1,2,2)
insert into @t2 values (1,1,3,3)

select coalesce(t1.person, t2.person) as person, coalesce(t1.visit, t2.visit) as visit,
                t1.code1, t1.type1, t2.code2, t2.type2
from (select *,
             row_number() over (partition by person, visit order by type1) as seqnum
      from @t1
     ) t1 inner join
     (select *,
             row_number() over (partition by person, visit order by type2) as seqnum
      from @t2
     ) t2
     on t1.person = t2.person and t1.visit = t2.visit and
        t1.seqnum = t2.seqnum
 union all

select coalesce(t1.person, t2.person) as person, coalesce(t1.visit, t2.visit) as visit,
                t1.code1, t1.type1, t2.code2, t2.type2
from (
      (select person, visit, MAX(seqnum) as max_rownum from (
        select person, visit, 
             row_number() over (partition by person, visit order by type1) as seqnum
      from @t1) t1_f 
      group by person, visit
     ) t1_m inner join
     (select *, row_number() over (partition by person, visit order by type1) as seqnum
       from @t1
      ) t1 
        on t1.person=t1_m.person and t1.visit=t1_m.visit
        and t1.seqnum=t1_m.max_rownum
        inner join
     (select *,
             row_number() over (partition by person, visit order by type2) as seqnum
      from @t2
     ) t2
     on t1.person = t2.person and t1.visit = t2.visit and
        t1.seqnum < t2.seqnum 
     )
 union all
 select t1.person, t1.visit, t1.code1, t1.type1, t2.code2, t2.type2
     from @t1 t1 left join @t2 t2
    on t2.person=t1.person and t2.visit=t1.visit
    where t2.code2 is null

这篇关于如何复制 SAS 合并的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆