如何复制SAS合并 [英] How to replicate a SAS merge

查看:130
本文介绍了如何复制SAS合并的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个表,t1和t2:

I have two tables, t1 and t2:

t1
  person | visit | code1 | type1
       1       1      50      50 
       1       1      50      50 
       1       2      75      50 

t2
  person | visit | code2 | type2
       1       1      50      50 
       1       1      50      50 
       1       1      50      50 

SAS运行以下代码时:

When SAS runs the following code:

   DATA t3;
     MERGE t1 t2;
     BY person visit;

   RUN;

它生成以下数据集:

       person | visit | code1 | type1 | code2 | type2
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       2      75      50

我想用SQL复制此过程,我的想法是使用完全外部联接.除非有重复的行,否则此方法有效.当我们在上面的示例中有重复的行时,完整的外部联接将产生下表:

I want to replicate this process in SQL, and my idea was to use a full-outer-join. This works unless there are duplicate rows. When we have duplicate rows like in the above example, a full outer join produces the following table:

       person | visit | code1 | type1 | code2 | type2
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       1      50      50      50      50
            1       2      75      50

我想知道如何使SQl表与SAS表匹配.

I'm wondering how I would get the SQl table to match the SAS table.

推荐答案

戈登的答案很接近;但它遗漏了一点.这是它的输出:

Gordon's answer is close; but it misses one point. Here's its output:

person  visit   code1   type1   seqnum  person  visit   code2   type2   seqnum
1       1       1       1       1       1       1       1       1       1
1       1       2       2       2       1       1       2       2       2
NULL    NULL    NULL    NULL    NULL    1       1       3       3       3
1       2       1       3       1       NULL    NULL    NULL    NULL    NULL

第三行的空值是不正确的,而第四行的空值是正确的.

The third row's nulls are incorrect, while the fourth's are correct.

据我所知,在SQL中,除了将事情分解成几个查询外,没有什么真正的好方法.我认为有五种可能性:

As far as I know, in SQL there's not a really good way to do this other than splitting things up into a few queries. I think there are five possibilities:

  • 匹配者/访问者,匹配序列号
  • 匹配的人/来访者,Left有更多的后记
  • 匹配人/次访问,Right有更多的后缀
  • 左有不匹配的人/访客
  • 权利没有可比的人/访客

我认为后两个可以在一个查询中使用,但是我认为第二个和第三个必须是单独的查询.您当然可以将所有内容结合在一起.

I think the last two might be workable into one query, but I think the second and third have to be separate queries. You can union everything together, of course.

因此,这是一个示例,使用一些更适合于查看正在发生的情况的临时表.请注意,即使code1type1是多余"的,现在也要为其填充第三行.我只添加了五个条件中的三个-您在最初的示例中有三个-但其他两个条件并不难.

So here's an example, using some temporary tables that are a little more well suited to see what's going on. Note that the third row is now filled in for code1 and type1, even though those are 'extra'. I've only added three of the five criteria - the three you had in your initial example - but the other two aren't too hard.

请注意,这是SAS速度的一个示例-因为SAS具有逐行概念,即,一次可以行一次.对于大型表,SQL往往会花费更长的时间,除非可以非常整洁地对事物进行分区并且具有非常好的索引-甚至到那时,我还从未见过SQL DBA可以像SAS一样好问题类型.这当然是您必须接受的-SQL具有自己的优势,其中之一可能就是价格...

Note that this is an example of something far faster in SAS - because SAS has a row-wise concept, ie, it's capable of going one row at a time. SQL tends to take a lot longer at these, with large tables, unless it's possible to partition things very neatly and have very good indexes - and even then I've never seen a SQL DBA do anywhere near as well as SAS at some of these types of problems. That's something you'll have to accept of course - SQL has its own advantages, one of which being probably price...

这是我的示例代码.我确信它不是非常优雅,希望SQL专家之一可以对其进行改进.这是为了在SQL Server(使用表变量)中工作而编写的,假设其他东西实现了窗口化,则同样的东西应该在其他变体中进行一些更改(以使用临时表). (SAS当然不能做这件事-甚至FedSQL都实现ANSI 1999,而不是ANSI2008.)这是基于Gordon的初始查询,然后在末尾用附加位进行修改的.任何想要改善此问题的人都可以随时编辑和/或复制到新的/现有的答案中.

Here's my example code. I'm sure it's not terribly elegant, hopefully one of the SQL folk can improve it. This is written to work in SQL Server (using table variables), same thing should work with some changes (to use temporary tables) in other variants, assuming they implement windowing. (SAS of course can't do this particular thing - as even FedSQL implements ANSI 1999, not ANSI 2008.) This is based on Gordon's initial query, then modified with the additional bits at the end. Anyone who wants to improve this please feel free to edit and/or copy to a new/existing answer any bit you wish.

declare @t1 table (person INT, visit INT, code1 INT, type1 INT);
declare @t2 table (person INT, visit INT, code2 INT, type2 INT);


insert into @t1 values (1,1,1,1)
insert into @t1 values (1,1,2,2)
insert into @t1 values (1,2,1,3)

insert into @t2 values (1,1,1,1)
insert into @t2 values (1,1,2,2)
insert into @t2 values (1,1,3,3)

select coalesce(t1.person, t2.person) as person, coalesce(t1.visit, t2.visit) as visit,
                t1.code1, t1.type1, t2.code2, t2.type2
from (select *,
             row_number() over (partition by person, visit order by type1) as seqnum
      from @t1
     ) t1 inner join
     (select *,
             row_number() over (partition by person, visit order by type2) as seqnum
      from @t2
     ) t2
     on t1.person = t2.person and t1.visit = t2.visit and
        t1.seqnum = t2.seqnum
 union all

select coalesce(t1.person, t2.person) as person, coalesce(t1.visit, t2.visit) as visit,
                t1.code1, t1.type1, t2.code2, t2.type2
from (
      (select person, visit, MAX(seqnum) as max_rownum from (
        select person, visit, 
             row_number() over (partition by person, visit order by type1) as seqnum
      from @t1) t1_f 
      group by person, visit
     ) t1_m inner join
     (select *, row_number() over (partition by person, visit order by type1) as seqnum
       from @t1
      ) t1 
        on t1.person=t1_m.person and t1.visit=t1_m.visit
        and t1.seqnum=t1_m.max_rownum
        inner join
     (select *,
             row_number() over (partition by person, visit order by type2) as seqnum
      from @t2
     ) t2
     on t1.person = t2.person and t1.visit = t2.visit and
        t1.seqnum < t2.seqnum 
     )
 union all
 select t1.person, t1.visit, t1.code1, t1.type1, t2.code2, t2.type2
     from @t1 t1 left join @t2 t2
    on t2.person=t1.person and t2.visit=t1.visit
    where t2.code2 is null

这篇关于如何复制SAS合并的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆