根据公共数据字段比较不同行之间的开始/结束日期 [英] Comparing start/end dates between different rows based on common data field

查看:14
本文介绍了根据公共数据字段比较不同行之间的开始/结束日期的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

SAS 9.4

示例数据集

╔═════════╦═══════╦════════════╦══════════╗  
║ subject ║ event ║ start_date ║ end_date ║  
╠═════════╬═══════╬════════════╬══════════╣  
║ s1      ║ e1    ║ 2-Mar-19   ║ 1-Jun-19 ║  
║ s2      ║ e1    ║ 1-Mar-19   ║ 1-May-19 ║  
╚═════════╩═══════╩════════════╩══════════╝ 

预期输出:

╔═════════╦═══════╦════════════╦══════════╦════════════╗  
║ subject ║ event ║ start_date ║ end_date ║ query_text ║  
╠═════════╬═══════╬════════════╬══════════╬════════════╣   
║ s1      ║ e1    ║ 1-Mar-19   ║ 1-Jun-19 ║ error?     ║  
║ s2      ║ e1    ║ 1-Mar-19   ║ 1-May-19 ║ error?     ║  
╚═════════╩═══════╩════════════╩══════════╩════════════╝  

我想返回所有具有相同事件"并且具有较早开始日期"的一个的结束日期"大于 (>) 具有较晚开始日期的另一个的开始日期".

I want to return all that have the same 'event' and where the 'end_date' of one with the earlier 'start_date' is greater than (>) the 'start_date' of the other one with the later start_date.

这里,主题s2"的start_date"比s1"早,但s2"的end_date"比s1"的start_date"晚,所以这两个会被返回

Here, subject 's2' has an earlier 'start_date' than 's1' but 's2' has a later 'end_date' than "s1's" 'start_date', so these two would get returned

这是我目前所拥有的,但我不确定如何确保我正在比较的内容具有相同的事件":

This is what I have so far, but I'm not sure how to make sure what I'm comparing have the same 'event':

data Out_2;   
/* set relevant dataset */   
if input(compress(end_date,"/"), ??date9.) > input(compress(start_date,"/"), ??date9.);  

/* do something */  

run;

谢谢

推荐答案

在DATA步中分组处理行的方式是按key排序,然后使用BY key_field (event 在您的示例中)在 DATA 步骤中,保留组中前一行的数据,同时使用 if first.key_field<标识组的第一行和最后一行/code> 和 if last.key_field).

The way to process rows in groups in a DATA step is to sort by the key, then use BY key_field (event in your example) in the DATA step, retaining the data from the previous rows in a group, while identifying the group's first and last row using if first.key_field and if last.key_field).

如果每个事件最多有两个主题,那将很容易.如果您有更多,则必须编写重复的代码来保留第一行、第二行等行中的值,或者将数据保留到数组中.

That would be easy if you have a maximum of two subjects per event. If you have more you'd have to write duplicate code to retain the values from the first, second, etc rows, or retain the data into an array.

所以我会放弃 DATA 步骤,转而使用 SQL,这样您就可以轻松地做到这一点:

So I'd ditch the DATA step in favor of SQL, which lets you do this easily:

proc sql;
    create table result as
    select i1.*
      from example_input i1,
           example_input i2
     where i1.event = i2.event
       and i1.subject ^= i2.subject
       and (
             (i1.start_date < i2.start_date and
              i1.end_date > i2.start_date
             ) 
           or /* the other way around */
             (i2.start_date < i1.start_date and
              i2.end_date > i1.start_date
             )
           )
   ; /* untested - I don't have SAS handy */
quit;

我在输入数据上使用此 SQL 得到的输出(不是在 SAS 中,但我很确定它不应该给出不同的结果!):

The output I get with this SQL on your input data (not in SAS, but I'm pretty sure it shouldn't give different results!):

| subject | event | start_date | end_date   |
| ------- | ----- | ---------- | ---------- |
| s1      | e1    | 2019-03-02 | 2019-06-01 |
| s2      | e1    | 2019-03-01 | 2019-05-01 |

查看 DB Fiddle

这篇关于根据公共数据字段比较不同行之间的开始/结束日期的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆