Delete groups in which no observation contains a given value in SAS


Question

I want to delete every group in which no observation has NUM=14.

For example, given this original data:

ID  NUM 
1  14
1  12
1  10
2  13
2  11
2  10
3  14
3  10
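
To reproduce the example, the table above could be loaded with a data step along these lines. This is only a sketch: it assumes the data set is called originaldat (the name used in the answer's code) and that both columns are numeric.

```sas
/* Build the sample data from the question (data set name assumed) */
data originaldat;
    input ID NUM;
    datalines;
1 14
1 12
1 10
2 13
2 11
2 10
3 14
3 10
;
run;
```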

Since none of the observations with ID=2 has NUM=14, group 2 should be deleted. The result should look like this:

ID  NUM 
1  14
1  12
1  10
3  14
3  10

This is what I have so far, but it doesn't seem to work.

data originaldat;
set newdat;
by ID;
If first.ID then do;
        IF NUM EQ 14 then Score = 100;
        Else Score = 10;
    end;
else SCORE+1;
run; 

data newdat;
set newdat;
   If score LT 50 then delete;
run;

Recommended answer

One way to do this with proc sql:

proc sql;
    create table newdat as
    select * 
    from originaldat
    where ID in (
        select ID 
        from originaldat
        where NUM = 14
    );
quit;

The subquery selects the IDs of the groups that contain an observation where NUM = 14. The where clause then restricts the selected data to those groups.

The equivalent data step approach would be:
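
As a variation on the same idea, the filter can probably also be written with group by and having instead of a subquery. The sketch below is not from the original answer; it relies on SAS evaluating the comparison NUM = 14 as 1 or 0, and on proc sql remerging the group-level maximum back onto the detail rows (SAS writes a note about remerging to the log). The order of rows within each group is not guaranteed to match the input.

```sas
/* Alternative sketch: keep groups where at least one row has NUM = 14 */
proc sql;
    create table newdat as
    select *
    from originaldat
    group by ID
    /* (NUM = 14) is 1 when true and 0 when false, so the group
       maximum is 1 only if some row in the group matches */
    having max(NUM = 14) = 1;
quit;
```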

/* Get all the groups that contain an observation where NUM = 14 */
data keepGroups;
    set originaldat;
    if NUM = 14;
    keep ID;
run;
/* Sort both data sets to ensure the data step merge works as expected */
proc sort data = originaldat;
    by ID;
run;
/* Make sure there are no duplicate values in the groups to be kept */
proc sort data = keepGroups nodupkey;
    by ID;
run;
/* 
    Merge the original data with the groups to keep and only keep records
    where an observation exists in the groups to keep dataset
*/
data newdat;
    merge 
        originaldat 
        keepGroups (in = k);
    by ID;
    if k;
run;

In both data steps, the subsetting if statement outputs an observation only when its condition is met. In the second case, k is a temporary variable with value 1 (true) when an observation is read from keepGroups and 0 (false) otherwise.
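
If a single data step is preferred, the same result can likely be obtained by reading each group twice (a pattern often called a double DOW loop). This is a sketch rather than part of the original answer: the flag name has14 is made up for illustration, and originaldat is assumed to be sorted by ID already.

```sas
/* Assumes originaldat is sorted (or at least grouped) by ID */
data newdat;
    /* First pass: scan the whole group and flag it if any NUM = 14 */
    do until (last.ID);
        set originaldat;
        by ID;
        if NUM = 14 then has14 = 1;
    end;
    /* Second pass: re-read the same group and output it only if flagged */
    do until (last.ID);
        set originaldat;
        by ID;
        if has14 then output;
    end;
    /* has14 is not retained, so it is reset to missing for the next group */
    drop has14;
run;
```

Unlike the merge approach, this needs no intermediate keepGroups data set, at the cost of reading the input twice.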
