Keeping or deleting a group of observations based on a characteristic of a BY-group

Question

I answered a SAS question a few minutes ago and realized there is a generalization that might be more useful than that one (here). I didn't find this question already on Stack Overflow.

The general question is: How can you process and keep an entire BY-group based on some characteristic of the BY-group that you might not know until you have looked at all the observations in the group?

Using input data similar to that from the earlier question:

* For some reason, we are tasked with keeping only observations that
* are in groups of ID and ID_2 that contain at least one obs with
* a VALUE of 0.;
* In the following data, the following ID and ID_2 groups should be
* kept:
* A 2 (2 obs)
* B 1 (3 obs)
* B 3 (2 obs)
* B 4 (1 obs)
* The resulting dataset will have 8 observations.;
data x;
    input id $ id_2 value;
datalines;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 1 0
B 1 1
B 1 3
B 2 1
B 3 0
B 3 0
B 4 0
C 2 4
;
run;

Recommended answer

Double DoW loop solution:

data have;
    input id $ id_2 value;
datalines;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 1 0
B 1 1
B 1 3
B 2 1
B 3 0
B 3 0
B 4 0
C 2 4
;
run;

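data want;
    /* Pass 1: read one BY group and count its obs where VALUE = 0. */
    /* FLAG is not retained, so it starts as missing for each group. */
    do _n_ = 1 by 1 until(last.id_2);
        set have;
        by id id_2;
        flag = sum(flag, value = 0);
    end;
    /* Pass 2: re-read the same _N_ obs and output them if flagged. */
    do _n_ = 1 to _n_;
        set have;
        if flag then output;
    end;
    drop flag;
run;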
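
As a cross-check on the small sample, the same selection can be written as a PROC SQL query. This is only a sketch for comparison, not part of the original answer; the output table name want_sql and the table aliases h and z are made up for illustration. It should keep the same 8 observations listed in the question, although SQL does not promise the original row order within a group.

```sas
proc sql;
    /* Keep every row whose (id, id_2) group has at least one VALUE = 0. */
    create table want_sql as
    select h.*
    from have as h
    where exists (
        select 1
        from have as z
        where z.id = h.id
          and z.id_2 = h.id_2
          and z.value = 0
    )
    order by h.id, h.id_2;
quit;
```

Unlike the double DoW loop, this form does not need the input to be sorted by id and id_2, at the cost of a correlated subquery.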

I've tested this against the POINT= approach using about 55 million rows and found no appreciable difference in performance. Dataset used:

data have;
do ID = 1 to 10000000;
    do id_2 = 1 to ceil(ranuni(1)*10);
        do value = floor(ranuni(2) * 5);
            output;
        end;
    end;
end;
run;
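
The code for the POINT= approach used in that comparison is not shown here. The following is only an assumed sketch of what such an approach could look like: it remembers the observation number where each BY group starts and, if the group qualifies, re-reads that range with the POINT= option of the SET statement. The names want_point, first_obs and p are hypothetical, and the sketch relies on _N_ matching the observation number, which holds only because the first SET reads exactly one observation per iteration with no WHERE filtering.

```sas
data want_point;
    set have;
    by id id_2;
    retain first_obs flag;
    /* Start of a group: remember where it begins and clear the flag. */
    if first.id_2 then do;
        first_obs = _n_;
        flag = 0;
    end;
    if value = 0 then flag = 1;
    /* End of a qualifying group: re-read its rows by observation number. */
    if last.id_2 and flag then do p = first_obs to _n_;
        set have point = p;
        output;
    end;
    drop first_obs flag;
run;
```

The explicit OUTPUT statement suppresses the implicit output at the end of each iteration, so only rows from qualifying groups are written.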
