Keeping or deleting a group of observations based on a characteristic of a BY-group
Problem description
I answered a SAS question a few minutes ago and realized there is a generalization that might be more useful than the original question. I didn't see this question already on Stack Overflow.
The general question is: How can you process and keep an entire BY-group based on some characteristic of the BY-group that you might not know until you have looked at all the observations in the group?
Using input data similar to that from the earlier question:
* For some reason, we are tasked with keeping only observations that
* are in groups of ID and ID_2 that contain at least one obs with
* a VALUE of 0.;
* In the following data, the following ID and ID_2 groups should be
* kept:
* A 2 (2 obs)
* B 1 (3 obs)
* B 3 (2 obs)
* B 4 (1 obs)
* The resulting dataset will have 8 observations.;
data x;
input id $ id_2 value;
datalines;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 1 0
B 1 1
B 1 3
B 2 1
B 3 0
B 3 0
B 4 0
C 2 4
;
run;
Recommended answer
Double DoW loop solution:
data have;
input id $ id_2 value;
datalines;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 1 0
B 1 1
B 1 3
B 2 1
B 3 0
B 3 0
B 4 0
C 2 4
;
run;
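data want;
  * First pass: read one ID and ID_2 group and count its obs with VALUE = 0.;
  * FLAG is not retained, so it starts out missing for every group.;
  do _n_ = 1 by 1 until(last.id_2);
    set have;
    by id id_2;
    flag = sum(flag, value=0);
  end;
  * Second pass: re-read the same _N_ obs and output them if the group was flagged.;
  do _n_ = 1 to _n_;
    set have;
    if flag then output;
  end;
  drop flag;
run;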
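For small inputs, one way to cross-check the result is a PROC SQL query that relies on SAS remerging summary statistics back onto the detail rows. This is a different technique from the DoW loop, offered only as a sketch: the output name want_sql is made up for illustration, and the query assumes the have dataset created above. Row order in the result is not guaranteed to match the input.

```sas
proc sql;
  /* want_sql is a hypothetical name used only for this cross-check */
  create table want_sql as
  select *
  from have
  group by id, id_2
  /* value = 0 evaluates to 1 or 0, so the sum counts the zero-valued obs per group */
  having sum(value = 0) > 0;
quit;
```

The log should note that the query requires remerging summary statistics with the original data, which is expected here. On the sample data, both want and want_sql should contain the same 8 observations.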
I've tested this against the POINT= approach using ~55 million rows and found no appreciable difference in performance. Dataset used:
data have;
  do ID = 1 to 10000000;
    do id_2 = 1 to ceil(ranuni(1)*10);
      do value = floor(ranuni(2) * 5);
        output;
      end;
    end;
  end;
run;