SAS/PROC SQL - remove ALL observations in a BY group as long as there are duplicates (not just remove the duplicates)
Problem description
I am new to SAS and I am trying to remove groups that fulfil two conditions. I currently have this data set:
ID ID_2 ID_3;
A 1 1;
A 1 1;
A 1 1;
A 2 0;
A 2 1;
B 3 0;
B 3 0;
I am grouping by ID and ID_2.
I want to remove ALL entries in a BY group when (1) there is duplication across all three variables (I don't just want to remove the duplicates, I want to remove the entire group) AND (2) the duplication involves the value 1 in ID_3 across all rows of that BY group.
In other words, the outcome I want is:
ID ID_2 ID_3;
A 2 0;
A 2 1;
B 3 0;
B 3 0;
I have spent at least 5 hours on this and have tried various methods:
first. and last. (this does not guarantee that all observations in the BY group match)
nodup (this only removes the duplicates, but I want to remove even the first row of the group)
lag (again, the first row of the group stays, which is not what I want)
I am open to using PROC SQL as well. I would really appreciate any input. Thank you in advance!
Recommended answer
I believe this will accomplish what you want. The logic could probably be tweaked to be a little clearer, but it worked when I tested it.
data x;
    input id $ id_2 id_3;
    cards;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 3 0
B 3 0
;
run;

* I realize the data are already sorted, but I think it is better
* not to assume they are.;
proc sort data=x;
    by id id_2 id_3;
run;

* It is helpful to create a dataset for the duplicates as well as the
* unduplicated observations.;
data nodups
     dups
     ;
    set x;
    by id id_2 id_3;

    * When FIRST.ID_3 and LAST.ID_3 at the same time, there is only
    * one obs in the group, so keep it;
    if first.id_3 and last.id_3
        then output nodups;

    * Otherwise, we know we have more than one obs. According to
    * the OP, we keep them, too, unless ID_3 = 1;
    else do;
        if id_3 = 1
            then output dups;
        else output nodups;
    end;
run;
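Since the question also mentions PROC SQL, here is a minimal sketch of a group-level alternative. Note that the DATA step above decides row by row within each (id, id_2, id_3) combination, so a group such as (A 1 0), (A 1 1), (A 1 1) would lose its two duplicated rows but keep the third. The sketch below instead reads the requirement as "drop the whole (id, id_2) group when it has more than one row and every row has id_3 = 1"; that reading, the output table name want, and the assumption that id_3 has no missing values are mine, not the original answer's.

```sas
* Sample data from the question.;
data x;
    input id $ id_2 id_3;
    cards;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 3 0
B 3 0
;
run;

* WANT is a hypothetical output table name.
* A group is dropped only when it has more than one row AND every
* row has id_3 = 1 (min and max both equal 1). Assumes id_3 is never
* missing, because MIN and MAX ignore missing values.;
proc sql;
    create table want as
    select *
    from x
    group by id, id_2
    having not (count(*) > 1 and min(id_3) = 1 and max(id_3) = 1)
    order by id, id_2, id_3;
quit;
```

Because the query selects non-aggregated columns alongside summary functions, SAS remerges the group statistics back onto the detail rows and writes a note about this to the log. On the sample data this should leave the four rows the question asks for (A 2 0, A 2 1, B 3 0, B 3 0).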