Efficient way to select unique records in SAS
Problem description
The dataset looks like this:
Code    Type      Rating
0001    NULL      1
0002    NULL      1
0003    NULL      1
0003    PA 1      3
0004    NULL      1
0004    PB 1      2
0005    AC 1      3
0005    NULL      6
0006    AC 1      2
I want the output dataset to look like this:
Code    Type      Rating
0001    NULL      1
0002    NULL      1
0003    PA 1      4
0004    PB 1      3
0005    AC 1      9
0006    AC 1      2
For each Code, Type has at most two values. I want to select unique Code values by summing Rating. The catch is Type: if a Code has only one Type value, that value is passed to the output dataset. If it has two values (one of which must be NULL), the value that is not NULL is passed to the output dataset.
The total number of observations is N > 100,000,000, so is there any clever way to achieve this?
Recommended answer
It's pretty easy to do in one SQL step as well. Just use a CASE...WHEN...END to turn the NULLs into blanks, then a MAX to pick up the non-null value. This works because a blank string sorts lower than any non-blank value, so MAX returns the real Type whenever a Code has one.
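data have;
  /* Type can contain an embedded blank (e.g. "PA 1"), so every field */
  /* is read from a fixed column position.                            */
  /* Code is read as numeric here, so leading zeros are not kept;     */
  /* use $4. instead of 4. to keep it as character.                   */
  input @1  Code   4.
        @9  Type   $4.
        @19 Rating 1.;
  datalines;
0001    NULL      1
0002    NULL      1
0003    NULL      1
0003    PA 1      3
0004    NULL      1
0004    PB 1      2
0005    AC 1      3
0005    NULL      6
0006    AC 1      2
;;;;
run;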
proc sql;
  create table want as
  select code,
         /* blank out 'NULL' so MAX keeps the real Type, if any */
         max(case type when 'NULL' then '' else type end) as type,
         sum(Rating) as rating
  from have
  group by code;
quit;
If you want the NULLs back, wrap the query in an outer select: select code, case type when ' ' then 'NULL' else type end as type, rating from ( ... ); though I would suggest leaving them blank.
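As a rough sketch of that wrapping, the aggregate query can be used as an in-line view and the blanks mapped back to 'NULL' in the outer select. It assumes the have dataset from the example; the output name want_null and the alias agg are placeholders chosen here, not names from the original answer.

```sas
proc sql;
  create table want_null as
  select code,
         /* a blank Type means the Code only had 'NULL' rows */
         case type when ' ' then 'NULL' else type end as type,
         rating
  from (select code,
               max(case type when 'NULL' then '' else type end) as type,
               sum(Rating) as rating
        from have
        group by code) as agg;
quit;
```

On the sample data, Codes 0001 and 0002 should come back with Type 'NULL', while the other Codes keep their non-NULL Type.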