Tracking ID in SAS

Problem description

I have a SAS question. I have a large dataset containing unique IDs and a set of variables for each year in a time series. Some IDs are present throughout the entire time series, some new IDs are added, and some old IDs are removed.

ID    Year    Var3    Var4
1     2015    500     200
1     2016    600     300
1     2017    800     100
2     2016    200     100
2     2017    100     204
3     2015    560     969
3     2016    456     768
4     2015    543     679
4     2017    765     534

As can be seen from the table above, ID 1 is present in all three years (2015-2017), ID 2 is present from 2016 onwards, ID 3 is removed in 2017, and ID 4 is present in 2015, removed in 2016, and then present again in 2017.
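For anyone who wants to reproduce the problem, the sample table can be loaded with a short DATA step. This is only a sketch: the dataset name `have` is taken from the code later in the post, and all four columns are assumed to be numeric.

```sas
/* Sample data from the table above; all columns assumed numeric */
data have;
  input ID Year Var3 Var4;
  datalines;
1 2015 500 200
1 2016 600 300
1 2017 800 100
2 2016 200 100
2 2017 100 204
3 2015 560 969
3 2016 456 768
4 2015 543 679
4 2017 765 534
;
run;
```

The rows are already sorted by ID and Year, which the BY-group steps further down rely on.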

I would like to know which IDs are new and which are removed in any given year, whilst keeping all the data, e.g. a new table with indicators showing which IDs are new and which are removed. Furthermore, it would be nice to get a frequency of how many IDs are added/removed in a given year and the sum of their "Var3" and "Var4". Do you have any suggestions for how to do that?

************* UPDATE ******************

Okay, so I tried the following program:

**** Addition to suggested code ****;
options validvarname=any;

proc sql noprint;
create table years as
select distinct year
from have;

create table ids as
select distinct id
from have;

create table all_id_years as
select a.id,  b.year
from ids as a,
years as b
order by id, year;

create table indicators as
select coalesce(a.id,b.id) as id,
coalesce(a.year,b.year) as year,
coalesce(a.id/a.id,0) as indicator
from have as a
full join
all_id_years as b
on a.id = b.id
and a.year = b.year
order by id, year
;
quit;

With the indicators transposed so that each year is a column, the following gives me a table containing only the IDs that are new in 2017:

data new_in_17;
set indicators;
where ('2016'n=0) and ('2017'n=1);
run;

I can now merge this table with the original data to add var3 and var4:

data new17;
merge new_in_17(in=x1) have(in=x2);
by id;
if x1=x2;
run;

Now I can find the frequency of new IDs in 2017 and the sum of var3 and var4:

proc means data=new17 noprint;
var var3 var4;
where year in (2017);
output out=sum_var_freq_new sum(var3)=sum_var3 sum(var4)=sum_var4;
run;

This gives me the output I need. However, I would like the equivalent output for the IDs that are "gone" between 2016 and 2017, which can be produced as follows:

data gone_in_17;
set indicators;
where ('2016'n=1) and ('2017'n=0);
run;

data gone17;
merge gone_in_17(in=x1) have(in=x2);
by id;
if x1=x2;
run;

proc means data=gone17 noprint;
var var3 var4;
where year in (2016);
output out=sum_var_freq_gone sum(var3)=sum_var3 sum(var4)=sum_var4;
run;

The end result should combine the two tables "sum_var_freq_new" and "sum_var_freq_gone" into one table. Furthermore, I need this table for every new year, so my current approach is very inefficient. Do you have any suggestions for how to achieve this efficiently?

Recommended answer

Here is a more efficient way of doing this that also gives you the summary values.

First, a little SQL magic. Create the cross product of years and IDs, then join that to the table you have to create an indicator:

proc sql noprint;
/*All years*/
create table years as
select distinct year
    from have;

/*All IDs*/
create table ids as
select distinct id
    from have;

/*All combinations of ID/year*/
create table all_id_years as
select a.id,  b.year
    from ids as a,
         years as b
    order by id, year;

/*Original data with rows added for missing ID/year combinations.*/
/*Indicator=1 if the ID exists in that year, 0 if not.*/
/*CASE is used instead of a.id/a.id so an ID of 0 is still flagged as present.*/
create table indicators as
select coalesce(a.id,b.id) as id,
       coalesce(a.year,b.year) as year,
       case when a.id is null then 0 else 1 end as indicator
    from have as a
      full join
         all_id_years as b
      on a.id = b.id
       and a.year = b.year
    order by id, year
    ;
quit;

Now transpose that, so that each year becomes its own indicator column:

proc transpose data=indicators out=indicators(drop=_name_);
by id;
id year;
var indicator;
run;
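One thing to watch with this step is the naming of the year columns. Under the default `VALIDVARNAME=V7` setting the columns come out as `_2015`, `_2016`, and so on, while the `'2016'n` name literals used in the question need `options validvarname=any;`. As a hedged alternative, the PREFIX= option of PROC TRANSPOSE gives ordinary names under either setting. The sketch below is meant to be run instead of the transpose above, on the long `indicators` table from the SQL step; the names `indicators_wide` and the prefix `y` are only illustrative.

```sas
/* Alternative to the transpose above: run on the long INDICATORS table. */
/* PREFIX=y yields columns y2015, y2016, y2017 (illustrative names).     */
proc transpose data=indicators out=indicators_wide(drop=_name_) prefix=y;
  by id;
  id year;
  var indicator;
run;

/* IDs that are absent in 2016 and present in 2017 */
data new_in_17;
  set indicators_wide;
  where y2016 = 0 and y2017 = 1;
run;
```

Writing to a separate output dataset also keeps the long table available for later steps.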

Create the sums. You could also add other summary statistics here if you wanted:

/*BY-group processing assumes HAVE is sorted by ID*/
proc summary data=have;
by id;
var var3 var4;
output out=summary sum=;
run;

Merge the indicators and the summary values:

data want;
merge indicators summary(keep=id var3 var4);
by id;
run;
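The merged table holds per-ID totals over all years. For the per-year counts and sums that the update to the question asks for, one possible sketch is to compare each year with the previous one directly in SQL. It assumes that `Year` is numeric with consecutive integer values and that `have` has at most one row per ID and year. The table names `changes` and `changes_by_year` and the `status` column are illustrative, not part of the original answer. Following the question's convention, new IDs are summed using their values in the year they appear, and removed IDs using their values from the last year in which they were present.

```sas
proc sql;
  /* One row per ID/year change, for every year at once */
  create table changes as
  /* "new": present in this year, absent the year before */
  select cur.year, cur.id, cur.var3, cur.var4,
         'new' as status length=7
    from have as cur
      left join have as prev
      on prev.id = cur.id
       and prev.year = cur.year - 1
    where prev.id is null
      and cur.year > (select min(year) from have)
  union all
  /* "removed": present the year before, absent in this year */
  select prev.year + 1 as year, prev.id, prev.var3, prev.var4,
         'removed' as status length=7
    from have as prev
      left join have as cur
      on cur.id = prev.id
       and cur.year = prev.year + 1
    where cur.id is null
      and prev.year < (select max(year) from have);

  /* Frequency and sums per year and status in a single table */
  create table changes_by_year as
  select year, status,
         count(*) as n_ids,
         sum(var3) as sum_var3,
         sum(var4) as sum_var4
    from changes
    group by year, status
    order by year, status;
quit;
```

On the sample data this should flag ID 2 as new and ID 4 as removed in 2016, and ID 4 as new and ID 3 as removed in 2017. The first year is never marked as new and the last year is never used as a "previous" year, because nothing is known outside the observed range.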
