拆分 SAS 数据集 [英] Split SAS dataset

查看:96
本文介绍了拆分 SAS 数据集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个如下所示的 SAS 数据集:

I have a SAS dataset that looks like this:

id | dept | ...
1    A
2    A
3    A
4    A
5    A
6    A
7    A
8    A
9    B
10   B
11   B
12   B
13   B

每个观察代表一个人.

我想将数据集拆分为团队"数据集,每个数据集最多可以有 3 个观察.

I would like to split the dataset into "team" datasets, each dataset can have a maximum of 3 observations.

对于上面的示例,这意味着为部门 A 创建 3 个数据集(其中 2 个数据集将包含 3 个观察值,而第三个数据集将包含 2 个观察值).以及 B 部门的 2 个数据集(1 个包含 3 个观测值,另一个包含 2 个观测值).

For the example above this would mean creating 3 datasets for dept A (2 of these datasets would contain 3 observations and the third dataset would contain 2 observations). And 2 datasets for dept B (1 containing 3 observations and the other containing 2 observations).

像这样:

第一个数据集(deptA1):

First dataset (deptA1):

id | dept | ...
1    A
2    A
3    A

第二个数据集 (deptA2)

Second dataset (deptA2)

id | dept | ...
4    A
5    A
6    A

第三个数据集 (deptA3)

Third dataset (deptA3)

id | dept | ...
7    A
8    A

第四个数据集(deptB1)

Fourth dataset (deptB1)

id | dept | ...
9    B
10   B
11   B

第五个数据集(deptB2)

Fifth dataset (deptB2)

id | dept | ...
12   B
13   B

我使用的完整数据集包含超过 50 个部门的数千个观察结果.我可以计算出每个部门需要多少数据集,我认为宏是最好的方法,因为所需的数据集数量是动态的.但我无法弄清楚创建数据集的逻辑,以便它们最多有 3 个观察值.任何帮助表示赞赏.

The full dataset I'm using contains thousands of observations with over 50 depts. I can work out how many datasets per dept are required and I think a macro is the best way to go as the number of datasets required is dynamic. But I can't figure out the logic to create the datasets so that they have have a maximum of 3 observations. Any help appreciated.

推荐答案

另一个版本.与 DavB 版本相比,它只处理一次输入数据,并在单个数据步骤中将其拆分为多个表.另外,如果需要更复杂的拆分规则,可以在数据步视图WORK.SOURCE_PREP中实现.

Another version. Compared to DavB version, it only processes input data once and splits it into several tables in single datastep. Also if more complex splitting rule is required, it can be implemented in datastep view WORK.SOURCE_PREP.

data WORK.SOURCE;
infile cards;
length ID 8 dept $1;
input ID dept;
cards;
1    A
2    A
3    A
4    A
5    A
6    A
7    A
8    A
9    B
10   B
11   B
12   B
13   B
14   C
15   C
16   C
17   C
18   C
19   C
20   C
;
run;

proc sort data=WORK.SOURCE;
by dept ID;
run;

data  WORK.SOURCE_PREP / view=WORK.SOURCE_PREP;
set WORK.SOURCE;
by dept;
length table_name $32;

if first.dept then do;
    count = 1;
    table = 1;
end;
else count + 1;

if count > 3 then do;
    count = 1;
    table + 1;
end;
/* variable TABLE_NAME to hold table name */
TABLE_NAME = catt('WORK.', dept, put(table, 3. -L));
run;

/* prepare list of tables */
proc sql noprint;
create table table_list as
select distinct TABLE_NAME from WORK.SOURCE_PREP where not missing(table_name)
;
%let table_cnt=&sqlobs;
select table_name into :table_list separated by ' ' from table_list;
select table_name into :tab1 - :tab&table_cnt from table_list;
quit;

%put &table_list;

%macro loop_when(cnt, var);
    %do i=1 %to &cnt;
        when ("&&&var.&i") output &&&var.&i;
    %end;
%mend;

data &table_list;
set WORK.SOURCE_PREP;
    select (TABLE_NAME);
        /* generate OUTPUT statements */
        %loop_when(&table_cnt, tab)
    end;
run;

这篇关于拆分 SAS 数据集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆