将一个变量的水平转换为其他变量 [英] Transforming levels of one variable into other variables
问题描述
我有一个看起来像这样的数据集:
I have a dataset that looks something like this:
IDnum State Product Consumption
123 MI A 30
123 MI B 20
123 MI C 45
456 NJ A 15
456 NJ D 10
789 MI B 60
... ... ... ...
我想创建一个新数据集,其中每个 IDnum 有一行,每个不同产品都有一个新的虚拟变量(在我的真实数据集中,我有近 1000 个产品),以及相关的消费.它看起来像这些行中的东西
And i would like to create a new dataset, where i have one row for each IDnum, and a new dummy variable for each different product (in my real dataset i have close to 1000 products), along with it's associated consumption. It would look like something in these lines
IDnum State Prod.A Cons.A Prod.B Cons.B Prod.C Cons.C Prod.D Cons.D
123 MI yes 30 yes 20 yes 45 no -
456 NJ yes 15 no - no - yes 10
789 MI no - yes 60 no - no -
... ... ... ... ... ... ... ... ... ...
某些变量如State"在同一个IDnum内不会发生变化,但是原银行的每一行都相当于一次购买,因此同一个IDnum的product"和consumption"变量发生了变化.我希望我的新数据集在一行中显示每个客户的所有消费习惯,但到目前为止我失败了.
Some variables like "State" doesn't change within the same IDnum, but each row in the original bank are equivalent to one purchase, hence the change in the "product" and "consumption" variables for the same IDnum. I would like that my new dataset showed all the consumption habits of each costumer in one single row, but so far i have failed.
任何帮助将不胜感激.
推荐答案
没有 yes/no 变量,这真的很容易:
Without yes/no variables, it's really easy:
data input;
length State $2 Product $1;
input IDnum State Product Consumption;
cards;
123 MI A 30
123 MI B 20
123 MI C 45
456 NJ A 15
456 NJ D 10
789 MI B 60
;
run;
proc transpose data=input out=output(drop=_NAME_) prefix=Cons_;
var Consumption;
id Product;
by IDnum State;
run;
添加是/否字段:
proc sql;/* from column names or alternatively
create it from source data directly if not taking too long */
create table work.products as
select scan(name, 2, '_') as product length=1
from dictionary.columns
where libname='WORK' and memname='OUTPUT'
and upcase(name) like 'CONS_%';
quit;
filename vars temp;/* write a temp file containing variable definitions
in desired order */
data _null_;
set work.products end=last;
file vars;
length str $40;
if _N_ = 1 then put 'LENGTH ';
str = catt('Prod_', product, ' $3');
put str;
str = catt('Cons_', product, ' 8');
put str;
if last then put ';';
run;
options source2;
data output2;
length IdNum 8 State $2;
%include vars;
set output;
array prod{*} Prod_:;
array cons{*} Cons_:;
drop i;
do i=1 to dim(prod);
if coalesce(cons(i), 0) ne 0 then prod(i)='yes';
else prod(i)='no';
end;
run;
这篇关于将一个变量的水平转换为其他变量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!