将一个变量的水平转换为其他变量 [英] Transforming levels of one variable into other variables

查看:48
本文介绍了将一个变量的水平转换为其他变量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个看起来像这样的数据集:

I have a dataset that looks something like this:

IDnum    State    Product    Consumption
123      MI       A          30
123      MI       B          20
123      MI       C          45
456      NJ       A          15
456      NJ       D          10
789      MI       B          60
...      ...      ...        ...

我想创建一个新数据集,其中每个 IDnum 有一行,每个不同产品都有一个新的虚拟变量(在我的真实数据集中,我有近 1000 个产品),以及相关的消费.它看起来像这些行中的东西

And i would like to create a new dataset, where i have one row for each IDnum, and a new dummy variable for each different product (in my real dataset i have close to 1000 products), along with it's associated consumption. It would look like something in these lines

IDnum   State   Prod.A   Cons.A   Prod.B   Cons.B   Prod.C   Cons.C   Prod.D   Cons.D
123     MI      yes      30       yes      20       yes      45       no       -
456     NJ      yes      15       no       -        no       -        yes      10
789     MI      no       -        yes      60       no       -        no       -
...     ...     ...      ...      ...      ...      ...      ...      ...      ...

某些变量如State"在同一个IDnum内不会发生变化,但是原银行的每一行都相当于一次购买,因此同一个IDnum的product"和consumption"变量发生了变化.我希望我的新数据集在一行中显示每个客户的所有消费习惯,但到目前为止我失败了.

Some variables like "State" doesn't change within the same IDnum, but each row in the original bank are equivalent to one purchase, hence the change in the "product" and "consumption" variables for the same IDnum. I would like that my new dataset showed all the consumption habits of each costumer in one single row, but so far i have failed.

任何帮助将不胜感激.

推荐答案

没有 yes/no 变量,这真的很容易:

Without yes/no variables, it's really easy:

data input;
length State $2 Product $1;
input IDnum    State    Product    Consumption;
cards;
123      MI       A          30
123      MI       B          20
123      MI       C          45
456      NJ       A          15
456      NJ       D          10
789      MI       B          60
;
run;

proc transpose data=input out=output(drop=_NAME_) prefix=Cons_;
  var Consumption;
  id Product;
  by IDnum State;
run;

添加是/否字段:

    proc sql;/* from column names or alternatively 
                create it from source data directly if not taking too long */
    create table work.products as
        select scan(name, 2, '_') as product length=1
        from dictionary.columns 
                    where libname='WORK' and memname='OUTPUT' 
                      and  upcase(name) like 'CONS_%';
    quit;

    filename vars temp;/* write a temp file containing variable definitions
                       in desired order */
    data _null_;
        set  work.products end=last;
        file vars;
                    length str $40;
        if _N_ = 1 then put 'LENGTH ';
        str = catt('Prod_', product, ' $3');
        put str;
        str = catt('Cons_', product, ' 8');
        put str;
        if last then put ';';
    run;

    options source2;
    data output2;
        length IdNum 8 State $2;
        %include vars;
        set output;
        array prod{*} Prod_:;
        array cons{*} Cons_:;
        drop i;
        do i=1 to dim(prod);
            if coalesce(cons(i), 0) ne 0 then prod(i)='yes';
            else prod(i)='no';
        end;
    run;

这篇关于将一个变量的水平转换为其他变量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆