Variable check and summary out

Problem description

I'm trying to do a simple check on a list of variables in a data set (revenue, costs, profits, and vcost). For each variable, the check grabs the largest and second-largest values, tests whether their sum is greater than 90% of the variable's total, and if so, flags that variable. I also want to check that the largest value is not larger than 60% of the total.

I got a bit of help from the question "Macro that outputs table with testing results of SAS table", but now I'm trying to answer a much more basic question. This doesn't seem too hard, but I can't figure out how to set up the basic table at the end.

I know all the variable names.

Here is a sample dataset I've created: https://www.dropbox.com/s/x575w5d551uu47p/dataset%20%281%29.csv?dl=0

I want to turn this basic table:

into another table like this:

/* Create some dummy data with four variables to assess */
data have;
    do firm = 1 to 3;
        revenue = rand("uniform");
        costs = rand("uniform");
        profits = rand("uniform");
        vcost = rand("uniform");
        output;
    end;
run;
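To make the two thresholds concrete, here is a minimal sketch of the check for a single variable. The three values and the names `example`, `v1`-`v3`, `Flag90` and `Flag60` are made up for illustration and are not taken from the sample dataset. It uses the `LARGEST` function to pick the largest and second-largest values.

```sas
/* Hypothetical values of one variable for three firms (not from the sample data) */
data example;
    v1 = 0.70; v2 = 0.25; v3 = 0.05;

    total       = sum(of v1-v3);          /* sum over all firms        */
    top_1       = largest(1, of v1-v3);   /* largest value             */
    top_2       = largest(2, of v1-v3);   /* second-largest value      */
    top_2_total = sum(top_1, top_2);      /* sum of the two largest    */

    /* A comparison evaluates to 1 when true and 0 when false */
    Flag90 = (top_2_total > 0.9 * total);
    Flag60 = (top_1 > 0.6 * total);

    /* Write the results to the log */
    put total= top_1= top_2= top_2_total= Flag90= Flag60=;
run;
```

With these values the two largest firms account for about 95% of the total and the largest alone for about 70%, so both flags should come out as 1.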

Recommended answer

Based on your comment on the previous answer, it looks like top_2_total is the sum of the two largest total values. To get that, you need to code some extra steps. I'm using PROC TRANSPOSE and a DATA step to get what was already achieved in the previous answer. I have added a PROC SUMMARY step to get the two largest total values, and I reuse that dataset to create the final answer. Let me know if it helps.

data have;
    do firm = 1 to 3;
        revenue = rand("uniform");
        costs = rand("uniform");
        profits = rand("uniform");
        vcost = rand("uniform");
        output;
    end;
run;

proc transpose data=have out=want prefix=top_;
    var revenue--vcost;
run;

data want;
set want end=eof;
    array top(*) top_3-top_1;
    call sortn(of top[*]);
    total=sum(of top[*]);
run;
/* Getting the maximum 2 total values using PROC SUMMARY*/
proc summary data=want nway;
    output out=total_top_2_rec(drop=_:) idgroup(max(total) out[2](total)=);
run;

data want;
/* Read the single row from the previous step once (its values are retained for every row) and build TOP_2_TOTAL */
if _n_=1 then set total_top_2_rec;
    top_2_total=sum(total_1,total_2);

set want;
    if sum(top_1,top_2) > 0.9  * top_2_total then Flag90=1; else Flag90=0;
    if top_1 > top_2_total * 0.6 then Flag60=1; else Flag60=0;

drop total_1 total_2;
run;

proc print data=want;run;

EDIT: I have added logic before my PROC TRANSPOSE where you can list the variables to consider in the calculation; the rest is done by the code. Whoever runs the code does not need to make any further manual changes after that. The variables should be entered as a space-delimited list.

data have;
infile 'C:\dataset (1).csv' missover dsd dlm=',' firstobs=2;
input firm v1 v2 v3;
run;

/* add/remove columns here to consider variable */
%let variable_to_consider=v1 
                          v2 
                          v3
                          ;

%let variable_to_consider=%cmpres(&variable_to_consider);
proc sql noprint;
  select count(*) into : obs_count from have;
quit;
%let obs_count=&obs_count;

proc transpose data=have out=want prefix=top_;
    var &variable_to_consider; 
run;

data want;
set want end=eof;
    array top(*) top_&obs_count.-top_1;
    x=dim(top);
    call sortn(of top[*]);
    total=sum(of top[*]);

keep total top_1 top_2 _name_;
run;

/* Getting the maximum 2 total values using PROC SUMMARY*/
proc summary data=want nway;
    output out=total_top_2_rec(drop=_:) idgroup(max(total) out[2](total)=);
run;

data want;
/* Read the single row from the previous step once (its values are retained for every row) and build TOP_2_TOTAL */
if _n_=1 then set total_top_2_rec;
    top_2_total=sum(total_1,total_2);

set want;
    if sum(top_1,top_2) > 0.9  * top_2_total then Flag90=1; else Flag90=0;
    if top_1 > top_2_total * 0.6 then Flag60=1; else Flag60=0;

drop total_1 total_2;
run;

proc print data=want;run;

EDIT 2014-04-05: As discussed, I have updated the logic and fixed the issues. Below is the updated code.

data have1;
    do firm = 1 to 3;
        revenue = rand("uniform");
        costs = rand("uniform");
        profits = rand("uniform");
        vcost = rand("uniform");
        output;
    end;
run;

data have2;
infile 'dataset (1).csv' missover dsd dlm=',' firstobs=2;
input firm v1 v2 v3;
run;
/* Pass the variables to check through variable_to_consider= as a space-delimited list */

%macro mymacro(input_dataset= ,output_dataset=, variable_to_consider=);

%let variable_to_consider=%cmpres(&variable_to_consider);
proc sql noprint;
  select count(*) into : obs_count from &input_dataset;
quit;
%let obs_count=&obs_count;

proc transpose data=&input_dataset out=&output_dataset prefix=top_;
    var &variable_to_consider; 
run;

data &output_dataset;
set &output_dataset end=eof;
    array top(*) top_&obs_count.-top_1;
    x=dim(top);
    call sortn(of top[*]);
    total=sum(of top[*]);

top_2_total=sum(top_1, top_2);
    if sum(top_1,top_2) > 0.9  * total then Flag90=1; else Flag90=0;
    if top_1 > total * 0.6 then Flag60=1; else Flag60=0;

keep total top_1 top_2 _name_ top_2_total Flag60 Flag90;

run;
%mend mymacro;

%mymacro(input_dataset=have1, output_dataset=want1 ,variable_to_consider=revenue costs profits vcost)
%mymacro(input_dataset=have2, output_dataset=want2 ,variable_to_consider=v1 v2 v3 )


proc print data=want1;run;
proc print data=want2;run;
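As a variation on the final version, the same flags can probably be computed without the descending array and CALL SORTN by asking the `LARGEST` function for the first and second largest values directly. This is only a sketch under assumptions: the dataset names `transposed` and `want_alt` and the prefix `firm_` are hypothetical, and it has not been run against the linked CSV.

```sas
/* Same dummy data as above */
data have;
    do firm = 1 to 3;
        revenue = rand("uniform");
        costs = rand("uniform");
        profits = rand("uniform");
        vcost = rand("uniform");
        output;
    end;
run;

/* One row per variable, one column per firm: firm_1, firm_2, firm_3 */
proc transpose data=have out=transposed prefix=firm_;
    var revenue costs profits vcost;
run;

data want_alt;
    set transposed;
    total       = sum(of firm_:);          /* sum over all firms    */
    top_1       = largest(1, of firm_:);   /* largest value         */
    top_2       = largest(2, of firm_:);   /* second-largest value  */
    top_2_total = sum(top_1, top_2);

    /* A comparison evaluates to 1 when true and 0 when false */
    Flag90 = (top_2_total > 0.9 * total);
    Flag60 = (top_1 > 0.6 * total);

    keep _name_ total top_1 top_2 top_2_total Flag90 Flag60;
run;

proc print data=want_alt; run;
```

Using a prefix other than `top_` keeps the `of firm_:` list limited to the transposed columns, so the derived `top_1`, `top_2` and `top_2_total` variables are never picked up by it. Because the prefix list matches however many columns the transpose creates, the observation count does not have to be looked up first.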
