SAS summary statistic from a dataset


Problem description

The dataset looks like this:

colx      coly    colz  
0         1       0      
0         1       1      
0         1       0       

Desired output:

Colname      value    count

colx         0        3
coly         1        3
colz         0        2
colz         1        1

The following code works perfectly...

ods output onewayfreqs=outfreq;

proc freq data=final;
  tables colx coly colz / nocum nofreq;
run;

data freq;
  retain colname column_value;
  set outfreq;
  colname = scan(table, 2, ' ');  /* OneWayFreqs holds "Table <variable>" in its Table column */
  column_Value = trim(left(vvaluex(colname)));
  keep colname column_value frequency percent;
run;

... but I believe that's not efficient. Say I have 1000 columns; running PROC FREQ on all 1000 columns is not efficient. Is there a more efficient way to produce my desired output without using PROC FREQ?

Recommended answer

One of the most efficient mechanisms for computing frequency counts is a hash object set up for reference counting via the suminc argument tag.

The SAS documentation topic "Hash Object - Maintaining Key Summaries" demonstrates the technique for a single variable. The following example goes one step further and counts the values of every variable listed in an array. The suminc:'one' argument specifies that each call to ref adds the current value of the variable one to an internal sum kept for that key. While iterating over the distinct keys for output, the frequency count is retrieved with the sum method.

* one million data values;

data have;
  array v(1000);
  do row = 1 to 1000;
    do index = 1 to dim(v);
      v(index) = ceil(100*ranuni(123));
    end;
    output;
  end;
  keep v:;
  format v: 4.;
run;

* compute frequency counts via .ref();    

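data freak_out(keep=name value count);
  length name $32 value 8;

  /* Hash keyed on (name, value); suminc names the variable whose */
  /* value is added to a key's internal sum on each ref() call.   */
  declare hash bins(ordered:'a', suminc:'one');
  bins.defineKey('name', 'value');
  bins.defineData('name', 'value');
  bins.defineDone();

  one = 1;  /* increment applied per ref(), so the sum is a count */

  /* Read every row and register each (variable, value) pair. */
  do until (end_of_data);
    set have end=end_of_data;
    array v v1-v1000;
    do index = 1 to dim(v);
      name = vname(v(index));
      value = v(index);
      bins.ref();
    end;
  end;

  /* Walk the distinct keys in order and write one row per key. */
  declare hiter out('bins');
  do while (out.next() = 0);
    bins.sum(sum:count);  /* retrieve the accumulated count for this key */
    output;
  end;
run;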
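
To connect the technique back to the question, here is a minimal sketch that applies the same hash approach to the three-column dataset from the problem description. The datalines step recreating final, the output dataset name freq_small, and the helper variables i and done are assumptions made for illustration; they are not part of the original answer.

```sas
/* Hypothetical re-creation of the small dataset from the question. */
data final;
  input colx coly colz;
  datalines;
0 1 0
0 1 1
0 1 0
;
run;

data freq_small(keep=colname value count);
  length colname $32 value 8 count 8;

  /* One hash entry per distinct (colname, value) pair, kept in ascending order. */
  declare hash bins(ordered:'a', suminc:'one');
  bins.defineKey('colname', 'value');
  bins.defineData('colname', 'value');
  bins.defineDone();

  one = 1;  /* amount added to a key's internal sum on every ref() call */

  do until (done);
    set final end=done;
    array cols colx coly colz;
    do i = 1 to dim(cols);
      colname = vname(cols(i));
      value = cols(i);
      bins.ref();
    end;
  end;

  /* Emit one row per distinct key together with its accumulated count. */
  declare hiter it('bins');
  do while (it.next() = 0);
    bins.sum(sum:count);
    output;
  end;
  stop;
run;
```

Because the hash is declared with ordered:'a', the rows should come out sorted by column name and then by value, which is the layout of the desired output in the question. The whole dataset is read in a single pass regardless of how many columns the array lists.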

Note that PROC FREQ uses standard syntax, accepts a mix of character and numeric variables, and offers many additional features through options. The hash example above, as written, assumes every counted variable is numeric.
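
If the columns are a mix of character and numeric, one possible adaptation of the hash approach is to key on each value's formatted text instead of its numeric value. The following is only a sketch under stated assumptions: the dataset mixed, the output name freq_mixed, and the helper variables i and done are hypothetical, and the input is assumed not to contain variables already called name, value, count, one, i, or done.

```sas
/* Hypothetical input with one character column. */
data mixed;
  input colx coly $ colz;
  datalines;
0 a 0
0 a 1
0 b 0
;
run;

data freq_mixed(keep=name value count);
  /* Load the input's variable definitions without reading a row, so the */
  /* two arrays below contain only variables that come from the input.   */
  if 0 then set mixed;
  array nums  _numeric_;
  array chars _character_;

  length name $32 value $200 count 8;

  declare hash bins(ordered:'a', suminc:'one');
  bins.defineKey('name', 'value');
  bins.defineData('name', 'value');
  bins.defineDone();

  one = 1;

  do until (done);
    set mixed end=done;
    do i = 1 to dim(nums);
      name  = vname(nums(i));
      value = strip(vvalue(nums(i)));  /* formatted numeric value as text */
      bins.ref();
    end;
    do i = 1 to dim(chars);
      name  = vname(chars(i));
      value = chars(i);
      bins.ref();
    end;
  end;

  declare hiter it('bins');
  do while (it.next() = 0);
    bins.sum(sum:count);
    output;
  end;
  stop;
run;
```

One trade-off of this design is that values are compared as text, so within a variable the keys sort alphabetically ("10" before "2") rather than numerically. If every variable happens to be of one type, the other array has no elements and SAS is expected to note that in the log.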
