SAS总和观测值不在一个组中,按组 [英] SAS sum observations not in a group, by group

查看:122
本文介绍了SAS总和观测值不在一个组中,按组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据集:

data have;
   input group $ value;
   datalines;
A 4
A 3
A 2
A 1
B 1
C 1
D 2
D 1
E 1
F 1
G 2
G 1
H 1
;
run;

第一个变量是组标识符,第二个是值。

The first variable is a group identifier, the second a value.

对于每个组,我想要一个新的变量 sum,其列中所有值的总和为观察所在的组。

For each group, I want a new variable "sum" with the sum of all values in the column, exept for the group the observation is in.

我的问题是必须对近3000万个观测值执行此操作,因此效率至关重要。
我发现使用数据步骤比使用procs更有效。

My issue is having to do that on nearly 30 millions of observations, so efficiency matters. I found that using data step was more efficient than using procs.

最终数据库应如下所示:

The final database should looks like :

data want;
   input group $ value $ sum;
   datalines;
A 4 11
A 3 11
A 2 11
A 1 11
B 1 20
C 1 20
D 2 18
D 1 18
E 1 20
F 1 20
G 2 18
G 1 20
H 1 20
;
run;

有人知道如何执行此操作吗?

Any idea how to perform this please?

编辑:我不知道这件事是否重要,但是我举的例子只是我问题的简化版本。在实际情况下,我还有另外两个组变量,因此取整列的总和减去该组中的总和不是可行的解决方案。

I don't know if this matter but the example I gave is a simplified version of my issue. In the real case, I have 2 other group variable, thus taking the sum of the whole column and substract the sum in the group is not a viable solution.

推荐答案

要求


列中所有值的总和,但观察值所在的组除外

sum of all values in the column, except for the group the observation is in

表示必须发生两次数据传递:

indicates two passes of the data must occur:


  1. 计算 all_sum 和每个组的 group_sum
    哈希可以存储每个组的总和-通过指定的 suminc:变量和 .ref()方法调用。变量可以累积 allsum

  2. 计算 allsum - group_sum 用于组中的每一行。
    group_sum 从哈希中检索并从 allsum <中减去/ code>。

  1. Compute the all_sum and each group's group_sum
    A hash can store each group's sum -- computed via a specified suminc: variable and .ref() method invocation. A variable can accumulate allsum.
  2. Compute allsum - group_sum for each row of a group.
    The group_sum is retrieved from hash and subtracted from allsum.

示例:

data want;
  if 0 then set have; * prep pdv;

  declare hash sums (suminc:'value');
  sums.defineKey('group');
  sums.defineDone();

  do while (not hash_loaded);
    set have end=hash_loaded;
    sums.ref();                * adds value to internal sum of hash data record;
    allsum + value;
  end;

  do while (not last_have);
    set have end=last_have;
    sums.sum(sum:sum);         * retrieve groups sum. Do you hear the Dragnet theme too?;
    sum = allsum - sum;        * subtract from allsum;
    output;
  end;

  stop;
run;

这篇关于SAS总和观测值不在一个组中,按组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆