SAS总和观察不在一个组中,按组 [英] SAS sum observations not in a group, by group

查看:16
本文介绍了SAS总和观察不在一个组中,按组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据集:

data have;
   input group $ value;
   datalines;
A 4
A 3
A 2
A 1
B 1
C 1
D 2
D 1
E 1
F 1
G 2
G 1
H 1
;
run;

第一个变量是组标识符,第二个是值.

The first variable is a group identifier, the second a value.

对于每个组,我想要一个新变量sum",其中包含列中所有值的总和,但观察所在的组除外.

For each group, I want a new variable "sum" with the sum of all values in the column, exept for the group the observation is in.

我的问题是必须对近 3000 万个观察结果执行此操作,因此效率很重要.我发现使用 data step 比使用 procs 更有效.

My issue is having to do that on nearly 30 millions of observations, so efficiency matters. I found that using data step was more efficient than using procs.

最终的数据库应该是这样的:

The final database should looks like :

data want;
   input group $ value $ sum;
   datalines;
A 4 11
A 3 11
A 2 11
A 1 11
B 1 20
C 1 20
D 2 18
D 1 18
E 1 20
F 1 20
G 2 18
G 1 20
H 1 20
;
run;

您知道如何执行此操作吗?

Any idea how to perform this please?

我不知道这是否重要,但我给出的示例是我的问题的简化版本.在实际情况下,我有 2 个其他组变量,因此取整列的总和并减去组中的总和不是一个可行的解决方案.

I don't know if this matter but the example I gave is a simplified version of my issue. In the real case, I have 2 other group variable, thus taking the sum of the whole column and substract the sum in the group is not a viable solution.

推荐答案

需求

列中所有值的总和,观察所在的组除外

sum of all values in the column, except for the group the observation is in

表示必须发生两次数据传递:

indicates two passes of the data must occur:

  1. 计算 all_sum 和每个组的 group_sum
    哈希可以存储每个组的总和 -- 通过指定的 suminc: 变量计算和 .ref() 方法调用.一个变量可以累积allsum.
  2. 计算 allsum - group_sum 为组中的每一行.
    group_sum 从哈希中检索并从 中减去allsum.
  1. Compute the all_sum and each group's group_sum
    A hash can store each group's sum -- computed via a specified suminc: variable and .ref() method invocation. A variable can accumulate allsum.
  2. Compute allsum - group_sum for each row of a group.
    The group_sum is retrieved from hash and subtracted from allsum.

例子:

data want;
  if 0 then set have; * prep pdv;

  declare hash sums (suminc:'value');
  sums.defineKey('group');
  sums.defineDone();

  do while (not hash_loaded);
    set have end=hash_loaded;
    sums.ref();                * adds value to internal sum of hash data record;
    allsum + value;
  end;

  do while (not last_have);
    set have end=last_have;
    sums.sum(sum:sum);         * retrieve groups sum. Do you hear the Dragnet theme too?;
    sum = allsum - sum;        * subtract from allsum;
    output;
  end;

  stop;
run;

这篇关于SAS总和观察不在一个组中,按组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆