Generating sums of data according to values of a variable

Problem description

I have data like this:

Hosp    Score   Var1    Var2    Var3
1       0       5       3       0
1       1       10      8       1
1       2       11      8       2
1       3       5       3       2
2       0       6       4       0
2       2       10      6       1
2       3       10      7       2
3       1       4       3       2
etc...

There are approximately 800 Hosps and 4 levels of the factor Score (0, 1, 2, 3). Some Hosps have no observation for a particular level of Score.

I would like to produce a new data frame in which the variables Var1, Var2 and Var3 are summed within each Hosp. Using the data above, this would give:

Hosp    Var1    Var2    Var3
1       31      22      5
2       26      17      3
3       4       3       2

This would be trivial in SQL (and I guess it is in R too!), but the only way I have found so far is a loop that adds a new row each time Hosp changes and increments the variables when it doesn't (the data is, of course, sorted on Hosp).

Is there a slick way to do this?

Recommended answer

Use aggregate(). Assuming your data.frame is named "temp", use:

aggregate(list(Var1 = temp$Var1, Var2 = temp$Var2, Var3 = temp$Var3), 
          list(Hosp = temp$Hosp), sum)
#   Hosp Var1 Var2 Var3
# 1    1   31   22    5
# 2    2   26   17    3
# 3    3    4    3    2

For a slicker way to do this, use the formula notation for aggregate():

aggregate(cbind(Var1, Var2, Var3) ~ Hosp, temp, sum)
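To try the formula call end to end, here is a minimal sketch that rebuilds only the eight sample rows shown in the question (the real data has about 800 Hosps) and stores the result in `totals`, a name chosen here purely for illustration:

```r
# Rebuild the question's sample rows (only the eight rows shown, not the full data)
temp <- data.frame(
  Hosp  = c(1, 1, 1, 1, 2, 2, 2, 3),
  Score = c(0, 1, 2, 3, 0, 2, 3, 1),
  Var1  = c(5, 10, 11, 5, 6, 10, 10, 4),
  Var2  = c(3, 8, 8, 3, 4, 6, 7, 3),
  Var3  = c(0, 1, 2, 2, 0, 1, 2, 2)
)

# Sum Var1..Var3 within each Hosp; Score is deliberately left out of the formula
totals <- aggregate(cbind(Var1, Var2, Var3) ~ Hosp, data = temp, FUN = sum)
print(totals)
#   Hosp Var1 Var2 Var3
# 1    1   31   22    5
# 2    2   26   17    3
# 3    3    4    3    2
```

Listing the columns in cbind() matters here: the shorthand `. ~ Hosp` would sum every other column, including Score.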

Note, however, that the default data.frame method and the formula method treat NA values differently: the formula method defaults to na.action = na.omit, which drops every row containing an NA before summing, whereas the data.frame method passes NA values on to sum().
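The NA difference is easiest to see on a tiny example. The sketch below uses a made-up four-row data frame called `na_demo` (not the question's data) with a single NA in Var1:

```r
# Hypothetical toy data with one missing Var1 value in Hosp 1
na_demo <- data.frame(
  Hosp = c(1, 1, 2, 2),
  Var1 = c(5, NA, 6, 10),
  Var2 = c(3, 8, 4, 6)
)

# data.frame method: NAs reach sum(), so the Hosp 1 total for Var1 is NA
by_list <- aggregate(na_demo[c("Var1", "Var2")],
                     by = list(Hosp = na_demo$Hosp), FUN = sum)

# Same method with na.rm = TRUE forwarded to sum(): only the NA cell is skipped
by_list_rm <- aggregate(na_demo[c("Var1", "Var2")],
                        by = list(Hosp = na_demo$Hosp), FUN = sum, na.rm = TRUE)

# Formula method: na.action = na.omit removes the whole second row first,
# so its Var2 value (8) is lost as well
by_formula <- aggregate(cbind(Var1, Var2) ~ Hosp, data = na_demo, FUN = sum)

by_list_rm$Var2[by_list_rm$Hosp == 1]  # 11
by_formula$Var2[by_formula$Hosp == 1]  # 3
```

If the formula interface is preferred but rows should not be dropped, passing na.action = na.pass together with na.rm = TRUE is one way to get the same totals as `by_list_rm`.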

You mentioned SQL in your original question. I don't have much experience with SQL, but there is also the sqldf package, which lets you do the following:

library(sqldf)  # provides sqldf(); install once with install.packages("sqldf")

sqldf("select Hosp, 
      sum(Var1) `Var1`, sum(Var2) `Var2`, sum(Var3) `Var3` 
      from temp 
      group by Hosp")
#   Hosp Var1 Var2 Var3
# 1    1   31   22    5
# 2    2   26   17    3
# 3    3    4    3    2
