Generating sums of data according to values of a variable
Question
I have data similar to this:
Hosp Score Var1 Var2 Var3
1 0 5 3 0
1 1 10 8 1
1 2 11 8 2
1 3 5 3 2
2 0 6 4 0
2 2 10 6 1
2 3 10 7 2
3 1 4 3 2
etc...
There are approximately 800 Hosps and 4 levels of the factor Score (0, 1, 2, 3). Some Hosps have no observation for a particular level of Score.
I would like to produce a new data frame in which Var1, Var2 and Var3 are summed within each Hosp. Using the data above, this would give:
Hosp Var1 Var2 Var3
1 31 22 5
2 26 17 3
3 4 3 2
This would be trivial in SQL (and I guess it is in R too!), but the only way I have found so far is a loop that adds a new row each time Hosp changes and increments the variables when it does not (the data is, of course, sorted on Hosp).
Is there a slick way to do this?
Recommended answer
Use aggregate(). Assuming your data.frame is named "temp", use:
aggregate(list(Var1 = temp$Var1, Var2 = temp$Var2, Var3 = temp$Var3),
list(Hosp = temp$Hosp), sum)
# Hosp Var1 Var2 Var3
# 1 1 31 22 5
# 2 2 26 17 3
# 3 3 4 3 2
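To try the call above, the sample rows from the question can be typed in by hand. This is only a minimal sketch: it contains the eight rows shown in the question rather than the full data, the name temp follows the assumption already made in this answer, and res is a name chosen here for illustration. It selects the three columns by name instead of listing them one by one, which should be equivalent.

```r
# Sample rows copied from the question (the real data has about 800 Hosps).
temp <- data.frame(
  Hosp  = c(1, 1, 1, 1, 2, 2, 2, 3),
  Score = c(0, 1, 2, 3, 0, 2, 3, 1),
  Var1  = c(5, 10, 11, 5, 6, 10, 10, 4),
  Var2  = c(3, 8, 8, 3, 4, 6, 7, 3),
  Var3  = c(0, 1, 2, 2, 0, 1, 2, 2)
)

# Sum Var1, Var2 and Var3 within each Hosp; Score is left out on purpose.
res <- aggregate(temp[c("Var1", "Var2", "Var3")],
                 by = list(Hosp = temp$Hosp), FUN = sum)
print(res)
```

Leaving Score out of the selected columns matters here, because otherwise it would be summed along with the other variables.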
For a slick way to do this, use the formula notation for aggregate():
aggregate(cbind(Var1, Var2, Var3) ~ Hosp, temp, sum)
Note, however, that the default data.frame method and the formula method treat NA values differently: the formula method drops incomplete rows before summing (its default na.action is na.omit), whereas the data.frame method passes NA values on to the summary function.
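The NA difference is easiest to see on a small made-up example. The data frame toy below is hypothetical (it is not from the question), and the result names are chosen only for illustration. It has one missing Var1 value in Hosp 1.

```r
# Hypothetical toy data: the second row has a missing Var1.
toy <- data.frame(
  Hosp = c(1, 1, 2),
  Var1 = c(5, NA, 6),
  Var2 = c(3, 8, 4)
)

# data.frame method: the NA reaches sum(), so Var1 for Hosp 1 is NA.
by_df <- aggregate(toy[c("Var1", "Var2")],
                   by = list(Hosp = toy$Hosp), FUN = sum)

# Formula method: the default na.action = na.omit drops the whole
# second row first, so its Var2 value (8) is lost as well.
by_formula <- aggregate(cbind(Var1, Var2) ~ Hosp, data = toy, FUN = sum)

# data.frame method with na.rm = TRUE passed through to sum():
# only the missing cell is skipped, and Var2 keeps all of its values.
by_df_narm <- aggregate(toy[c("Var1", "Var2")],
                        by = list(Hosp = toy$Hosp), FUN = sum, na.rm = TRUE)

print(by_df)
print(by_formula)
print(by_df_narm)
```

Which behaviour is appropriate depends on the data; if missing values are possible, it seems worth checking the result against a few groups by hand.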
You mentioned SQL in your original question. I don't have much experience with SQL, but there is also the sqldf package, with which you can do the following:
library(sqldf)

sqldf("select Hosp,
         sum(Var1) `Var1`, sum(Var2) `Var2`, sum(Var3) `Var3`
       from temp
       group by Hosp")
# Hosp Var1 Var2 Var3
# 1 1 31 22 5
# 2 2 26 17 3
# 3 3 4 3 2