Generating sums of data according to values of a variable

Published: 2020/6/2 20:33:06. Tags: r, aggregate


Question

I have data similar to this:

Hosp    Score   Var1    Var2    Var3
1       0       5       3       0
1       1       10      8       1
1       2       11      8       2
1       3       5       3       2
2       0       6       4       0
2       2       10      6       1
2       3       10      7       2
3       1       4       3       2
etc...

There are approximately 800 Hosps and 4 levels of the factor Score (0, 1, 2, 3). Some Hosps do not have an observation for every level of Score.

I would like to produce a new data frame in which Var1, Var2 and Var3 are summed within each Hosp. Using the data above, this would give:

Hosp    Var1    Var2    Var3
1       31      22      5
2       26      17      3
3       4       3       2

This would be trivial in SQL (and I suspect it is in R too!), but the only way I have found so far is a loop that adds a new row each time Hosp changes and adds to the running totals when it does not (the data is, of course, sorted on Hosp).
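For reference, the loop approach described above can be sketched in R roughly as follows. This is a hypothetical reconstruction, since the asker's actual code is not shown: the data frame `temp` is typed in from the sample rows above, and `sum_by_loop` is an illustrative name. Like the original loop, it only works if the rows are sorted by Hosp.

```r
# Sample rows from the question (the real data has about 800 Hosps)
temp <- data.frame(
  Hosp  = c(1, 1, 1, 1, 2, 2, 2, 3),
  Score = c(0, 1, 2, 3, 0, 2, 3, 1),
  Var1  = c(5, 10, 11, 5, 6, 10, 10, 4),
  Var2  = c(3, 8, 8, 3, 4, 6, 7, 3),
  Var3  = c(0, 1, 2, 2, 0, 1, 2, 2)
)
vars <- c("Var1", "Var2", "Var3")

# Hypothetical loop version; assumes df is already sorted by Hosp
sum_by_loop <- function(df, vars) {
  out <- df[0, c("Hosp", vars)]  # empty result with the right columns
  for (i in seq_len(nrow(df))) {
    if (i == 1 || df$Hosp[i] != df$Hosp[i - 1]) {
      # Hosp changed: start a new output row from this input row
      out[nrow(out) + 1, ] <- df[i, c("Hosp", vars)]
    } else {
      # Same Hosp: add this row's values to the running totals
      out[nrow(out), vars] <- out[nrow(out), vars] + df[i, vars]
    }
  }
  out
}

res <- sum_by_loop(temp, vars)
res
```

The loop grows the result one row at a time, which is exactly the bookkeeping that a grouped aggregation function does for you.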

Is there a slick way to do this?

Recommended answer

Use aggregate(). Assuming your data.frame is named temp, use:

aggregate(list(Var1 = temp$Var1, Var2 = temp$Var2, Var3 = temp$Var3), 
          list(Hosp = temp$Hosp), sum)
#   Hosp Var1 Var2 Var3
# 1    1   31   22    5
# 2    2   26   17    3
# 3    3    4    3    2

For a slicker way to do this, use the formula notation for aggregate():

aggregate(cbind(Var1, Var2, Var3) ~ Hosp, temp, sum)
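As a quick self-contained check, the sketch below runs both calls on the sample rows from the question (typed in by hand as `temp`; the names `by_df` and `by_formula` are illustrative). With no missing values, the two methods should produce the same totals.

```r
# Sample rows from the question
temp <- data.frame(
  Hosp  = c(1, 1, 1, 1, 2, 2, 2, 3),
  Score = c(0, 1, 2, 3, 0, 2, 3, 1),
  Var1  = c(5, 10, 11, 5, 6, 10, 10, 4),
  Var2  = c(3, 8, 8, 3, 4, 6, 7, 3),
  Var3  = c(0, 1, 2, 2, 0, 1, 2, 2)
)

# data.frame method, as in the first call
by_df <- aggregate(list(Var1 = temp$Var1, Var2 = temp$Var2, Var3 = temp$Var3),
                   list(Hosp = temp$Hosp), sum)

# formula method
by_formula <- aggregate(cbind(Var1, Var2, Var3) ~ Hosp, temp, sum)

# Both give Hosp 1: 31, 22, 5; Hosp 2: 26, 17, 3; Hosp 3: 4, 3, 2
by_formula
```

The formula version also scales more easily: adding another summed column only means adding its name inside cbind().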

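Note, however, that the default data.frame method and the formula method treat NA values differently. The formula method uses na.action = na.omit by default, so any row with an NA in one of the listed variables is dropped entirely before summing. The data.frame method passes NAs through to sum(), so a group containing an NA gets an NA total unless you add na.rm = TRUE.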
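To make the NA difference concrete, here is a small sketch on a made-up data frame `temp_na` (hypothetical toy data, not from the question) in which one Var1 value for Hosp 1 is missing. Passing na.action = na.pass together with na.rm = TRUE to the formula method is one way to make it behave like the data.frame method with na.rm = TRUE.

```r
# Hypothetical toy data: Hosp 1 has a missing Var1 in its second row
temp_na <- data.frame(
  Hosp = c(1, 1, 2, 2),
  Var1 = c(5, NA, 6, 10),
  Var2 = c(3, 8, 4, 6)
)

# data.frame method: the NA reaches sum(), so Hosp 1's Var1 total is NA,
# while its Var2 total still uses both rows (3 + 8 = 11)
by_df <- aggregate(temp_na[c("Var1", "Var2")],
                   by = list(Hosp = temp_na$Hosp), FUN = sum)

# formula method: na.omit drops the whole second row first,
# so Hosp 1 gets Var1 = 5 and also loses Var2 = 8 (total 3)
by_formula <- aggregate(cbind(Var1, Var2) ~ Hosp, data = temp_na, FUN = sum)

# formula method keeping rows with NAs and skipping them inside sum()
by_formula_pass <- aggregate(cbind(Var1, Var2) ~ Hosp, data = temp_na,
                             FUN = sum, na.rm = TRUE, na.action = na.pass)

by_df
by_formula
by_formula_pass
```

The silent loss of Var2 = 8 in the formula result is the behaviour most likely to surprise you, because nothing in the output signals that a row was dropped.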

You mentioned SQL in your original question. I don't have much experience with SQL, but there is also the sqldf package, with which you can do the following:

library(sqldf)
sqldf("select Hosp, 
      sum(Var1) `Var1`, sum(Var2) `Var2`, sum(Var3) `Var3` 
      from temp 
      group by Hosp")
#   Hosp Var1 Var2 Var3
# 1    1   31   22    5
# 2    2   26   17    3
# 3    3    4    3    2
