R aggregate by large number of columns

Problem description

I have a data frame (df) that has about 40 columns, and I want to aggregate using a sum on 4 of the columns. Outside of the 4 I want to sum, each unique value in column 1 corresponds to identical values across the rest of the columns, and I want to keep all the columns in the aggregated data frame. Is there any way I can specify the columns in the by = list() portion without having to type them all explicitly? For example, suppose I want to sum the column "field", grouped by columns 1-36. I've tried

aggregate(df$field, by = list(df[,1:36]), FUN = sum)

but it throws an error since that isn't a list of names. I've also tried

aggregate(df$field, by = list(names(df)[1:36]), FUN = sum)

And while this doesn't give an error, it gives me back an aggregation with my df names as the unique observations.

Or am I missing an easy way to say "aggregate these four columns using the rest of the data frame?"

Thanks

Here's an example data frame:

  A B C D Sum
1 A B C D   1
2 A B C D   2
3 A B C D   3
4 E F 1 R   4
5 E F 1 R   5

After I aggregate I want it to look like:

  A B C D Sum
1 A B C D   6
2 E F 1 R   9

I know I can do this if I explicitly state df$A, df$B, df$C, df$D in the "by" portion of the aggregate statement, but in my actual data frame this would require explicitly typing about 40 field names.

Recommended answer

You are asking how to aggregate the sum of multiple variables, grouped by the remaining variables. I would do this by combining the multiple variables first and then aggregating using the (in my opinion) more convenient formula interface of the aggregate function. For instance, consider aggregating the sum of Sepal.Length, Sepal.Width, and Petal.Length in the iris dataset based on the remaining variables (Petal.Width and Species):

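# Work on a copy so the built-in iris data stays untouched
agg <- iris
# Columns whose values should be summed
cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
# Collapse them into a single row-wise total
agg$sum <- rowSums(agg[, cols])
# Drop the original columns so only the grouping columns and the total remain
agg <- agg[, !names(agg) %in% cols]
# Sum the total within each combination of the remaining columns
aggregate(sum ~ ., data = agg, FUN = sum)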
#    Petal.Width    Species   sum
# 1          0.1     setosa  47.8
# 2          0.2     setosa 284.1
# 3          0.3     setosa  68.1
# 4          0.4     setosa  74.6
# 5          0.5     setosa  10.1
# 6          0.6     setosa  10.1
# 7          1.0 versicolor  79.9
# 8          1.1 versicolor  34.3
# 9          1.2 versicolor  63.8
# 10         1.3 versicolor 166.5
# 11         1.4 versicolor  96.7
# 12         1.5 versicolor 136.5
# 13         1.6 versicolor  42.0
# 14         1.7 versicolor  14.7
# 15         1.8 versicolor  13.9
# 16         1.4  virginica  14.3
# 17         1.5  virginica  27.4
# 18         1.6  virginica  16.0
# 19         1.7  virginica  11.9
# 20         1.8  virginica 162.2
# 21         1.9  virginica  71.7
# 22         2.0  virginica  91.3
# 23         2.1  virginica  94.4
# 24         2.2  virginica  48.3
# 25         2.3  virginica 125.6
# 26         2.4  virginica  44.4
# 27         2.5  virginica  48.2
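
As a complement to the formula approach above, the `by =` call from the question can also be made to work. A data frame is itself a list of columns, so a column subset such as `df[1:36]` can be passed to `by` directly rather than being wrapped in `list()`. The sketch below rebuilds the example data frame from the question. The `Other` column and the names `sum_cols`, `group_cols` and `res` are made up for illustration and do not come from the original post.

```r
# Example data from the question, plus a hypothetical second value column
# ("Other") to show that several columns can be summed in one call.
df <- data.frame(
  A = c("A", "A", "A", "E", "E"),
  B = c("B", "B", "B", "F", "F"),
  C = c("C", "C", "C", "1", "1"),
  D = c("D", "D", "D", "R", "R"),
  Sum = 1:5,
  Other = c(10, 20, 30, 40, 50)
)

# Columns to sum; every other column is treated as a grouping column.
sum_cols <- c("Sum", "Other")
group_cols <- setdiff(names(df), sum_cols)

# A data frame is already a list of equal-length vectors, so it can be
# used as `by` without wrapping it in list().
res <- aggregate(df[sum_cols], by = df[group_cols], FUN = sum)

# One row per group: Sum is 6 for the A/B/C/D group and 9 for E/F/1/R.
print(res)
```

Selecting the grouping columns by position works the same way, so for the data frame in the question something like `aggregate(df["field"], by = df[1:36], FUN = sum)` should be enough. Unlike the `rowSums` approach, this keeps each summed column separate instead of merging them into one total. Note that, according to the `aggregate` documentation, rows with missing values in any of the `by` columns are omitted from the result.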
