Aggregate a data frame with many columns according to one column
Problem description
From a data frame with many columns, I would like to aggregate (i.e. sum) hundreds of columns by a single column, without specifying each column name.
Some sample data:
names <- floor(runif(20, 1, 5))  # group labels 1 to 4
sample <- cbind(names)
for (i in 1:20) {
  col <- rnorm(20, 2, 4)         # one column of random values
  sample <- cbind(sample, col)
}
So far I have the following code, but it fails with an error saying that the arguments must have the same length.
aggregated <- aggregate.data.frame(sample[,c(2:20)], by = as.list(names), FUN = 'sum')
The original dataset is much bigger, so I can't specify the name of each column to be aggregated, and I can't use the list function.
You don't actually need to list them at all:
# . represents all other columns; data.frame() gives the repeated "col" columns unique names
aggregate(. ~ names, data.frame(sample), sum)
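For comparison, the original call can also be repaired rather than replaced. The error most likely comes from as.list(names), which turns the 20 group labels into 20 separate length-1 grouping variables, whereas by expects a list whose elements each have one entry per row. A minimal sketch on a small made-up data frame (df, g, a and b are illustrative names, not from the question):

```r
# Small made-up data: one grouping column and two value columns.
df <- data.frame(g = c(1, 1, 2, 2, 3),
                 a = c(1, 2, 3, 4, 5),
                 b = c(10, 20, 30, 40, 50))

# Formula interface: "." means every column not named on the right-hand side.
by_formula <- aggregate(. ~ g, df, sum)

# by-list interface: 'by' takes list(<grouping vector>),
# not as.list(<grouping vector>).
by_list <- aggregate(df[, -1], by = list(g = df$g), FUN = sum)

print(by_formula)
print(by_list)
```

One difference to keep in mind: the formula interface defaults to na.action = na.omit, so rows containing any NA are dropped before summing, while the by-list interface passes the values to FUN as they are.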
Base R is my favorite, of course, but in case someone wants dplyr:
library(dplyr)
data.frame(sample) %>%
group_by(names) %>%
summarise_each(funs(sum))
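summarise_each() and funs() have since been deprecated in dplyr. Assuming dplyr 1.0.0 or later, the same grouped sum can be sketched with across(); the data frame df below is made up for illustration and is not the data from the question:

```r
library(dplyr)

# Small made-up data: one grouping column and two value columns.
df <- data.frame(g = c(1, 1, 2, 2, 3),
                 a = c(1, 2, 3, 4, 5),
                 b = c(10, 20, 30, 40, 50))

# Sum every non-grouping column within each group;
# everything() skips the grouping column automatically.
result <- df %>%
  group_by(g) %>%
  summarise(across(everything(), sum))

print(result)
```

With the data from the question, df would be replaced by data.frame(sample) and g by names.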