Aggregate multiple columns at once


Question

I have a data frame like so:

x <-
  id1 id2 val1 val2 val3 val4
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8

I wish to aggregate the above by id1 and id2. I want to get the means of val1, val2, val3, and val4 at the same time.

How do I do this?

This is what I currently have, but it works for just one column:

agg <- aggregate(x$val1, list(id1 = x$id1, id2 = x$id2), mean)
names(agg)[3] <- "val1"  # Rename the column

Also, how do I rename the columns that are output as means, in the same statement as above?

Recommended answer

We can use the formula method of aggregate. The variables on the right-hand side of ~ are the grouping variables, while the . on the left-hand side stands for all other variables in df1 (from the example, we assume the mean is needed for every column except the grouping columns). We then specify the dataset and the function (mean).

aggregate(.~id1+id2, df1, mean)
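
As a small sketch of the formula method described above, the snippet below rebuilds the sample data (only val1 and val2, since the post gives no values for val3 and val4) and shows two variants: the . shorthand, and cbind() on the left-hand side to name the value columns explicitly. The object names agg_all and agg_sel are only illustrative.

```r
# Sample data from the post; val3 and val4 are omitted because
# the question does not list any values for them.
df1 <- data.frame(
  id1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

# Mean of every non-grouping column, grouped by id1 and id2
agg_all <- aggregate(. ~ id1 + id2, data = df1, FUN = mean)

# Mean of explicitly chosen columns only
agg_sel <- aggregate(cbind(val1, val2) ~ id1 + id2, data = df1, FUN = mean)

print(agg_all)
#   id1 id2 val1 val2
# 1   a   x  1.5  6.5
# 2   b   x  2.0  8.0
# 3   a   y  3.5  7.0
# 4   b   y  3.0  6.0
```

With the formula interface the output columns keep their original names (val1, val2), so no separate renaming step should be needed for the case in the question.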






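Or we can use summarise_each from dplyr after grouping with group_by (note that summarise_each and funs are deprecated in later dplyr releases):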

library(dplyr)
df1 %>%
    group_by(id1, id2) %>% 
    summarise_each(funs(mean))

Or use summarise with across (introduced in the dplyr development version '0.8.99.9000', which was released as dplyr 1.0.0):

df1 %>% 
    group_by(id1, id2) %>%
    summarise(across(starts_with('val'), mean))
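
The question also asks about renaming the mean columns in the same statement. One possible way, sketched here under the assumption that dplyr 1.0.0 or later is installed, is the .names argument of across. The "mean_" prefix and the object name res are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Sample data from the post (val1 and val2 only)
df1 <- data.frame(
  id1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

res <- df1 %>%
  group_by(id1, id2) %>%
  summarise(
    # {.col} is replaced by each input column name
    across(starts_with("val"), mean, .names = "mean_{.col}"),
    .groups = "drop"  # return an ungrouped tibble
  )

print(res)
```

The result should have the columns id1, id2, mean_val1, and mean_val2, with one row per id1/id2 combination.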






Another option is data.table. We convert the 'data.frame' to a 'data.table' with setDT(df1), group by 'id1' and 'id2', loop over the columns of the Subset of Data.table (.SD), and take the mean of each.

library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)] 
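
If only some columns should be averaged, or the outputs need new names, a variant along the following lines may help. It is a sketch that assumes the data.table package is installed. It uses as.data.table (which copies) rather than setDT (which converts by reference) so that df1 itself is left unchanged. The names val_cols and res and the "mean_" prefix are illustrative.

```r
library(data.table)

# Sample data from the post (val1 and val2 only)
df1 <- data.frame(
  id1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

dt <- as.data.table(df1)      # copy; df1 stays a plain data.frame
val_cols <- c("val1", "val2") # columns to average

# .SDcols restricts .SD to the chosen columns
res <- dt[, lapply(.SD, mean), by = .(id1, id2), .SDcols = val_cols]

# Rename the mean columns in place
setnames(res, val_cols, paste0("mean_", val_cols))

print(res)
```

Groups appear in order of first occurrence in the data, so the row order can differ from the aggregate output.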



Data

df1 <- structure(list(
    id1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
    id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
    val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
    val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)),
    .Names = c("id1", "id2", "val1", "val2"),
    class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))
