Aggregate multiple columns at once
Question
I have a data frame like so:
x <-
id1 id2 val1 val2 val3 val4
1 a x 1 9
2 a x 2 4
3 a y 3 5
4 a y 4 9
5 b x 1 7
6 b y 4 4
7 b x 3 9
8 b y 2 8
I wish to aggregate the above by id1 and id2. I want to get the means of val1, val2, val3 and val4 at the same time.
How do I do this?
This is what I currently have, but it works for just one column:
agg <- aggregate(x$val1, list(id1 = x$id1, id2 = x$id2), mean)
names(agg)[3] <- c("val1") # Rename the column
Also, how do I rename the columns that are output as means, within the same statement given above?
Recommended answer
We can use the formula method of aggregate. The variables on the right-hand side of ~ are the grouping variables, while the . on the left-hand side represents all other variables in 'df1' (from the example, we assume that we need the mean of every column except the grouping columns). We then specify the dataset and the function (mean).
aggregate(.~id1+id2, df1, mean)
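The question also asks how to rename the mean columns in the same statement. One possible approach, sketched here using only base R, is to name the arguments of cbind() on the left-hand side of the formula. The output names mean_val1 and mean_val2 are hypothetical, chosen for illustration, and the data frame repeats the two-value-column 'df1' from the data section so the block runs on its own.

```r
# Example data matching the 'df1' used in this answer (two value columns only).
df1 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

# Mean of every non-grouping column; the output keeps the original names.
agg_all <- aggregate(. ~ id1 + id2, data = df1, FUN = mean)

# Naming the cbind() arguments renames the output columns in the same call.
# 'mean_val1' and 'mean_val2' are hypothetical names chosen for this sketch.
agg_named <- aggregate(cbind(mean_val1 = val1, mean_val2 = val2) ~ id1 + id2,
                       data = df1, FUN = mean)
print(agg_named)
```

The trade-off is that the value columns have to be listed explicitly; the . shorthand cannot be combined with renaming in this form.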
Or we can use summarise_each from dplyr after grouping (group_by). Note that summarise_each and funs are deprecated in current dplyr releases.
library(dplyr)
df1 %>%
group_by(id1, id2) %>%
summarise_each(funs(mean))
Or use summarise with across (dplyr devel version '0.8.99.9000'; across is part of the released package from dplyr 1.0.0 onward).
df1 %>%
group_by(id1, id2) %>%
summarise(across(starts_with('val'), mean))
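To rename the summarised columns in the same statement, across accepts a .names glue specification. The following is a minimal sketch, assuming dplyr 1.0.0 or later is installed; the mean_ prefix is a hypothetical naming choice, and the data frame repeats 'df1' from the data section so the block is self-contained.

```r
library(dplyr)

# Example data matching the 'df1' used in this answer.
df1 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

# '{.col}' is replaced by each selected column name, so val1 becomes
# mean_val1 and so on. The 'mean_' prefix is a hypothetical choice.
means_named <- df1 %>%
  group_by(id1, id2) %>%
  summarise(across(starts_with("val"), mean, .names = "mean_{.col}"),
            .groups = "drop")
print(means_named)
```

Setting .groups = "drop" returns an ungrouped tibble, which avoids carrying the id1 grouping into later steps.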
Another option is data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'id1' and 'id2', loop through the Subset of Data.table (.SD), and get the mean.
library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]
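A variation on the data.table call above, sketched under the assumption that only some columns should be averaged and that the result should carry new names. The .SDcols argument restricts .SD to the listed columns and setnames renames by reference; the value_cols vector and the mean_ prefix are hypothetical choices for this example.

```r
library(data.table)

# Example data matching the 'df1' used in this answer, built directly as a
# data.table so that no existing data.frame is modified by reference.
dt <- data.table(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

# Columns to average (hypothetical helper vector for this sketch).
value_cols <- c("val1", "val2")

# .SDcols limits .SD to the value columns; 'by' defines the groups.
res <- dt[, lapply(.SD, mean), by = .(id1, id2), .SDcols = value_cols]

# Rename the result columns in place; the 'mean_' prefix is illustrative.
setnames(res, value_cols, paste0("mean_", value_cols))
print(res)
```

Note that setDT(df1) in the original call converts df1 in place, so df1 is a data.table afterwards; building a separate object as above leaves the source data untouched.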
Data
df1 <- structure(list(
  id1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)),
  .Names = c("id1", "id2", "val1", "val2"),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))