How to preserve column names when dynamically passing data frame columns to `aggregate`
Question
Given a data frame like the following:
df1 <- data.frame(a=seq(1.1,9.9,1.1), b=seq(0.1,0.9,0.1),
c=rev(seq(10.1, 99.9, 11.1)))
I want to aggregate columns b and c by a, so I would do something like this:
aggregate(cbind(b,c) ~ a, data = df1, mean)
This gets it done. However, I want to generalize it into a function, without hard-coded column names.
myAggFunction <- function (df, col_main, col_1, col_2){
  ## column positions are passed in instead of hard-coded names
  return (aggregate(cbind(df[,col_1], df[,col_2]) ~ df[,col_main], df, mean))
}
myAggFunction(df1, 1, 2, 3)
The issue is that the column names of the returned data frame look like this:
df2[, 1] V1 V2
How do I get the column names of the original data frame in the returned data frame?
Recommended answer
I will assume the general case, where you have multiple LHS (left-hand side) variables as well as multiple RHS (right-hand side) variables.
Using the data frame method
## S3 method for class 'data.frame'
aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)
If you pass the object as a named list, the names are preserved. So do not access your data frame with [, ], but with []. You may construct your function as:
## `LHS` and `RHS` are vectors of column names or numbers giving column positions
fun1 <- function (df, LHS, RHS){
## call `aggregate.data.frame`
aggregate.data.frame(df[LHS], df[RHS], mean)
}
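To see why [] keeps the names while [, ] does not, here is a minimal sketch. The data frame toy and its columns g, y1 and y2 are made up for illustration and are not part of the original question.

```r
## hypothetical toy data, not from the question
toy <- data.frame(g = c("a", "a", "b"), y1 = c(1, 3, 5), y2 = c(2, 4, 6))

## `[` with a single index returns a data frame, i.e. a named list
is.data.frame(toy[2:3])    # TRUE
names(toy[2:3])            # "y1" "y2"

## `[, j]` with a single column drops to a bare vector, so the name is lost
is.null(names(toy[, 2]))   # TRUE

## passing named lists to the data frame method carries the names through
res <- aggregate(toy[c("y1", "y2")], toy["g"], mean)
names(res)                 # "g" "y1" "y2"
```

Both arguments here are data frames, so aggregate has names available for the grouping column and for the aggregated columns.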
Still want to use the formula method?
## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
subset, na.action = na.omit)
It is slightly tedious, but we want to construct a nice formula from the column names via:
as.formula( paste(paste0("cbind(", toString(LHS), ")"),
paste(RHS, collapse = " + "), sep = " ~ ") )
For example:
LHS <- c("y1", "y2", "y3")
RHS <- c("x1", "x2")
as.formula( paste(paste0("cbind(", toString(LHS), ")"),
paste(RHS, collapse = " + "), sep = "~") )
# cbind(y1, y2, y3) ~ x1 + x2
If you feed this formula to aggregate, the original column names are preserved in the result.
So construct your function as such:
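As a rough illustration of the difference, the sketch below compares a formula written with [, ] indexing against one built from column names. The data frame toy is hypothetical, and the exact names in the first result may depend on your R version.

```r
## hypothetical toy data, not from the question
toy <- data.frame(g = c("a", "a", "b"), y1 = c(1, 3, 5), y2 = c(2, 4, 6))

## indexing inside the formula: the terms are expressions, not column names,
## so the result typically gets names like "toy[, 1]", "V1", "V2"
bad <- aggregate(cbind(toy[, 2], toy[, 3]) ~ toy[, 1], data = toy, FUN = mean)
names(bad)

## formula built from column names: the names carry over
LHS <- c("y1", "y2")
RHS <- "g"
form <- as.formula(paste(paste0("cbind(", toString(LHS), ")"),
                         paste(RHS, collapse = " + "), sep = " ~ "))
good <- aggregate(form, data = toy, FUN = mean)
names(good)                # "g" "y1" "y2"
```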
fun2 <- function (df, LHS, RHS){
## ideally, `LHS` and `RHS` should readily be vector of column names
## but specifying vector of numeric positions are allowed
if (is.numeric(LHS)) LHS <- names(df)[LHS]
if (is.numeric(RHS)) RHS <- names(df)[RHS]
## make a formula
form <- as.formula( paste(paste0("cbind(", toString(LHS), ")"),
paste(RHS, collapse = " + "), sep = "~") )
## call `aggregate.formula`
stats:::aggregate.formula(form, df, mean)
}
Remark
aggregate.data.frame is the more direct choice. aggregate.formula is a wrapper: it first calls model.frame internally to construct a data frame, and then hands that over to the data frame method.
I give the formula method as an option, because this way of constructing a formula is also useful for lm, etc.
Simple, reproducible example
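As a hedged sketch of that last point, the same paste-based construction can feed lm. The data frame toy and the variable names y, x1 and x2 are assumptions made up for this example.

```r
## hypothetical data, for illustration only
set.seed(1)
toy <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))

## build `y ~ x1 + x2` from character vectors of column names
LHS <- "y"
RHS <- c("x1", "x2")
form <- as.formula(paste(LHS, paste(RHS, collapse = " + "), sep = " ~ "))

fit <- lm(form, data = toy)
names(coef(fit))           # "(Intercept)" "x1" "x2"
```

Because the formula refers to columns by name, the coefficient names stay readable, just as the column names do with aggregate.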
set.seed(0)
dat <- data.frame(y1 = rnorm(10), y2 = rnorm(10),
x1 = gl(2,5, labels = letters[1:2]))
## "data.frame" method with `fun1`
fun1(dat, 1:2, 3)
# x1 y1 y2
#1 a 0.79071819 -0.3543499
#2 b -0.07287026 -0.3706127
## "formula" method with `fun2`
fun2(dat, 1:2, 3)
# x1 y1 y2
#1 a 0.79071819 -0.3543499
#2 b -0.07287026 -0.3706127
fun2(dat, c("y1", "y2"), "x1")
# x1 y1 y2
#1 a 0.79071819 -0.3543499
#2 b -0.07287026 -0.3706127