R ave by columns
Question
I want to use the ave function on many columns (tens of them) of a data frame:
ave(df[,the_cols], df[,c('site', 'month')], FUN = mean)
The problem is that ave runs the mean function on all of the_cols columns together. Is there any way to run it on each of the_cols columns separately?
I looked at other functions. tapply and aggregate behave differently: they return only one row per group. I need the ave behaviour, i.e. a result with the same number of rows as the original df. There is also the by function, but using it would be very clumsy, as it returns a complicated list structure that would have to be converted somehow.
Certainly many clumsy and ugly solutions exist (by and do.call, multiple *apply function calls, etc.), but is there a really easy and elegant one?
Recommended answer
Perhaps I'm missing something, but an sapply() approach would work very well here and wouldn't require any ugly hacks. Some dummy data:
df <- data.frame(A = rnorm(20), B = rnorm(20), site = gl(5,4), month = gl(10, 2))
What is wrong with sapply(df[, c("A","B")], ave, df$site, df$month)? It returns a matrix; coerce that to a data frame via data.frame() if you really want one.
R> sapply(df[, c("A","B")], ave, df$site, df$month)
A B
[1,] 0.0775 0.04845
[2,] 0.0775 0.04845
[3,] -1.5563 0.43443
[4,] -1.5563 0.43443
[5,] 0.7193 0.01151
[6,] 0.7193 0.01151
[7,] -0.9243 -0.28483
[8,] -0.9243 -0.28483
[9,] 0.3316 0.14473
[10,] 0.3316 0.14473
[11,] -0.2539 0.20384
[12,] -0.2539 0.20384
[13,] 0.5558 -0.37239
[14,] 0.5558 -0.37239
[15,] 0.1976 -0.22693
[16,] 0.1976 -0.22693
[17,] 0.2031 1.11041
[18,] 0.2031 1.11041
[19,] 0.3229 -0.53818
[20,] 0.3229 -0.53818
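If a data frame is wanted rather than a matrix, the coercion mentioned above can be sketched as follows. This is a minimal sketch using small toy data (hypothetical values, not the rnorm() data above) so the group means are easy to check by eye; the names cols, res1 and res2 are illustrative only.

```r
# Toy data (hypothetical values chosen so group means are easy to verify)
df <- data.frame(
  A = c(1, 3, 5, 7),
  B = c(2, 4, 6, 8),
  site = gl(2, 2),   # 1 1 2 2
  month = gl(2, 2)   # 1 1 2 2
)
cols <- c("A", "B")

# sapply() simplifies to a matrix; as.data.frame() turns it into a data frame
res1 <- as.data.frame(sapply(df[cols], ave, df$site, df$month))

# Alternative: lapply() returns a list that can overwrite the columns in a copy,
# which keeps the grouping columns alongside the averaged values
res2 <- df
res2[cols] <- lapply(df[cols], ave, df$site, df$month)
```

The second form may be more convenient when the grouping columns should stay next to the group means, since nothing has to be bound back together afterwards.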
Putting that together, how about
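AVE <- function(df, cols, ...) {
  # Grouping factors supplied through ...
  dots <- list(...)
  # Group-wise ave() per selected column; df[cols] stays a data frame even for one column
  out <- sapply(df[cols], ave, ...)
  # Grouping factors first, then the averaged columns
  out <- data.frame(as.data.frame(dots), out)
  names(out) <- c(paste0("Fac", seq_along(dots)), cols)
  out
}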
R> AVE(df, c("A","B"), df$site, df$month)
Fac1 Fac2 A B
1 1 1 0.0775 0.04845
2 1 1 0.0775 0.04845
3 1 2 -1.5563 0.43443
4 1 2 -1.5563 0.43443
5 2 3 0.7193 0.01151
6 2 3 0.7193 0.01151
7 2 4 -0.9243 -0.28483
8 2 4 -0.9243 -0.28483
9 3 5 0.3316 0.14473
10 3 5 0.3316 0.14473
11 3 6 -0.2539 0.20384
12 3 6 -0.2539 0.20384
13 4 7 0.5558 -0.37239
14 4 7 0.5558 -0.37239
15 4 8 0.1976 -0.22693
16 4 8 0.1976 -0.22693
17 5 9 0.2031 1.11041
18 5 9 0.2031 1.11041
19 5 10 0.3229 -0.53818
20 5 10 0.3229 -0.53818
The details of working with ... escape me at the moment, but you should be able to get better names than the Fac1 etc. that I used here.
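One possible way to get meaningful names is to pass the grouping columns by name instead of as vectors through `...`. The sketch below is an assumption-laden variant, not the answer's function: AVE2 and its by argument are hypothetical names introduced here, and the toy data is made up for easy checking.

```r
# Hypothetical variant of AVE(): `by` is a character vector of grouping column names
AVE2 <- function(df, cols, by, FUN = mean) {
  # Combine the grouping columns into one factor;
  # drop = TRUE discards level combinations that never occur
  g <- interaction(df[by], drop = TRUE)
  # Apply ave() to each selected column separately
  out <- lapply(df[cols], function(x) ave(x, g, FUN = FUN))
  # df[by] carries the original grouping column names into the result
  data.frame(df[by], out)
}

# Toy data (hypothetical values)
df <- data.frame(
  A = c(1, 3, 5, 7),
  B = c(2, 4, 6, 8),
  site = gl(2, 2),
  month = gl(2, 2)
)
res <- AVE2(df, c("A", "B"), c("site", "month"))
```

Because the grouping columns are selected with df[by], the result keeps the names site and month, and the FUN argument allows a summary other than mean to be swapped in.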
I'll throw an alternative representation out there for you: aggregate(), but using the ave() function instead of mean():
R> aggregate(cbind(A, B) ~ site + month, data = df, ave)
site month A.1 A.2 B.1 B.2
1 1 1 0.0775 0.0775 0.04845 0.04845
2 1 2 -1.5563 -1.5563 0.43443 0.43443
3 2 3 0.7193 0.7193 0.01151 0.01151
4 2 4 -0.9243 -0.9243 -0.28483 -0.28483
5 3 5 0.3316 0.3316 0.14473 0.14473
6 3 6 -0.2539 -0.2539 0.20384 0.20384
7 4 7 0.5558 0.5558 -0.37239 -0.37239
8 4 8 0.1976 0.1976 -0.22693 -0.22693
9 5 9 0.2031 0.2031 1.11041 1.11041
10 5 10 0.3229 0.3229 -0.53818 -0.53818
Not quite the requested output, but it is simple to reshape if needed.
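A related route back to one row per original observation is sketched below. Note that it swaps in a different technique from reshaping the ave() output: it aggregates with mean() (one row per group) and then looks up each original row's group by a pasted key. The toy data and the names means, idx and expanded are assumptions for illustration.

```r
# Toy data (hypothetical values)
df <- data.frame(
  A = c(1, 3, 5, 7),
  B = c(2, 4, 6, 8),
  site = gl(2, 2),
  month = gl(2, 2)
)

# One row per (site, month) group with the group means
means <- aggregate(cbind(A, B) ~ site + month, data = df, FUN = mean)

# For every original row, find the position of its group in `means`
idx <- match(paste(df$site, df$month), paste(means$site, means$month))

# Repeat the group rows so the result lines up with the rows of df
expanded <- means[idx, ]
rownames(expanded) <- NULL
```

Two caveats with this sketch: the formula interface of aggregate() omits rows containing NA by default, and the pasted key assumes that the grouping values cannot collide once joined with a space.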