Can't get aggregate() to work for regression by group
Question
I want to use aggregate with this custom function:
# linear regression function: spread of the fitted values of value ~ date
CalculateLinRegrDiff = function (sample){
  fit <- lm(value ~ date, data = sample)
  diff(range(fit$fitted))
}
dataset2 = aggregate(value ~ id + col, dataset, CalculateLinRegrDiff(dataset))
I get this error:
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'FUN' of mode 'function' was not found
What is wrong?
Recommended answer
Your aggregate syntax is wrong in the first place. Pass the function CalculateLinRegrDiff itself to the FUN argument, not the evaluated call CalculateLinRegrDiff(dataset).
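To illustrate the difference between passing a function and passing the result of calling it, here is a minimal sketch. The toy data frame d and its columns are made up for illustration, and mean stands in for any vector function.

```r
# Toy data; the names d, value and f are illustrative, not from the question.
d <- data.frame(value = c(1, 2, 3, 4), f = c("a", "a", "b", "b"))

# Correct: FUN is the function object itself.
ok <- aggregate(value ~ f, d, mean)

# Incorrect: mean(d$value) is evaluated first, so FUN receives a number.
# aggregate() then fails when it tries to look up a function.
bad <- tryCatch(aggregate(value ~ f, d, mean(d$value)),
                error = function(e) conditionMessage(e))

print(ok)
print(bad)
```

The second call fails for the same reason as the call in the question: by the time aggregate sees its third argument, it is already a value rather than a function.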
Secondly, you've chosen the wrong tool. aggregate can't help you fit a regression by group. It splits the vector on the LHS of ~ according to the combinations of the variables on the RHS, and then applies FUN to each piece of the LHS. That is, FUN should be a function that works on an atomic vector, not on a data frame. mean, sd, quantile, etc. are all functions that take an atomic vector as input. CalculateLinRegrDiff expects a data frame as input, so it is not going to work with aggregate.
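A quick way to see what FUN actually receives is to pass a small probe function. This is only a sketch on made-up data; the names d and probe are hypothetical.

```r
# Toy data; names are illustrative only.
d <- data.frame(value = c(1, 2, 3, 4), date = c(10, 20, 30, 40),
                f = c("a", "a", "b", "b"))

# Hypothetical probe: TRUE when the input is a plain atomic vector.
probe <- function(x) is.atomic(x) && !is.data.frame(x)

seen <- aggregate(value ~ f, d, probe)
print(seen)
```

Each group hands probe only the value column as a numeric vector. The date column never reaches FUN, which is why a function that calls lm(value ~ date, ...) has nothing to work with.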
Note that sometimes we use cbind on the LHS, like cbind(x, y) ~ f. This means that FUN is applied separately to x ~ f and to y ~ f. The LHS variables are treated independently and are never passed to FUN together.
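The following sketch, on made-up data with no missing values, compares the cbind form with two separate calls. The names d, both, x_only and y_only are illustrative.

```r
# Toy data without missing values; names are illustrative only.
d <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40),
                f = c("a", "a", "b", "b"))

both   <- aggregate(cbind(x, y) ~ f, d, mean)
x_only <- aggregate(x ~ f, d, mean)
y_only <- aggregate(y ~ f, d, mean)

print(both)
# The x and y columns of 'both' match the two separate results.
print(all.equal(both$x, x_only$x))
print(all.equal(both$y, y_only$y))
```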
The right tool for you is the by function. It splits a data frame into sub data frames and applies FUN to each of them, so it is ideal for regression by group.
by(dataset[c("value", "date")], dataset[c("id", "col")], CalculateLinRegrDiff)
A simple reproducible example:
set.seed(0)
dataset <- data.frame(value = runif(20), date = runif(20),
f = sample(gl(2, 10)), g = sample(gl(4, 5)))
oo <- by(dataset[c("value", "date")], dataset[c("f", "g")], CalculateLinRegrDiff)
str(oo)
# by [1:2, 1:4] 0.307 0.251 0.109 0.201 0.472 ...
# - attr(*, "dimnames")=List of 2
# ..$ f: chr [1:2] "1" "2"
# ..$ g: chr [1:4] "1" "2" "3" "4"
Since CalculateLinRegrDiff returns a single scalar, by simplifies the result oo to an array rather than a list. This array is like a contingency table, so we can use the "table" method of as.data.frame to reshape it into a data frame:
oo <- as.data.frame.table(oo)
# f g Freq
#1 1 1 0.3069877
#2 2 1 0.2508591
#3 1 2 0.1087895
#4 2 2 0.2007295
#5 1 3 0.4715680
#6 2 3 0.4942069
#7 1 4 0.3223174
#8 2 4 0.4687340
The name "Freq" may be undesirable, but you can easily change it, for example with names(oo)[3] <- "foo".
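As an alternative to renaming afterwards, as.data.frame.table has a responseName argument that sets the column name directly. The sketch below repeats the setup from the example so that it runs on its own. The column name "diff" is an arbitrary choice, and the numbers may differ from those shown above depending on the R version's sampling algorithm.

```r
CalculateLinRegrDiff <- function(sample) {
  fit <- lm(value ~ date, data = sample)
  diff(range(fitted(fit)))
}

set.seed(0)
dataset <- data.frame(value = runif(20), date = runif(20),
                      f = sample(gl(2, 10)), g = sample(gl(4, 5)))

oo <- by(dataset[c("value", "date")], dataset[c("f", "g")], CalculateLinRegrDiff)

# Name the result column while reshaping, instead of fixing "Freq" later.
res <- as.data.frame.table(oo, responseName = "diff")
print(names(res))
```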
As I said in my comments under your question, we can also use split and lapply. But then there is no trivial way to convert the result into a good-looking data frame.
datlist <- split(dataset[c("value", "date")], dataset[c("f", "g")], drop = TRUE)
rr <- lapply(datlist, CalculateLinRegrDiff)
stack(rr)
# values ind
#1 0.3069877 1.1
#2 0.2508591 2.1
#3 0.1087895 1.2
#4 0.2007295 2.2
#5 0.4715680 1.3
#6 0.4942069 2.3
#7 0.3223174 1.4
#8 0.4687340 2.4
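If separate group columns are wanted from the split and lapply route, one possible workaround is to split the combined names such as "1.1" back into their parts. This is only a sketch: it assumes the factor levels themselves contain no ".", and the names keys, res and "diff" are made up for illustration.

```r
CalculateLinRegrDiff <- function(sample) {
  fit <- lm(value ~ date, data = sample)
  diff(range(fitted(fit)))
}

set.seed(0)
dataset <- data.frame(value = runif(20), date = runif(20),
                      f = sample(gl(2, 10)), g = sample(gl(4, 5)))

datlist <- split(dataset[c("value", "date")], dataset[c("f", "g")], drop = TRUE)
rr <- lapply(datlist, CalculateLinRegrDiff)
st <- stack(rr)

# Names look like "f.g"; split them on the literal dot (assumes no dots in levels).
keys <- do.call(rbind, strsplit(as.character(st$ind), ".", fixed = TRUE))
res <- data.frame(f = keys[, 1], g = keys[, 2], diff = st$values)
print(res)
```

This takes more steps than the by approach, and it breaks if a level name contains a dot, which is part of why by is the more convenient choice here.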
I suggest you read Linear Regression and group by in R for a thorough demonstration of regression by group.