Can't get aggregate() to work for regression by group


Question

I want to use aggregate with this custom function:

#linear regression f-n
CalculateLinRegrDiff = function (sample){
  fit <- lm(value~ date, data = sample)
  diff(range(fit$fitted))
}

dataset2 = aggregate(value ~ id + col, dataset, CalculateLinRegrDiff(dataset))

I get this error:

Error in get(as.character(FUN), mode = "function", envir = envir) : 
  object 'FUN' of mode 'function' was not found

What is wrong?

Recommended answer

Your syntax for aggregate is wrong in the first place. Pass the function CalculateLinRegrDiff itself to the FUN argument, not the evaluated call CalculateLinRegrDiff(dataset).
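
To illustrate the difference between passing a function and passing the result of calling it, here is a minimal sketch on made-up toy data (the data frame df and its columns x and f are assumptions for this illustration, not part of your dataset):

```r
# Toy data; df, x and f are hypothetical names used only for this sketch.
df <- data.frame(x = c(1, 2, 3, 4), f = c("a", "a", "b", "b"))

# Correct: FUN receives the function object itself.
ok <- aggregate(x ~ f, data = df, FUN = mean)

# Incorrect: mean(df$x) is evaluated first, so FUN receives a number,
# and aggregate fails when it tries to look up a function.
bad <- tryCatch(aggregate(x ~ f, data = df, FUN = mean(df$x)),
                error = function(e) e)
inherits(bad, "error")
```

The second call fails for the same reason as yours: what reaches FUN is a value, not a function.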

Secondly, you've chosen the wrong tool. aggregate can't help you fit a regression by group. It splits the vector on the LHS of ~ according to the combinations of the RHS variables, and then applies FUN to each piece of the LHS. That is, FUN should be a function that works on an atomic vector, not a data frame. For example, mean, sd and quantile all take an atomic vector as input. CalculateLinRegrDiff expects a data frame, so it is not going to work with aggregate.
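
A quick way to see what FUN actually receives is to have it report on its own argument. This is only a sketch on hypothetical toy data (df, x and f are made-up names):

```r
# Toy data; df, x and f are hypothetical names used only for this sketch.
df <- data.frame(x = c(1, 2, 3, 4), f = c("a", "a", "b", "b"))

# FUN reports whether its input is a plain atomic vector.
res <- aggregate(x ~ f, data = df,
                 FUN = function(v) is.atomic(v) && !is.data.frame(v))
res$x
```

Each group's piece of x arrives as an atomic vector, so a function that needs both value and date columns has nothing to work with.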

Note that sometimes we use cbind on the LHS, like cbind(x, y) ~ f. This means that FUN is applied separately to x ~ f and to y ~ f. The LHS variables are handled independently and are never used together.
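
The following sketch, again on made-up toy data (df, x, y and f are hypothetical names), compares the cbind form with two separate calls:

```r
# Toy data; df, x, y and f are hypothetical names used only for this sketch.
df <- data.frame(x = c(1, 2, 3, 4),
                 y = c(10, 20, 30, 40),
                 f = c("a", "a", "b", "b"))

# One call with cbind on the LHS ...
both <- aggregate(cbind(x, y) ~ f, data = df, FUN = mean)

# ... gives the same numbers as two separate calls.
sep_x <- aggregate(x ~ f, data = df, FUN = mean)
sep_y <- aggregate(y ~ f, data = df, FUN = mean)

all.equal(both$x, sep_x$x)
all.equal(both$y, sep_y$y)
```

So even with cbind, FUN never sees x and y at the same time.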

The right tool for you is the by function. It splits a data frame into sub data frames and applies FUN on each sub frame. So it is ideal for regression by group.

by(dataset[c("value", "date")], dataset[c("id", "col")], CalculateLinRegrDiff)

A simple reproducible example:

set.seed(0)
dataset <- data.frame(value = runif(20), date = runif(20),
                      f = sample(gl(2, 10)), g = sample(gl(4, 5)))
oo <- by(dataset[c("value", "date")], dataset[c("f", "g")], CalculateLinRegrDiff)
str(oo)
# by [1:2, 1:4] 0.307 0.251 0.109 0.201 0.472 ...
# - attr(*, "dimnames")=List of 2
#  ..$ f: chr [1:2] "1" "2"
#  ..$ g: chr [1:4] "1" "2" "3" "4"

Since CalculateLinRegrDiff returns a single scalar, by simplifies the result oo to an array rather than a list. This array is like a contingency table, so we can use the "table" method of as.data.frame to reshape it into a data frame:

oo <- as.data.frame.table(oo)
#  f g      Freq
#1 1 1 0.3069877
#2 2 1 0.2508591
#3 1 2 0.1087895
#4 2 2 0.2007295
#5 1 3 0.4715680
#6 2 3 0.4942069
#7 1 4 0.3223174
#8 2 4 0.4687340
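
The simplification to an array only happens because the function returns one number per group. As a sketch of the contrast (toy, scalar_fun and vector_fun are hypothetical names introduced here, not part of your code), a function that returns both regression coefficients leaves the result as a list:

```r
# Toy data; all names in this block are made up for the illustration.
set.seed(1)
toy <- data.frame(value = runif(10), date = runif(10), f = gl(2, 5))

# One number per group: range of the fitted values.
scalar_fun <- function(d) diff(range(fitted(lm(value ~ date, data = d))))

# Two numbers per group: intercept and slope.
vector_fun <- function(d) coef(lm(value ~ date, data = d))

a <- by(toy[c("value", "date")], toy["f"], scalar_fun)
b <- by(toy[c("value", "date")], toy["f"], vector_fun)

typeof(a)  # atomic (double): simplified to an array
typeof(b)  # list: one element per group, not simplified
```

In the list case, as.data.frame.table is not the right reshaping step; you would combine the per-group vectors yourself, for example with do.call(rbind, b).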

The name "Freq" may be undesirable, but you can easily change it, for example with names(oo)[3] <- "foo".
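
Alternatively, as.data.frame.table has a responseName argument that sets the column name at conversion time. A small sketch on a hand-made array (arr and the name "fit_range" are made up for this illustration):

```r
# A small array with named dimnames, standing in for the result of by().
arr <- array(c(0.1, 0.2, 0.3, 0.4), dim = c(2, 2),
             dimnames = list(f = c("1", "2"), g = c("1", "2")))

# responseName replaces the default "Freq" column name.
out <- as.data.frame.table(arr, responseName = "fit_range")
names(out)
```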

As I said in my comments under your question, we can also use split and lapply. But then there is no trivial way to convert the result into a good-looking data frame.

datlist <- split(dataset[c("value", "date")], dataset[c("f", "g")], drop = TRUE)
rr <- lapply(datlist, CalculateLinRegrDiff)
stack(rr)
#     values ind
#1 0.3069877 1.1
#2 0.2508591 2.1
#3 0.1087895 1.2
#4 0.2007295 2.2
#5 0.4715680 1.3
#6 0.4942069 2.3
#7 0.3223174 1.4
#8 0.4687340 2.4

I suggest you read "Linear Regression and group by in R" for a thorough demonstration of regression by group.
