Why not use a for loop?
Question
I've been seeing a lot of comments from data scientists online saying that for loops are not advisable. However, I recently found myself in a situation where using one was helpful. I would like to know whether there is a better alternative for the following process (and why the alternative would be better):
I needed to run a series of repeated-measures ANOVAs and approached the problem along the lines of the reproducible example below.
[I am aware that there are other issues with running multiple ANOVA models and that there are other options for these sorts of analyses, but for now I'd simply like to hear about the use of the for loop.]
As an example, four repeated-measures ANOVA models, one for each of four dependent variables that were measured on three occasions:
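set.seed(1976)

# Simulated data: 60 rows, three measurement occasions (time = 0, 1, 2)
code <- 1:60
time <- rep(c(0, 1, 2), each = 20)
DV1 <- c(rnorm(20, 10, 2), rnorm(20, 10, 2), rnorm(20, 14, 2))
DV2 <- c(rnorm(20, 10, 2), rnorm(20, 10, 2), rnorm(20, 10, 2))
DV3 <- c(rnorm(20, 10, 2), rnorm(20, 10, 2), rnorm(20, 8, 2))
DV4 <- c(rnorm(20, 10, 2), rnorm(20, 10, 2), rnorm(20, 10, 2))
dat <- data.frame(code, time, DV1, DV2, DV3, DV4)

# Fit one model per column and collect the summaries in a list.
# Note: names(dat) also includes code and time, so models are fitted for those too.
outANOVA <- list()
for (i in names(dat)) {
  y <- dat[[i]]
  # [[i]] stores the whole summary object as one list element
  outANOVA[[i]] <- summary(aov(y ~ factor(time) + Error(factor(code)),
                               data = dat))
}
outANOVA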
Recommended answer
You could write it this way, which is more compact:
outANOVA <-
  lapply(dat, function(y)
    summary(aov(y ~ factor(time) + Error(factor(code)), data = dat)))
for loops are not necessarily slower than the apply functions, but many people find them harder to read. To some extent it is a matter of taste.
The real crime is to use a for loop when a vectorized function is available. These vectorized functions usually contain for loops written in C (or call functions that do), which are much faster.
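To illustrate the point about vectorized functions, here is a minimal sketch that squares the integers 1 to n twice: once with an explicit loop and once with the vectorized `^` operator. The variable names `squares_loop` and `squares_vec` are chosen here for illustration and do not appear in the original thread.

```r
n <- 1e5

# Explicit R-level loop over a preallocated vector
squares_loop <- numeric(n)
for (i in seq_len(n)) {
  squares_loop[i] <- i^2
}

# Vectorized: `^` is applied to the whole vector in compiled code
squares_vec <- seq_len(n)^2

# Both approaches produce the same values
all.equal(squares_loop, squares_vec)
```

The vectorized form is a single expression with no index bookkeeping, which is usually both shorter and faster than the loop.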
Notice that in this case we also avoid creating a global variable y, and we don't have to initialize the list outANOVA.
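The scoping difference can be checked directly. The sketch below is an illustration only: it runs a loop and an sapply() call in two separate environments (the names `loop_env` and `apply_env` are made up for this example) and then lists what each one leaves behind.

```r
loop_env  <- new.env()
apply_env <- new.env()

# for loop: the index i and the temporary y stay in the environment afterwards
evalq({
  out <- numeric(3)
  for (i in 1:3) {
    y <- i^2
    out[i] <- y
  }
}, loop_env)

# sapply: y is a function argument and exists only during each call
evalq({
  out <- sapply(1:3, function(y) y^2)
}, apply_env)

ls(loop_env)   # "i" "out" "y"
ls(apply_env)  # "out"
```

Run at top level instead of inside these environments, the loop version would leave i and y in the global workspace in the same way.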
Another point, taken directly from a related post, "For loops in R and computational speed" (answer by Glen_b):
For loops in R are not always slower than other approaches, like apply - but there's one huge bugbear - never grow an array inside a loop
Instead, make your arrays full-size before you loop and then fill them up.
In your case you're growing outANOVA, which could become problematic for big loops.
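If you want to keep the for loop, one way to follow that advice is to create the result list at its final length before looping. This is a sketch under assumptions: it rebuilds the simulated data from the question, restricts the loop to the four DV columns (a choice made here, not something the original code does), and uses the made-up name `dvs` for the column names.

```r
set.seed(1976)

# Same simulated design as in the question
code <- 1:60
time <- rep(c(0, 1, 2), each = 20)
dat <- data.frame(
  code, time,
  DV1 = rnorm(60, rep(c(10, 10, 14), each = 20), 2),
  DV2 = rnorm(60, 10, 2),
  DV3 = rnorm(60, rep(c(10, 10, 8), each = 20), 2),
  DV4 = rnorm(60, 10, 2)
)

# Assumption: only the DV columns are of interest
dvs <- c("DV1", "DV2", "DV3", "DV4")

# Preallocate a named list of the final length, then fill it in place
outANOVA <- vector("list", length(dvs))
names(outANOVA) <- dvs
for (dv in dvs) {
  y <- dat[[dv]]
  outANOVA[[dv]] <- summary(aov(y ~ factor(time) + Error(factor(code)),
                                data = dat))
}

names(outANOVA)
```

With only four models the difference is negligible; preallocation starts to matter when the loop runs many thousands of times.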
Here is a microbenchmark of different methods on a simple example:
n <- 100000
microbenchmark::microbenchmark(
  preallocated_vec  = {x <- vector(length = n); for (i in 1:n) {x[i] <- i^2}},
  preallocated_vec2 = {x <- numeric(n); for (i in 1:n) {x[i] <- i^2}},
  incremented_vec   = {x <- vector(); for (i in 1:n) {x[i] <- i^2}},
  preallocated_list = {x <- vector(mode = "list", length = n); for (i in 1:n) {x[i] <- i^2}},
  incremented_list  = {x <- list(); for (i in 1:n) {x[i] <- i^2}},
  sapply            = sapply(1:n, function(i) i^2),
  lapply            = lapply(1:n, function(i) i^2),
  times = 20)
# Unit: milliseconds
#               expr        min         lq       mean     median         uq        max neval
#   preallocated_vec   9.784237  10.100880  10.686141  10.367717  10.755598  12.839584    20
#  preallocated_vec2   9.953877  10.315044  10.979043  10.514266  11.792158  12.789175    20
#    incremented_vec  74.511906  79.318298  81.277439  81.640597  83.344403  85.982590    20
#  preallocated_list  10.680134  11.197962  12.382082  11.416352  13.528562  18.620355    20
#   incremented_list 196.759920 201.418857 212.716685 203.485940 205.441188 393.522857    20
#             sapply   6.557739   6.729191   7.244242   7.063643   7.186044   9.098730    20
#             lapply   6.019838   6.298750   6.835941   6.571775   6.844650   8.812273    20