Why not use a for loop?

Question

I've been seeing a lot of comments among data scientists online about how for loops are not advisable. However, I recently found myself in a situation where using one was helpful. I would like to know if there is a better alternative for the following process (and why the alternative would be better):

I needed to run a series of repeated-measures ANOVAs and approached the problem along the lines of the reproducible example below.

[I am aware that there are other issues with running multiple ANOVA models and that there are other options for these sorts of analyses, but for now I'd simply like to hear about the use of for loops.]

As an example, four repeated-measures ANOVA models - four dependent variables that were each measured at three occasions:

set.seed(1976)
code <- seq(1:60)
time <- rep(c(0,1,2), each = 20)
DV1 <- c(rnorm(20, 10, 2), rnorm(20, 10, 2), rnorm(20, 14, 2))
DV2 <- c(rnorm(20, 10, 2), rnorm(20, 10, 2), rnorm(20, 10, 2))
DV3 <- c(rnorm(20, 10, 2), rnorm(20, 10, 2), rnorm(20, 8, 2))
DV4 <- c(rnorm(20, 10, 2), rnorm(20, 10, 2), rnorm(20, 10, 2))
dat <- data.frame(code, time, DV1, DV2, DV3, DV4)

outANOVA <- list()

for (i in names(dat)) {
  y <- dat[[i]]
  outANOVA[i] <- summary(aov(y ~ factor(time) + Error(factor(code)), 
                                  data = dat))
}

outANOVA

Recommended answer

You could write it this way, which is more compact:

outANOVA <-
  lapply(dat,function(y)
    summary(aov(y ~ factor(time) + Error(factor(code)),data = dat)))

for loops are not necessarily slower than the apply functions, but many people find them harder to read. To some extent it is a matter of taste.

The real crime is to use a for loop when a vectorized function is available. These vectorized functions usually contain for loops written in C that are much faster (or call functions that do).
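
As a minimal sketch of that point, the squaring task used in the benchmark further down can be written without any R-level loop, because `^` is already vectorized. The variable names `squares_loop` and `squares_vec` are made up for this illustration:

```r
n <- 100000

# Explicit R-level loop over a preallocated vector
squares_loop <- numeric(n)
for (i in seq_len(n)) {
  squares_loop[i] <- i^2
}

# Vectorized equivalent: `^` works on the whole vector at once,
# so the looping happens in compiled code
squares_vec <- seq_len(n)^2

# Both approaches give the same values
stopifnot(isTRUE(all.equal(squares_loop, squares_vec)))
```

This only applies when a vectorized function exists for the task. Fitting one ANOVA model per column, as in the question, has no such shortcut, so there the choice is between a loop and lapply.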

Notice that in this case we also avoid creating a global variable y, and we don't have to initialize the list outANOVA.
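
The scoping difference can be seen in a small sketch. The names `vals`, `tmp_loop`, and `tmp_apply` are hypothetical and only serve this illustration:

```r
vals <- list(a = 1:3, b = 4:6)

# for loop: temporaries created in the loop body stay in the
# calling environment after the loop finishes
totals_loop <- numeric(length(vals))
for (k in seq_along(vals)) {
  tmp_loop <- vals[[k]]
  totals_loop[k] <- sum(tmp_loop)
}
loop_var_left_behind <- exists("tmp_loop", inherits = FALSE)

# apply-style: the function argument lives only inside the function call
totals_apply <- vapply(vals, function(tmp_apply) sum(tmp_apply), numeric(1))
apply_var_left_behind <- exists("tmp_apply", inherits = FALSE)

loop_var_left_behind   # TRUE
apply_var_left_behind  # FALSE
```

Leftover variables are usually harmless in a short script, but they can mask bugs in longer ones when a stale value is picked up by later code.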

Another point, taken directly from the related post "For loops in R and computational speed" (answer by Glen_b):

For loops in R are not always slower than other approaches, like apply, but there's one huge bugbear: never grow an array inside a loop.

Instead, make your arrays full-size before you loop and then fill them up.

In your case you're growing outANOVA, which could become a problem for big loops.
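
If you prefer to keep the for loop, one possible sketch is to preallocate the result list and fill it by name. It reuses the data from the question. Restricting the loop to the four DV columns (the hypothetical vector `dvs`) is an assumption about what was intended, since the original loop also runs over `code` and `time`:

```r
# Data as in the question
set.seed(1976)
code <- seq(1:60)
time <- rep(c(0, 1, 2), each = 20)
DV1 <- c(rnorm(20, 10, 2), rnorm(20, 10, 2), rnorm(20, 14, 2))
DV2 <- c(rnorm(20, 10, 2), rnorm(20, 10, 2), rnorm(20, 10, 2))
DV3 <- c(rnorm(20, 10, 2), rnorm(20, 10, 2), rnorm(20, 8, 2))
DV4 <- c(rnorm(20, 10, 2), rnorm(20, 10, 2), rnorm(20, 10, 2))
dat <- data.frame(code, time, DV1, DV2, DV3, DV4)

# Assumption: only the DV columns should be modelled
dvs <- c("DV1", "DV2", "DV3", "DV4")

# Preallocate a named list with one slot per model
outANOVA <- vector("list", length(dvs))
names(outANOVA) <- dvs

for (dv in dvs) {
  y <- dat[[dv]]
  # [[ ]] stores the whole summary object in the existing slot
  outANOVA[[dv]] <- summary(aov(y ~ factor(time) + Error(factor(code)),
                                data = dat))
}
```

With only four models the cost of growing the list is negligible, so here preallocation is mainly a habit that pays off once the number of iterations gets large.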

Here is a microbenchmark of different methods on a simple example:

n <- 100000
microbenchmark::microbenchmark(
preallocated_vec  = {x <- vector(length=n); for(i in 1:n) {x[i] <- i^2}},
preallocated_vec2 = {x <- numeric(n); for(i in 1:n) {x[i] <- i^2}},
incremented_vec   = {x <- vector(); for(i in 1:n) {x[i] <- i^2}},
preallocated_list = {x <- vector(mode = "list", length = n); for(i in 1:n) {x[i] <- i^2}},
incremented_list  = {x <- list(); for(i in 1:n) {x[i] <- i^2}},
sapply            = sapply(1:n, function(i) i^2),
lapply            = lapply(1:n, function(i) i^2),
times=20)

# Unit: milliseconds
# expr                     min         lq       mean     median         uq        max neval
# preallocated_vec    9.784237  10.100880  10.686141  10.367717  10.755598  12.839584    20
# preallocated_vec2   9.953877  10.315044  10.979043  10.514266  11.792158  12.789175    20
# incremented_vec    74.511906  79.318298  81.277439  81.640597  83.344403  85.982590    20
# preallocated_list  10.680134  11.197962  12.382082  11.416352  13.528562  18.620355    20
# incremented_list  196.759920 201.418857 212.716685 203.485940 205.441188 393.522857    20
# sapply              6.557739   6.729191   7.244242   7.063643   7.186044   9.098730    20
# lapply              6.019838   6.298750   6.835941   6.571775   6.844650   8.812273    20
