For loop taking forever - possible to apply?

Question

I have a very large data.frame. I want to subtract the row mean of columns 37-2574 from those columns, then divide by the row standard deviation. I then need to multiply columns 1-18 by the (same row) standard deviation. Finally, I need to subtract the row mean of columns 37-2574, multiplied by the corresponding column in 1-18, from columns 19-36. I'm currently doing this with a for loop, but it is taking forever. Is there a way to do this with apply, or even with a faster for loop? Here's what I have currently:

for (i in 1:nrow(samples)){
  theta.mean <- mean(samples[i, 37:2574])
  theta.sd <- sd(samples[i, 37:2574])
  samples[i, 37:2574] <- (samples[i, 37:2574] - theta.mean)/ theta.sd
  # then multiply columns 1-18 by SD of theta at each iteration 
  samples[i, 1:18] <- samples[i, 1:18] * theta.sd
  # subtract theta-mean * column 1-18 from columns 19-36
  for (j in 1:18){
    theta.mean.beta <- theta.mean * samples[i, j]
    samples[i, j + 18] <- samples[i, j + 18] - theta.mean.beta
  }
}

Recommended answer

The trick is to use apply() to calculate all the row statistics at once, and then to do the remaining operations column-wise, like so:

# calculate the row means and SDs using apply()
theta.means  <-  apply(samples[,37:2574],  # the object to be summarized
                       1,                  # summarize over the rows (MARGIN = 1)
                       mean)               # the summary function 
theta.sds  <-  apply(samples[,37:2574],1,sd)

# define a function to apply to each row
standardize  <-  function(x)
    (x - mean(x))/sd(x)
# apply it to each row (MARGIN = 1); apply() returns one column per row,
# so transpose the result back with t()
samples[,37:2574]  <-  t(apply(samples[,37:2574],1,standardize))

# subtract theta-mean * column 1-18 from columns 19-36
for (j in 1:18){
    samples[, j] <- samples[,j] * theta.sds
    theta.mean.beta <- theta.means * samples[, j]
    samples[, j + 18] <- samples[, j + 18] - theta.mean.beta
}

Be sure to double-check that this code is equivalent to your original code by taking a subset of rows (e.g. `samples <- samples[1:100,]`) and checking that the results are the same (I would have done this myself, but no example dataset was posted...).
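One way to run that check without the real data is sketched below. The toy data.frame, the shortened column range 37:46, and the helper names loop.version and apply.version are assumptions made for illustration; they are not part of the original post. The loop is run on a matrix copy so that mean() and sd() receive plain numeric vectors.

```r
# Hypothetical toy data: 5 rows, 36 leading columns plus 10 "theta" columns
# (the real data uses columns 37:2574; the range is shortened here).
set.seed(1)
samples <- as.data.frame(matrix(rnorm(5 * 46), nrow = 5))
theta.cols <- 37:46

# Reference: the row-by-row loop from the question, run on a matrix copy
loop.version <- function(df) {
  m <- as.matrix(df)
  for (i in seq_len(nrow(m))) {
    theta.mean <- mean(m[i, theta.cols])
    theta.sd <- sd(m[i, theta.cols])
    m[i, theta.cols] <- (m[i, theta.cols] - theta.mean) / theta.sd
    m[i, 1:18] <- m[i, 1:18] * theta.sd
    m[i, 19:36] <- m[i, 19:36] - theta.mean * m[i, 1:18]
  }
  as.data.frame(m)
}

# The apply()-based version from the answer
apply.version <- function(df) {
  theta.means <- unname(apply(df[, theta.cols], 1, mean))
  theta.sds <- unname(apply(df[, theta.cols], 1, sd))
  df[, theta.cols] <- t(apply(df[, theta.cols], 1,
                              function(x) (x - mean(x)) / sd(x)))
  for (j in 1:18) {
    df[, j] <- df[, j] * theta.sds
    df[, j + 18] <- df[, j + 18] - theta.means * df[, j]
  }
  df
}

# Compare the numeric contents only, ignoring row and column names
same <- isTRUE(all.equal(unname(as.matrix(loop.version(samples))),
                         unname(as.matrix(apply.version(samples)))))
same
```

If same is FALSE on your own subset, the two versions differ somewhere and the faster one should not be trusted yet.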

Update:

Here's a more efficient implementation based on David Arenburg's comments below:

# calculate the row means via rowMeans()
theta.means  <-  rowMeans(as.matrix(samples[,37:2574]))

# redefine SD to be vectorized with respect to rows in the data.frame 
rowSD <- function(x)  
    sqrt(rowSums((x - rowMeans(x))^2)/(dim(x)[2] - 1)) 

# calculate the row SDs using the vectorized version of SD
theta.sds  <-  rowSD(as.matrix(samples[,37:2574]))
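As a quick sanity check, the vectorized rowSD() can be compared against apply(x, 1, sd) on a small matrix. This is only a sketch: the 20 x 10 test matrix m and the variable names are made up for illustration. It works because subtracting a vector of length nrow(x) from a matrix recycles the vector down each column, so every row has its own mean removed.

```r
# Vectorized row standard deviation, as defined in the answer
rowSD <- function(x)
    sqrt(rowSums((x - rowMeans(x))^2) / (ncol(x) - 1))

set.seed(42)
m <- matrix(rnorm(200), nrow = 20)   # hypothetical 20 x 10 test matrix

sd.apply <- apply(m, 1, sd)          # one sd() call per row
sd.vectorized <- rowSD(m)            # a single pass over the whole matrix

# Allow for floating-point differences rather than testing exact equality
sds.match <- isTRUE(all.equal(sd.apply, sd.vectorized))
sds.match
```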

Now use the fact that when you subtract a vector (x) from a data.frame (df), R recycles the values of x. When length(x) == nrow(df), the result is the same as subtracting x from each column of df:

# standardize columns 37 through 2574
samples[,37:2574] <-  (samples[,37:2574] - theta.means)/theta.sds
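The recycling behaviour is easy to see on a tiny example. The data.frame df and vector x below are invented for illustration. The sweep() call is shown as a more explicit alternative for a matrix and is not something the original answer uses.

```r
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
x <- c(1, 2, 3)            # length(x) == nrow(df)

# x is recycled down each column, so row i of every column has x[i] subtracted
centered <- df - x
centered$a                 # 0 0 0
centered$b                 # 9 18 27

# The same operation spelled out on a matrix: subtract x[i] from row i
swept <- sweep(as.matrix(df), 1, x)
```

The shortcut relies on length(x) matching nrow(df) exactly. With any other length, the values would be recycled across cells in a way that no longer lines up with rows.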

Now do the same for columns 1:18 and 19:36:

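# multiply columns 1-18 by the row SDs
samples[, 1:18] <- samples[,1:18] * theta.sds
# subtract theta-mean * column 1-18 from columns 19-36
# (columns 1-18 are already scaled by theta.sds above, so do not scale them again)
samples[, 1:18 + 18] <- samples[, 1:18 + 18] - theta.means * samples[,1:18]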
