How to vectorize a for loop in R

Views: 168 · Posted: 2018/1/28 13:47:03 · Tags: r, for-loop, vectorization

Question

I'm trying to clean this code up and was wondering if anybody has any suggestions on how to run this in R without a loop. I have a dataset called data with 100 variables and 200,000 observations. What I want to do is essentially expand the dataset by multiplying each observation by a specific scalar and then combine the data together. In the end, I need a data set with 800,000 observations (I have four categories to create) and 101 variables. Here's a loop that I wrote that does this, but it is very inefficient and I'd like something quicker and more efficient.
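```r
datanew <- c()
for (i in 1:51) {
  for (k in 1:6) {
    for (m in 1:4) {
      # rows for this var1/var2 combination
      sub <- subset(data, data$var1 == i & data$var2 == k)
      # scale columns 4 through the second-to-last by filingstat0711[i, k, m]
      sub[, 4:(ncol(sub) - 1)] <- filingstat0711[i, k, m] * sub[, 4:(ncol(sub) - 1)]
      # tag this copy with its category
      sub$newvar <- m
      # append to the result
      datanew <- rbind(datanew, sub)
    }
  }
}
```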
Please let me know what you think and thanks for the help.

Below is some sample data with 2K observations instead of 200K:
```r
# SAMPLE DATA
#------------------------------------------------#
# 2,000 observations (rows) by 100 variables (columns)
mydf <- as.data.frame(matrix(rnorm(100 * 20e2), nrow = 20e2, ncol = 100))
var1 <- c(sapply(seq(41), function(x) sample(1:51)))[1:20e2]
var2 <- c(sapply(seq(2 + 20e2/6), function(x) sample(1:6)))[1:20e2]
#----------------------------------#
mydf <- cbind(var1, var2, round(mydf[3:100] * 2.5, 2))
filingstat0711 <- array(round(rnorm(51*6*4)*1.5 + abs(rnorm(2)*10)), dim = c(51, 6, 4))
#------------------------------------------------#
```
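Before removing the loops altogether, note that much of the cost of the loop above typically comes from growing `datanew` with `rbind()` on every iteration, since each call copies everything accumulated so far. A minimal sketch of the usual intermediate fix keeps the same triple loop but collects the pieces in a pre-allocated list and binds them once at the end. The tiny data frame `toy` and the array `fs` are hypothetical stand-ins for `data` and `filingstat0711`, with 3 values of `var1`, 2 of `var2`, and 4 categories `m`.

```r
# Hypothetical stand-ins: var1 in 1:3, var2 in 1:2, then four numeric columns.
toy <- data.frame(var1 = rep(1:3, each = 4),
                  var2 = rep(1:2, times = 6),
                  x3 = 1, x4 = 2, x5 = 3, x6 = 4)
fs <- array(1:24, dim = c(3, 2, 4))   # plays the role of filingstat0711[i, k, m]

cols <- 4:(ncol(toy) - 1)              # same columns the question scales
pieces <- vector("list", 3 * 2 * 4)    # one slot per (i, k, m) combination
idx <- 0
for (i in 1:3) {
  for (k in 1:2) {
    for (m in 1:4) {
      sub_ik <- toy[toy$var1 == i & toy$var2 == k, ]
      sub_ik[, cols] <- fs[i, k, m] * sub_ik[, cols]
      sub_ik$newvar <- m
      idx <- idx + 1
      pieces[[idx]] <- sub_ik          # store the piece; no copying of earlier results
    }
  }
}
result <- do.call(rbind, pieces)       # a single rbind at the end
dim(result)
```

The loop body is unchanged from the question apart from the variable name `sub_ik` and where the piece is stored, so this is a low-risk first step even before vectorizing.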

Solution
You can try the following. Notice that we replaced the first two for loops with a call to mapply and the third for loop with a call to lapply. Also, we are creating two vectors that we will combine for vectorized multiplication.
```r
# create a table of the i-k index combinations using `expand.grid`
ixk <- expand.grid(i = 1:51, k = 1:6)

# take a look at what expand.grid does
head(ixk, 60)

# create two vectors for multiplying against our dataframe subset:
# multpVec is 1 for the columns the question scales (4 to ncol - 1), 0 elsewhere;
# invVec is its complement, so those other columns are multiplied by 1
multpVec <- c(rep(c(0, 1), times = c(3, ncol(mydf) - 3 - 1)), 0)
invVec <- !multpVec

# example of how we will use the vectors
(multpVec * filingstat0711[1, 2, 1] + invVec)

# Instead of for loops, we can use mapply.
newdf <- mapply(function(i, k)
  # The function that you are `mapply`ing is:
  # rbind'ing a list of dataframes, which were subsetted by matching var1 & var2
  # and then multiplied by a value in filingstat0711
  do.call(rbind,
    # iterating over m
    lapply(1:4, function(m)
      # the cbind is for adding newvar = m at the end of the subtable
      cbind(
        # we transpose twice: first the subset, so that our vector is applied
        # column by column; then the result, to get back our original form
        t(t(subset(mydf, var1 == i & var2 == k)) *
            (multpVec * filingstat0711[i, k, m] + invVec)),
        # this is an argument to cbind
        "newvar" = m)
    )),
  # the two lists you are passing as arguments are the columns of the expanded grid
  ixk$i, ixk$k, SIMPLIFY = FALSE
)

# flatten the list of pieces into one table
newdf <- do.call(rbind, newdf)
```
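The `multpVec`/`invVec` trick and the double transpose can be checked on a tiny matrix. This sketch uses a made-up 3-by-4 matrix `M` and the scalar 10 in place of a subset of `mydf` and a value of `filingstat0711`. It also compares the result with base R's `sweep()`, which expresses the same column-wise multiplication directly; the answer itself does not use `sweep()`.

```r
# Hypothetical 3-by-4 matrix standing in for one subset of mydf
M <- matrix(1:12, nrow = 3)

# 1 marks columns to scale, 0 marks columns to keep (here: scale columns 2 and 3)
multpVec <- c(0, 1, 1, 0)
invVec <- !multpVec                     # TRUE where a column is left alone
scalar <- 10
factors <- multpVec * scalar + invVec   # per-column factors: 1, 10, 10, 1

# t(M) has one column per row of M, so the length-4 factor vector is recycled
# down each of those columns; transposing back restores the original layout.
scaled <- t(t(M) * factors)

# sweep() multiplies column j of M by factors[j] without the transposes.
scaled_sweep <- sweep(M, 2, factors, "*")
all.equal(scaled, scaled_sweep)
```

Columns 1 and 4 come back unchanged and columns 2 and 3 are multiplied by 10, which is exactly how the answer leaves `var1`, `var2`, and the excluded columns untouched while scaling the rest.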

Two points to note:

(1) Try not to use names like `data`, `table`, `df`, `sub`, etc., which are also the names of commonly used functions. In the above code I used `mydf` in place of `data`.

(2) You can use `apply(ixk, 1, fu..)` instead of the `mapply` that I used, but I think `mapply` makes for cleaner code in this situation.
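To make point (2) concrete, this small sketch runs the same function over a hypothetical 3-by-2 `expand.grid` three ways. `mapply(..., SIMPLIFY = FALSE)` and base R's `Map()` (a shorthand for exactly that call) both return a list with one element per grid row. `apply()` instead converts the grid to a matrix, passes each row as a named vector, and simplifies the results into a matrix when every call returns a result of the same length, which is worth keeping in mind if it is used with a function that returns matrices or data frames.

```r
# Hypothetical small grid of index combinations
grid <- expand.grid(i = 1:3, k = 1:2)

# mapply walks the two grid columns in parallel; SIMPLIFY = FALSE keeps a list
res_mapply <- mapply(function(i, k) c(i = i, k = k, prod = i * k),
                     grid$i, grid$k, SIMPLIFY = FALSE)

# Map() is base R's shorthand for mapply(..., SIMPLIFY = FALSE)
res_map <- Map(function(i, k) c(i = i, k = k, prod = i * k), grid$i, grid$k)

# apply() hands each row of the grid to the function as a named vector
res_apply <- apply(grid, 1, function(r) c(i = r[["i"]], k = r[["k"]],
                                          prod = r[["i"]] * r[["k"]]))

length(res_map)   # one list element per grid row
dim(res_apply)    # simplified: one column per grid row
```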

Good luck, and welcome to SO
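A further option, not part of the original answer, removes the per-group subsetting entirely: stack four copies of the data, look up each row's scalar from the 3-d array with R's matrix indexing (`array[cbind(i, k, m)]` picks one element per row of the index matrix), and multiply the selected columns once. This sketch assumes, as the question's loop does, that `var1` and `var2` are integer codes that index `filingstat0711` directly (1 to 51 and 1 to 6); rows with codes outside that range would be skipped by the loop but not handled the same way here. Rows come out stacked by `m` rather than grouped by `var1`/`var2`. `toy` and `fs` are hypothetical stand-ins for `mydf` and `filingstat0711`.

```r
# Hypothetical stand-ins: var1, var2, then numeric columns (same shape as the question)
toy <- data.frame(var1 = rep(1:3, each = 4),
                  var2 = rep(1:2, times = 6),
                  x3 = 1, x4 = 2, x5 = 3, x6 = 4)
fs <- array(1:24, dim = c(3, 2, 4))   # plays the role of filingstat0711[i, k, m]
n_m <- dim(fs)[3]                      # number of categories m (4 in the question)

# 1. Stack n_m copies of the data; block m holds every original row once.
m <- rep(seq_len(n_m), each = nrow(toy))
big <- toy[rep(seq_len(nrow(toy)), times = n_m), ]

# 2. One scalar per row: each row of the index matrix selects fs[var1, var2, m].
scal <- fs[cbind(big$var1, big$var2, m)]

# 3. Scale the same columns as the question's loop (4 to ncol - 1) in one step;
#    a data frame times a vector of length nrow multiplies every column by it.
cols <- 4:(ncol(toy) - 1)
big[cols] <- big[cols] * scal

# 4. Record the category and reset the row names created by the row replication.
big$newvar <- m
rownames(big) <- NULL
dim(big)
```

Each step here is a single vectorized call over whole columns, instead of 51 × 6 × 4 = 1,224 separate subsets and `rbind()` calls.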
