How to vectorize a for loop in R
Problem description
I'm trying to clean this code up and was wondering if anybody has any suggestions on how to run this in R without a loop. I have a dataset called data with 100 variables and 200,000 observations. What I want to do is essentially expand the dataset by multiplying each observation by a specific scalar and then combine the data together. In the end, I need a data set with 800,000 observations (I have four categories to create) and 101 variables. Here's a loop that I wrote that does this, but it is very inefficient and I'd like something quicker and more efficient.
datanew <- c()
for (i in 1:51){
  for (k in 1:6){
    for (m in 1:4){
      sub <- subset(data,data$var1==i & data$var2==k)
      sub[,4:(ncol(sub)-1)] <- filingstat0711[i,k,m]*sub[,4:(ncol(sub)-1)]
      sub$newvar <- m
      datanew <- rbind(datanew,sub)
    }
  }
}
Please let me know what you think and thanks for the help.
Below is some sample data with 2,000 observations instead of 200,000:
# SAMPLE DATA
#------------------------------------------------#
# 2,000 observations (rows) by 100 variables (columns)
mydf <- as.data.frame(matrix(rnorm(100 * 20e2), ncol=100, nrow=20e2))
var1 <- c(sapply(seq(41), function(x) sample(1:51)))[1:20e2]
var2 <- c(sapply(seq(2 + 20e2/6), function(x) sample(1:6)))[1:20e2]
#----------------------------------#
mydf <- cbind(var1, var2, round(mydf[3:100]*2.5, 2))
filingstat0711 <- array(round(rnorm(51*6*4)*1.5 + abs(rnorm(2)*10)), dim=c(51,6,4))
#------------------------------------------------#
You can try the following. Notice that we replaced the first two for loops with a call to mapply and the third for loop with a call to lapply.
Also, we are creating two vectors that we will combine for vectorized multiplication.
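To make the idea behind the two vectors concrete, here is a minimal sketch on a toy data frame. The names toy_df, mask, inv_mask, scalar, and multiplier are hypothetical and only for illustration; scalar stands in for a single filingstat0711[i, k, m] value. A 0/1 mask marks the columns to scale, and combining it with its negation gives a per-column multiplier that equals the scalar where the mask is 1 and equals 1 elsewhere.

```r
# Toy data frame (hypothetical): two id columns, two value columns, one trailing column.
toy_df <- data.frame(id1 = c(1, 1), id2 = c(2, 2),
                     x = c(10, 20), y = c(30, 40), z = c(5, 6))

# 0/1 mask: 1 marks the columns to scale (x and y), 0 marks columns to leave alone.
mask <- c(0, 0, 1, 1, 0)
inv_mask <- !mask          # TRUE where the column must stay unchanged
scalar <- 3                # stand-in for one filingstat0711[i, k, m] value

# Per-column multiplier: `scalar` where mask is 1, and 1 everywhere else.
multiplier <- mask * scalar + inv_mask

# t(toy_df) has one row per original column, so the length-5 multiplier
# recycles down each column of the transposed matrix; the outer t()
# restores the original rows-by-columns layout (as a matrix).
scaled <- t(t(toy_df) * multiplier)
```

Only x and y are multiplied by 3 here; the id columns and z pass through unchanged because their multiplier is 1.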
# create a table of the i-k index combinations using `expand.grid`
ixk <- expand.grid(i=1:51, k=1:6)
# Take a look at what expand.grid does
head(ixk, 60)
# create two vectors for multiplying against our dataframe subset:
# multpVec is 1 for the columns the original loop scales, 4:(ncol-1), and 0 elsewhere
multpVec <- c(rep(c(0, 1), times=c(3, ncol(mydf)-3-1)), 0)
invVec <- !multpVec
# example of how we will use the vectors
(multpVec * filingstat0711[1, 2, 1] + invVec)
# Instead of for loops, we can use mapply.
newdf <-
  mapply(function(i, k)
    # The function that you are `mapply`ing is:
    #   rbind'ing a list of dataframes, which were subsetted by matching var1 & var2
    #   and then multiplied by a value in filingstat
    do.call(rbind,
      # iterating over m
      lapply(1:4, function(m)
        # the cbind is for adding the newvar=m, at the end of the subtable
        cbind(
          # we transpose twice: first the subset, to multiply it by our vector.
          # Then the result, to get back our original form
          t( t(subset(mydf, var1==i & mydf$var2==k)) *
             (multpVec * filingstat0711[i,k,m] + invVec)),
          # this is an argument to cbind
          "newvar"=m)
      )),
    # the two lists you are passing as arguments are the columns of the expanded grid
    ixk$i, ixk$k, SIMPLIFY=FALSE
  )
# flatten the list of per-(i, k) results into a single table
newdf <- do.call(rbind, newdf)
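As an alternative sketch (not the approach used in the code above), the scalar for every row can be looked up in one step by indexing the three-dimensional array with a three-column matrix, which avoids subsetting by each (i, k) pair. Everything named here is a hypothetical stand-in: toy_df plays the role of mydf, toy_stat the role of filingstat0711, and scale_cols is an assumed list of the columns to scale. It also assumes var1 and var2 hold valid integer indices into the first two dimensions of the array.

```r
# Hypothetical toy stand-ins for mydf and filingstat0711.
n <- 12
toy_df <- data.frame(var1 = rep(1:3, times = 4),
                     var2 = rep(1:2, each = 6),
                     a = as.numeric(seq_len(n)),
                     b = seq_len(n) / 10)
toy_stat <- array(seq_len(3 * 2 * 4), dim = c(3, 2, 4))
scale_cols <- c("a", "b")   # assumption: the columns that should be scaled

# One block per category m; no subsetting by (i, k) is needed.
blocks <- lapply(1:4, function(m) {
  # A three-column index matrix picks toy_stat[var1, var2, m] for every row at once.
  f <- toy_stat[cbind(toy_df$var1, toy_df$var2, m)]
  out <- toy_df
  # f has one entry per row, so it is applied to each selected column in turn.
  out[scale_cols] <- out[scale_cols] * f
  out$newvar <- m
  out
})

# Bind the four blocks once at the end instead of growing the result row by row.
vectorized <- do.call(rbind, blocks)
```

The rows come out grouped by m rather than by (i, k), so the result matches the loop only up to row order. It stays a data frame, whereas the double transpose above returns a matrix.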
Two points to note:
(1) Try not to use names like data, table, df, sub, etc., which are commonly used functions. In the above code I used mydf in place of data.
(2) You can use apply(ixk, 1, fu..) instead of the mapply that I used, but I think mapply makes for cleaner code in this situation.
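To illustrate point (2), here is a small sketch comparing the two calls on a toy grid. The names ixk_small and pair_label are hypothetical; pair_label only stands in for the real per-pair work.

```r
# Small grid of (i, k) pairs; i varies fastest, as with expand.grid in general.
ixk_small <- expand.grid(i = 1:3, k = 1:2)

# Hypothetical stand-in for the real work done for each (i, k) pair.
pair_label <- function(i, k) i * 10 + k

# mapply walks the two columns in parallel and passes i and k as separate arguments.
via_mapply <- mapply(pair_label, ixk_small$i, ixk_small$k)

# apply walks the rows; each row arrives as one named vector that has to be unpacked.
via_apply <- apply(ixk_small, 1, function(row) pair_label(row[["i"]], row[["k"]]))
```

Both calls visit the same pairs in the same order. One practical difference is that apply first coerces the data frame to a matrix, so all columns are converted to a common type. That is harmless for an all-integer grid like this one, but worth keeping in mind if the grid ever mixes types.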
Good luck, and welcome to SO