Rewriting loops with apply functions

Views: 131 | Posted: 2018/1/28 13:31:49 | Tags: r, for-loop, recursion, apply, performance


Problem description


I have the following 3 functions, which I would like to make faster. I assume apply functions are the best way to go, but I have never used them, so I have no idea what to do. Any hints, ideas and code snippets will be much appreciated.

n, T, dt are global parameters and par is a vector of parameters.

Function 1: creates an (m+1) x n matrix containing Poisson distributed jumps with exponentially distributed jump sizes. My trouble here is that I have 3 loops and I am not sure how to incorporate the if statement in the inner loop. I also have no idea whether it is possible at all to use apply functions on the outer layers of the loops only.

    jump <- function(t=0, T=T, par){
      jump <- matrix(0, T/dt+1, n)             # initializing output matrix
      U <- replicate(n, runif(100, t, T))      # matrix used to decide when the jumps will happen
      Y <- replicate(n, rexp(100, 1/par[6]))   # matrix with jump sizes
      for (l in 1:n){
        NT <- rpois(1, par[5]*T)               # number of jumps
        k=0
        for (j in seq(t, T, dt)){
          k=k+1
          if (NT>0){
            temp=0
            for (i in 1:NT){
              u <- vector("numeric", NT)
              if (U[i,l]>j){ u[i]=0
              }else u[i]=1
              temp=temp+Y[i,l]*u[i]
            }
            jump[k,l]=temp
          }else jump[k,l]=0
        }
      }
      return(jump)
    }

Function 2: calculates a default intensity, based on Brownian motions and the jumps from function 1. Here my trouble is how to use apply functions when the calculation depends on the values in the previous row of the output matrix, and how to get the right values from the external matrices used in the calculation (BMz_C and J).

    lambda <- function(t=0, T=T, par, fit=0){
      lambda <- matrix(0, m+1, n)   # matrix to hold intensity path output
      lambda[1,] <- par[4]          # initializing start value of the intensity path
      J <- jump(t, T, par)          # matrix containing jumps
      for (i in 2:(m+1)){
        dlambda <- par[1]*(par[2]-max(lambda[i-1,],0))*dt +
                   par[3]*sqrt(max(lambda[i-1,],0))*BMz_C[i,] +
                   (J[i,]-J[i-1,])
        lambda[i,] <- lambda[i-1,]+dlambda
      }
      return(lambda)
    }

Function 3: calculates a survival probability based on the intensity from function 2. Here a() and B() are functions that return numerical values. My problem here is that both i and j are used, because i is not always an integer and thus cannot be used to index the external matrix. I tried using i/dt earlier, but sometimes it would overwrite one row and skip the next row in the matrix, most likely due to rounding errors.

    S <- function(t=0, T=T, par, plot=0, fit=0){
      S <- matrix(0, (T-t)/dt+1, n)
      if (fit > 0) S.fit <- matrix(0, 1, length(mat)) else S.fit <- 0
      l=lambda(t, T, par, fit)
      j=0
      for (i in seq(t, T, dt)){
        j=j+1
        S[j,] <- a(i,T,par)*exp(B(i,T,par)*l[j,])
      }
      return(S)
    }

Sorry for the long post, any help for any of the functions will be much appreciated.

EDIT: First of all thanks to digEmAll for the great reply.

I have now worked on vectorising function 2. First I tried

    lambda <- function(t=0, T=T, par, fit=0){
      lambda <- matrix(0, m+1, n)   # matrix to hold intensity path input
      J <- jump(t, T, par, fit)
      lambda[1,] <- par[4]
      lambda[2:(m+1),] <- sapply(2:(m+1), function(i){
        lambda[i-1,]+par[1]*(par[2]-max(lambda[i-1,],0))*dt+par[3]*sqrt(max(lambda[i-1,],0))*BMz_C[i,]+(J[i,]-J[i-1,])
      })
      return(lambda)
    }

but it would only produce the first column. So I tried a two-step apply function.

    lambda <- function(t=0, T=T, par, fit=0){
      lambda <- matrix(0, m+1, n)   # matrix to hold intensity path input
      J <- jump(t, T, par, fit)
      lambda[1,] <- par[4]
      lambda[2:(m+1),] <- sapply(1:n, function(l){
        sapply(2:(m+1), function(i){
          lambda[i-1,l]+par[1]*(par[2]-max(lambda[i-1,l],0))*dt+par[3]*sqrt(max(lambda[i-1,l],0))*BMz_C[i,l]+(J[i,l]-J[i-1,l])
        })
      })
      return(lambda)
    }

This seems to work, but only for the first row. All rows after that have an identical non-zero value, as if lambda[i-1] is not used in the calculation of lambda[i]. Does anyone have an idea how to manage that?
Solution
I'm going to explain to you, step-by-step, how to vectorize the first function (one possible way of vectorization, maybe not the best one for your case).
For the other 2 functions, you can simply apply the same concepts and you should be able to do it.

Here, the key concept is: start to vectorize from the innermost loop.

1) First of all, rpois can generate more than one random value at a time, but you are calling it n times, asking for one random value each time. So, let's take it out of the loop, obtaining this:

    jump <- function(t=0, T=T, par){
      jump <- matrix(0, T/dt+1, n)
      U <- replicate(n, runif(100, t, T))
      Y <- replicate(n, rexp(100, 1/par[6]))
      NTs <- rpois(n, par[5]*T)   # note the change
      for (l in 1:n){
        NT <- NTs[l]              # note the change
        k=0
        for (j in seq(t, T, dt)){
          k=k+1
          if (NT>0){
            temp=0
            for (i in 1:NT){
              u <- vector("numeric", NT)
              if (U[i,l]>j){ u[i]=0
              }else u[i]=1
              temp=temp+Y[i,l]*u[i]
            }
            jump[k,l]=temp
          }else jump[k,l]=0
        }
      }
      return(jump)
    }

2) Similarly, it is useless/inefficient to call seq(t,T,dt) n times in the loop, since it will always generate the same sequence. So, let's take it out of the loop and store it in a vector, obtaining this:

    jump <- function(t=0, T=T, par){
      jump <- matrix(0, T/dt+1, n)
      U <- replicate(n, runif(100, t, T))
      Y <- replicate(n, rexp(100, 1/par[6]))
      NTs <- rpois(n, par[5]*T)
      js <- seq(t, T, dt)         # note the change
      for (l in 1:n){
        NT <- NTs[l]
        k=0
        for (j in js){            # note the change
          k=k+1
          if (NT>0){
            temp=0
            for (i in 1:NT){
              u <- vector("numeric", NT)
              if (U[i,l]>j){ u[i]=0
              }else u[i]=1
              temp=temp+Y[i,l]*u[i]
            }
            jump[k,l]=temp
          }else jump[k,l]=0
        }
      }
      return(jump)
    }

3) Now, let's have a look at the innermost loop:

    for (i in 1:NT){
      u <- vector("numeric", NT)
      if (U[i,l]>j){ u[i]=0
      }else u[i]=1
      temp=temp+Y[i,l]*u[i]
    }

This is equivalent to:

    u <- as.integer(U[1:NT,l]<=j)
    temp <- sum(Y[1:NT,l]*u)

or, in one line:

    temp <- sum(Y[1:NT,l] * as.integer(U[1:NT,l] <= j))

Hence, the function can now be written as:

    jump <- function(t=0, T=T, par){
      jump <- matrix(0, T/dt+1, n)
      U <- replicate(n, runif(100, t, T))
      Y <- replicate(n, rexp(100, 1/par[6]))
      NTs <- rpois(n, par[5]*T)
      js <- seq(t, T, dt)
      for (l in 1:n){
        NT <- NTs[l]
        k=0
        for (j in js){
          k=k+1
          if (NT>0){
            jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j))   # note the change
          }else jump[k,l]=0
        }
      }
      return(jump)
    }
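
To make the equivalence in step 3 concrete, here is a minimal sketch that compares the explicit loop with the vectorized sum on a single simulated column. The values of NT, j, the time horizon 13 and the rate 1/0.6 are arbitrary assumptions chosen for illustration; they are not taken from the question.

```r
set.seed(1)
NT <- 7                     # assumed number of jumps for this column
j  <- 5                     # assumed time point on the grid
U  <- runif(100, 0, 13)     # candidate jump times
Y  <- rexp(100, 1 / 0.6)    # candidate jump sizes

# Loop version: add a jump size only if the jump happened at or before j
temp_loop <- 0
for (i in 1:NT) {
  if (U[i] <= j) temp_loop <- temp_loop + Y[i]
}

# Vectorized version: the logical comparison acts as a 0/1 mask
temp_vec <- sum(Y[1:NT] * (U[1:NT] <= j))

all.equal(temp_loop, temp_vec)
```

The explicit as.integer() is optional here, because R coerces logical values to 0/1 in arithmetic. Note also that 1:NT evaluates to c(1, 0) when NT is 0, which is why the NT>0 guard is needed; seq_len(NT) would give an empty index instead.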

4) Again, let's have a look at the current innermost loop:

    for (j in js){
      k=k+1
      if (NT>0){
        jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j))   # note the change
      }else jump[k,l]=0
    }

As you can see, NT does not depend on the iteration of this loop, so the inner if can be moved outside, as follows:

    if (NT>0){
      for (j in js){
        k=k+1
        jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j))   # note the change
      }
    }else{
      for (j in js){
        k=k+1
        jump[k,l]=0
      }
    }

This looks worse than before (and it is), but now the 2 branches can be turned into one-liners (note the use of sapply¹):

    if (NT>0){
      jump[1:length(js),l] <- sapply(js, function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
    }else{
      jump[1:length(js),l] <- 0
    }

obtaining the following jump function:

    jump <- function(t=0, T=T, par){
      jump <- matrix(0, T/dt+1, n)
      U <- replicate(n, runif(100, t, T))
      Y <- replicate(n, rexp(100, 1/par[6]))
      NTs <- rpois(n, par[5]*T)
      js <- seq(t, T, dt)
      for (l in 1:n){
        NT <- NTs[l]
        if (NT>0){
          jump[1:length(js),l] <- sapply(js, function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
        }else{
          jump[1:length(js),l] <- 0
        }
      }
      return(jump)
    }

5) Finally, we can get rid of the last loop, again using the sapply¹ function, obtaining the final jump function:

    jump <- function(t=0, T=T, par){
      U <- replicate(n, runif(100, t, T))
      Y <- replicate(n, rexp(100, 1/par[6]))
      js <- seq(t, T, dt)
      NTs <- rpois(n, par[5]*T)
      jump <- sapply(1:n, function(l){
        NT <- NTs[l]
        if (NT>0){
          sapply(js, function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
        }else {
          rep(0, length(js))
        }
      })
      return(jump)
    }
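
As a possible further variant (not part of the steps above), the inner sapply over the time grid could be replaced by a single matrix operation built with outer() and colSums(). The helper name jump_column and the sample values below are hypothetical and only serve to illustrate the idea for one column.

```r
# Hypothetical helper: cumulative jump sum on the grid js for one column.
# u, y: jump times and jump sizes for that column; NT: number of jumps.
jump_column <- function(u, y, NT, js) {
  if (NT == 0) return(numeric(length(js)))   # no jumps: all zeros
  idx  <- seq_len(NT)
  mask <- outer(u[idx], js, "<=")            # NT x length(js) logical matrix
  colSums(y[idx] * mask)                     # one sum per grid point
}

set.seed(42)
js <- seq(0, 13, 3)          # assumed time grid
u  <- runif(100, 0, 13)      # assumed jump times
y  <- rexp(100, 1 / 0.6)     # assumed jump sizes
NT <- 4

via_outer  <- jump_column(u, y, NT, js)
via_sapply <- sapply(js, function(j) sum(y[1:NT] * as.integer(u[1:NT] <= j)))
all.equal(via_outer, via_sapply)
```

Whether this is faster than the nested sapply depends on NT and on the grid length, so it is worth timing on realistic inputs before adopting it.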

(¹)

The sapply function is pretty easy to use. For each element of the list or vector passed in the X parameter, it applies the function passed in the FUN parameter, e.g.:

    vect <- 1:3
    sapply(X=vect, FUN=function(el){el+10})
    # [1] 11 12 13

Since the simplify parameter is TRUE by default, the result is coerced to the simplest possible object. So, in the previous case the result becomes a vector, while in the following example the result becomes a matrix (since for each element we return a vector of the same size):

    vect <- 1:3
    sapply(X=vect, FUN=function(el){rep(el,5)})
    #      [,1] [,2] [,3]
    # [1,]    1    2    3
    # [2,]    1    2    3
    # [3,]    1    2    3
    # [4,]    1    2    3
    # [5,]    1    2    3
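
One consequence of this simplification is worth a short illustration: when the per-element results have different lengths, sapply cannot simplify and returns a list instead. The sketch below shows that case and uses vapply, which takes the expected result shape as an extra argument, as a stricter alternative. The variable names are made up for the example.

```r
vect <- 1:3

# Same length per element: sapply simplifies to a vector
a <- sapply(vect, function(el) el + 10)

# Different lengths per element: sapply falls back to a list
b <- sapply(vect, function(el) seq_len(el))

# vapply declares the expected shape (one numeric value per element)
# and raises an error if FUN returns anything else
d <- vapply(vect, function(el) el + 10, numeric(1))

is.numeric(a)
is.list(b)
identical(a, d)
```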

Benchmark:

The following benchmark just gives you an idea of the speed gain; the actual performance may differ depending on your input parameters.
As you can imagine, jump_old corresponds to your original function 1, while jump_new is the final vectorized version.

    # let's use some random parameters
    n = 10
    m = 3
    T = 13
    par = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
    dt <- 3

    set.seed(123)
    system.time(for(i in 1:5000) old <- jump_old(T=T,par=par))
    #  user  system elapsed
    # 12.39    0.00   12.41

    set.seed(123)
    system.time(for(i in 1:5000) new <- jump_new(T=T,par=par))
    #  user  system elapsed
    #  4.49    0.00    4.53

    # check if the last results of the 2 functions are the same:
    isTRUE(all.equal(old,new))
    # [1] TRUE
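
Function 2 from the question is a different case, because row i depends on row i-1. Inside sapply, every call reads the lambda matrix as it was before the call, so rows after the first are still zero when they are used; that would explain the identical values reported in the question's edit. A hedged sketch of one way to handle it: keep the loop over rows, and vectorize only across the n columns. The function name lambda_path and its argument list are hypothetical, and the sketch assumes an elementwise floor at zero was intended, so it uses pmax() where the question uses max() (which returns one value for the whole row).

```r
# Hypothetical rewrite of function 2; all inputs are passed explicitly.
# BM: (m+1) x n matrix of Brownian increments (stand-in for BMz_C),
# J:  (m+1) x n matrix of cumulative jumps.
lambda_path <- function(par, dt, BM, J) {
  m1 <- nrow(J)                               # assumes at least 2 rows
  lambda <- matrix(0, m1, ncol(J))
  lambda[1, ] <- par[4]                       # start value of the intensity
  for (i in 2:m1) {
    prev <- pmax(lambda[i - 1, ], 0)          # elementwise floor (assumption)
    dlambda <- par[1] * (par[2] - prev) * dt +
      par[3] * sqrt(prev) * BM[i, ] +
      (J[i, ] - J[i - 1, ])
    lambda[i, ] <- lambda[i - 1, ] + dlambda  # whole row updated at once
  }
  lambda
}

set.seed(7)
n <- 4; m <- 5; dt <- 0.1
par <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
BM <- matrix(rnorm((m + 1) * n, sd = sqrt(dt)), m + 1, n)
J  <- matrix(0, m + 1, n)                     # no jumps, to keep the demo short
out <- lambda_path(par, dt, BM, J)
dim(out)
```

The row loop runs only m times regardless of n, so most of the gain comes from treating all paths in a row at once rather than from removing the loop.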
