使用应用函数重写循环 [英] Rewriting loops with apply functions

查看:18
本文介绍了使用应用函数重写循环的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下 3 个函数,我想让它们更快,我认为应用函数是最好的方法,但我从未使用过应用函数,所以我不知道该怎么做.任何类型的提示、想法和代码片段将不胜感激.

I have the 3 following functions which I would like to make faster, I assume apply functions are the best way to go, but I have never used apply functions, so I have no idea what to do. Any type of hints, ideas and code snippets will be much appreciated.

n、T、dt 是全局参数,par 是参数向量.

n, T, dt are global parameters and par is a vector of parameters.

函数 1:是创建 m+1,n 矩阵的函数,该矩阵包含具有指数分布跳跃大小的泊松分布跳跃.我的麻烦是因为我有 3 个循环,我不确定如何将 if 语句合并到内部循环中.我也不知道是否可以只在循环的外层使用应用函数.

Function 1: is a function to create an m+1,n matrix containing poisson distributed jumps with exponentially distributed jump sizes. My troubles here is because I have 3 loops and I am not sure how to incorporate the if statement in the inner loop. Also I have no idea if it is at all possible to use apply functions on the outer layers of the loops only.

jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n) # initializing output matrix
  U <- replicate(n,runif(100,t,T)) #matrix used to decide when the jumps will happen
  Y <-replicate(n,rexp(100,1/par[6])) #matrix with jump sizes
  for (l in 1:n){ 
    NT <- rpois(1,par[5]*T) #number of jumps
    k=0
    for (j in seq(t,T,dt)){
      k=k+1
      if (NT>0){
        temp=0
        for (i in 1:NT){
          u <- vector("numeric",NT)
          if (U[i,l]>j){ u[i]=0
          }else u[i]=1
          temp=temp+Y[i,l]*u[i]
        }
        jump[k,l]=temp
      }else jump[k,l]=0
    }
  }
  return(jump)
}

函数 2:基于布朗运动和函数 1 的跳跃计算默认强度.这里我的问题是当用于计算的变量是输出矩阵中上一行的值时如何使用应用函数 AND如何从计算中使用的外部矩阵中获得正确的值 (BMz_C & J)

Function 2: calculates a default intensity, based on Brownian motions and the jumps from function 1. Here my trouble is how to use apply functions when the variable used for the calculation is the values from the row above in the output matrix AND how to get the right values from the external matrices which are used in the calculations (BMz_C & J)

lambda <- function(t=0,T=T,par,fit=0){
  lambda <- matrix(0,m+1,n) # matrix to hold intesity path output
  lambda[1,] <- par[4] #initializing start value of the intensity path.
  J <- jump(t,T,par) #matrix containing jumps
  for(i in 2:(m+1)){
    dlambda <- par[1]*(par[2]-max(lambda[i-1,],0))*dt+par[3]*sqrt(max(lambda[i-   1,],0))*BMz_C[i,]+(J[i,]-J[i-1,])
    lambda[i,] <- lambda[i-1,]+dlambda
  } 
  return(lambda)
} 

函数 3:根据函数 2 的强度计算生存概率.这里的 a() 和 B() 是返回数值的函数.我的问题是同时使用了 i 和 j 值,因为 i 并不总是一个整数,因此可以用来引用外部矩阵.我之前曾尝试使用 i/dt,但有时它会覆盖矩阵中的一行并跳过下一行,很可能是由于舍入错误.

Function 3: calculates a survival probability based on the intensity from function 2. Here a() and B() are functions that return numerical values. My problem here is that the both value i and j are used because i is not always an integer which thus can to be used to reference the external matrix. I have earlier tried to use i/dt, but sometimes it would overwrite one line and skip the next lines in the matrix, most likely due to rounding errors.

S <- function(t=0,T=T,par,plot=0, fit=0){
  S <- matrix(0,(T-t)/dt+1,n)
  if (fit > 0) S.fit <- matrix(0,1,length(mat)) else S.fit <- 0
  l=lambda(t,T,par,fit)
  j=0
  for (i in seq(t,T,dt)){
    j=j+1
    S[j,] <- a(i,T,par)*exp(B(i,T,par)*l[j,])
  }
  return(S)
}

抱歉,帖子太长,对任何功能的任何帮助将不胜感激.

Sorry for the long post, any help for any of the functions will be much appreciated.

首先感谢 digEmAll 的精彩回复.

First of all thanks to digEmAll for the great reply.

我现在已经研究了向量化函数 2.首先我尝试了

I have now worked on vectorising function 2. First I tried

lambda <- function(t=0,T=T,par,fit=0){
  lambda <- matrix(0,m+1,n) # matrix to hold intesity path input
  J <- jump(t,T,par,fit)
    lambda[1,] <- par[4] 
    lambda[2:(m+1),] <- sapply(2:(m+1), function(i){
      lambda[i-1,]+par[1]*(par[2]-max(lambda[i-1,],0))*dt+par[3]*sqrt(max(lambda[i-1,],0))*BMz_C[i,]+(J[i,]-J[i-1,])
    })
  return(lambda)
}

但它只会产生第一列.所以我尝试了一个两步应用函数.

but it would only produce the first column. So I tried a two step apply function.

lambda <- function(t=0,T=T,par,fit=0){
  lambda <- matrix(0,m+1,n) # matrix to hold intesity path input
  J <- jump(t,T,par,fit)
  lambda[1,] <- par[4] 
  lambda[2:(m+1),] <- sapply(1:n, function(l){
    sapply(2:(m+1), function(i){
      lambda[i-1,l]+par[1]*(par[2]-max(lambda[i-1,l],0))*dt+par[3]*sqrt(max(lambda[i-1,l],0))*BMz_C[i,l]+(J[i,l]-J[i-1,l])
    })
  })
  return(lambda)
}

这似乎有效,但仅在第一行,之后的所有行都具有相同的非零值,就好像 lambda[i-1] 未用于 lambda[i] 的计算一样,有没有人有知道如何管理吗?

This seems to work, but only on the first row, all rows after that have an identical non-zero value, as if lambda[i-1] is not used in the calculation of lambda[i], does anyone have an idea how to manage that?

推荐答案

我将一步一步地向你解释如何向量化第一个函数(一种可能的方法矢量化,可能不是最适合您的情况).
对于其他 2 个功能,您可以简单地应用相同的概念,您应该能够做到.

I'm going to explain to you, setp-by-step, how to vectorize the first function (one possible way of vectorization, maybe not the best one for your case).
For the others 2 functions, you can simply apply the same concepts and you should be able to do it.

这里的关键概念是:从最里面的循环开始矢量化.

1) 首先,rpois 可以一次生成多个随机值,但您调用它 n 次要求一个随机值.所以,让我们把它从循环中取出来:

1) First of all, rpois can generate more than one random value at a time but you are calling it n-times asking one random value. So, let's take it out of the loop obtaining this:

jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n) 
  U <- replicate(n,runif(100,t,T)) 
  Y <-replicate(n,rexp(100,1/par[6])) 
  NTs <- rpois(n,par[5]*T) # note the change
  for (l in 1:n){ 
    NT <- NTs[l] # note the change
    k=0
    for (j in seq(t,T,dt)){
      k=k+1
      if (NT>0){
        temp=0
        for (i in 1:NT){
          u <- vector("numeric",NT)
          if (U[i,l]>j){ u[i]=0
          }else u[i]=1
          temp=temp+Y[i,l]*u[i]
        }
        jump[k,l]=temp
      }else jump[k,l]=0
    }
  }
  return(jump)
}

<小时>

2) 同样,在循环中调用 seq(t,T,dt) n 次是无用/低效的,因为它总是会生成相同的序列.所以,让我们把它从循环中取出并存储到一个向量中,得到这个:


2) Similarly, it is useless/inefficient to call seq(t,T,dt) n-times in the loop since it will always generate the same sequence. So, let's take it out of the loop and store into a vector, obtainig this:

jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n) 
  U <- replicate(n,runif(100,t,T)) 
  Y <-replicate(n,rexp(100,1/par[6])) 
  NTs <- rpois(n,par[5]*T) 
  js <- seq(t,T,dt) # note the change
  for (l in 1:n){ 
    NT <- NTs[l] 
    k=0
    for (j in js){ # note the change
      k=k+1
      if (NT>0){
        temp=0
        for (i in 1:NT){
          u <- vector("numeric",NT)
          if (U[i,l]>j){ u[i]=0
          }else u[i]=1
          temp=temp+Y[i,l]*u[i]
        }
        jump[k,l]=temp
      }else jump[k,l]=0
    }
  }
  return(jump)
}

<小时>

3) 现在,让我们看看最里面的循环:


3) Now, let's have a look at the innermost loop:

for (i in 1:NT){
  u <- vector("numeric",NT)
  if (U[i,l]>j){ u[i]=0
  }else u[i]=1
  temp=temp+Y[i,l]*u[i]
}

这等于:

u <- as.integer(U[1:NT,l]<=j)
temp <- sum(Y[1:NT,l]*u)

或者,在一行中:

temp <- sum(Y[1:NT,l] * as.integer(U[1:NT,l] <= j))

因此,现在函数可以写成:

hence, now the function can be written as :

jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n) 
  U <- replicate(n,runif(100,t,T)) 
  Y <-replicate(n,rexp(100,1/par[6])) 
  NTs <- rpois(n,par[5]*T) 
  js <- seq(t,T,dt) 
  for (l in 1:n){ 
    NT <- NTs[l] 
    k=0
    for (j in js){ 
      k=k+1
      if (NT>0){
        jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) # note the change
      }else jump[k,l]=0
    }
  }
  return(jump)
}

<小时>

4) 再一次,让我们看看当前最内层的循环:


4) Again, let's have a look at the current innermost loop:

for (j in js){ 
  k=k+1
  if (NT>0){
      jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) # note the change
  }else jump[k,l]=0
}

正如你所看到的,NT 不依赖于这个循环的迭代,所以内部的 if 可以移到外面,如下:

as you can notice, NT does not depend on the iteration of this loop, so the inner if can be moved outside, as follows:

if (NT>0){
  for (j in js){ 
    k=k+1
    jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) # note the change
  }
}else{
  for (j in js){ 
    k=k+1
    jump[k,l]=0
  }
}

这似乎比以前更糟,是的,但现在这两个条件可以变成单行的(注意 sapply¹ 的使用):

this seems worse than before, well yes it is, but now the 2 conditions can be turned into one-liner's (note the use of sapply¹):

if (NT>0){
  jump[1:length(js),l] <- sapply(js,function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
}else{
  jump[1:length(js),l] <- 0
}

获得如下跳转函数:

jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n) 
  U <- replicate(n,runif(100,t,T)) 
  Y <-replicate(n,rexp(100,1/par[6])) 
  NTs <- rpois(n,par[5]*T) 
  js <- seq(t,T,dt) 
  for (l in 1:n){ 
    NT <- NTs[l] 
    if (NT>0){
      jump[1:length(js),l] <- sapply(js,function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
    }else{
      jump[1:length(js),l] <- 0
    }
  }
  return(jump)
}

<小时>

5) 最后我们可以摆脱最后一个循环,再次使用 sapply¹ 函数,获得最终的 jump 函数:


5) finally we can get rid of the last loop, using again the sapply¹ function, obtaining the final jump function :

jump <- function(t=0,T=T,par){
  U <- replicate(n,runif(100,t,T)) 
  Y <-replicate(n,rexp(100,1/par[6])) 
  js <- seq(t,T,dt)
  NTs <- rpois(n,par[5]*T)

  jump <- sapply(1:n,function(l){
    NT <- NTs[l] 
    if (NT>0){
      sapply(js,function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
    }else {
      rep(0,length(js))
    }
  })
  return(jump)
}

<小时>

(¹)

sapply 功能非常好用.对于在 X 参数中传递的列表或向量的每个元素,它应用在 FUN 参数中传递的函数,例如:

sapply function is pretty easy to use. For each element of the list or vector passed in the X parameter, it applies the function passed in the FUN parameter, e.g. :

vect <- 1:3
sapply(X=vect,FUN=function(el){el+10}
# [1] 11 12 13

由于默认情况下 simplify 参数为真,结果被强制转换为最简单的对象.因此,例如在前面的例子中结果变成了一个向量,而在下面的例子中结果变成了一个矩阵(因为对于每个元素我们返回一个相同大小的向量):

since by default the simplify parameter is true, the result is coerced to the simplest possible object. So, for example in the previous case the result becomes a vector, while in the following example result become a matrix (since for each element we return a vector of the same size) :

vect <- 1:3
sapply(X=vect,FUN=function(el){rep(el,5)})

#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    1    2    3
# [3,]    1    2    3
# [4,]    1    2    3
# [5,]    1    2    3

<小时>

基准:

以下基准测试只是让您了解速度增益,但实际性能可能会因您的输入参数而异.
可以想象,jump_old 对应于你原来的函数 1,而 jump_new 是最终的向量化版本.

The following benchmark just give you an idea of the speed gain, but the actual performances may be different depending on your input parameters.
As you can imagine, jump_old corresponds to your original function 1, while jump_new is the final vectorized version.

# let's use some random parameters
n = 10
m = 3
T = 13
par = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
dt <- 3

set.seed(123)
system.time(for(i in 1:5000) old <- jump_old(T=T,par=par))
# user   system  elapsed 
# 12.39    0.00    12.41

set.seed(123)
system.time(for(i in 1:5000) new <- jump_new(T=T,par=par))
# user  system  elapsed 
# 4.49    0.00     4.53 

# check if last results of the 2 functions are the same:
isTRUE(all.equal(old,new))
# [1] TRUE

这篇关于使用应用函数重写循环的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆