Double for-loop operation in R (with an example)


Problem description

Please look at the following small working example:

#### Pseudo data
nobs1 <- 4000
nobs2 <- 5000
mylon1 <- runif(nobs1, min=0, max=1)-76
mylat1 <- runif(nobs1, min=0, max=1)+37
mylon2 <- runif(nobs2, min=0, max=1)-76
mylat2 <- runif(nobs2, min=0, max=1)+37

#### define a distance function
thedistance <- function(lon1, lat1, lon2, lat2) {
 R <- 6371 # Earth mean radius [km]
 delta.lon <- (lon2 - lon1)
 delta.lat <- (lat2 - lat1)
 a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon/2)^2
 c <- 2 * asin(min(1,sqrt(a)))
 d = R * c
 return(d)
}

ptm <- proc.time()

#### Calculate distances between locations
# Initiate the resulting distance vector
ndistance <- nobs1*nobs2 # The number of distances
mydistance <- vector(mode = "numeric", length = ndistance)

k=1
for (i in 1:nobs1) {
 for (j in 1:nobs2) {
  mydistance[k] = thedistance(mylon1[i],mylat1[i],mylon2[j],mylat2[j])
  k=k+1
 }
}

proc.time() - ptm

The computation time:

  user  system elapsed 
249.85    0.16  251.18

Here, my question is whether there is still room for speeding up the double for-loop calculation. Thank you very much.

Solution

Here's an option that decreases the runtime to ~2 seconds on my machine because the inner loop is replaced by a single vectorized call over all of mylon2 and mylat2.

A direct comparison with the original solution follows.

Test data:

nobs1 <- 4000
nobs2 <- 5000
mylon1 <- runif(nobs1, min=0, max=1)-76
mylat1 <- runif(nobs1, min=0, max=1)+37
mylon2 <- runif(nobs2, min=0, max=1)-76
mylat2 <- runif(nobs2, min=0, max=1)+37

Original solution:

#### define a distance function
thedistance <- function(lon1, lat1, lon2, lat2) {
  R <- 6371 # Earth mean radius [km]
  delta.lon <- (lon2 - lon1)
  delta.lat <- (lat2 - lat1)
  a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon/2)^2
  c <- 2 * asin(min(1,sqrt(a)))
  d = R * c
  return(d)
}

ptm <- proc.time()

#### Calculate distances between locations
# Initiate the resulting distance vector
ndistance <- nobs1*nobs2 # The number of distances
mydistance <- vector(mode = "numeric", length = ndistance)

k=1
for (i in 1:nobs1) {
  for (j in 1:nobs2) {
    mydistance[k] = thedistance(mylon1[i],mylat1[i],mylon2[j],mylat2[j])
    k=k+1
  }
}

proc.time() - ptm
   User      System     elapsed 
148.243       0.681     148.901 

My approach:

# modified (vectorized) distance function:
thedistance2 <- function(lon1, lat1, lon2, lat2) {
  R <- 6371 # Earth mean radius [km]
  delta.lon <- (lon2 - lon1)
  delta.lat <- (lat2 - lat1)
  a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon/2)^2
  c <- 2 * asin(pmin(1,sqrt(a)))   # pmin instead of min
  d = R * c
  return(d)
}

ptm2 <- proc.time()

lst <- vector("list", length = nobs1)

for (i in seq_len(nobs1)) {
    lst[[i]] = thedistance2(mylon1[i],mylat1[i],mylon2,mylat2)
}

res <- unlist(lst)

proc.time() - ptm2
   User      System     elapsed
  1.988       0.331       2.319 
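
The switch from min to pmin is what lets thedistance2 accept whole vectors. A small illustration, using made-up values for a (they are not taken from the data above):

```r
# Hypothetical values of `a`, chosen so that one square root exceeds 1
a <- c(0.25, 1.44, 0.81)

# min() pools all of its arguments and returns a single number,
# so a vectorized call would collapse to one distance
collapsed <- min(1, sqrt(a))

# pmin() compares element by element (recycling the 1) and
# returns a vector as long as sqrt(a)
elementwise <- pmin(1, sqrt(a))

length(collapsed)    # one value
length(elementwise)  # one value per element of a
```

With min, passing mylon2 and mylat2 as vectors would silently return one distance instead of nobs2 distances, which is why the original function only works inside the inner loop.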

Are the results all equal?

all.equal(mydistance, res)
#[1] TRUE
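
The remaining loop over i can probably be removed as well. The following is only a sketch, not part of the original answer: it assumes enough memory for vectors of length n1 * n2, uses smaller hypothetical sizes (n1, n2) so it runs quickly, and builds index vectors with rep so that the result comes out in the same order as res. Like the original, it passes the coordinates straight to sin and cos, which in R expect radians; if the values are degrees they would need converting first (x * pi / 180).

```r
# Same formula as thedistance2 in the answer above
thedistance2 <- function(lon1, lat1, lon2, lat2) {
  R <- 6371 # Earth mean radius [km]
  delta.lon <- lon2 - lon1
  delta.lat <- lat2 - lat1
  a <- sin(delta.lat / 2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon / 2)^2
  R * 2 * asin(pmin(1, sqrt(a)))
}

# Smaller pseudo data (sizes are assumptions for illustration)
set.seed(1)
n1 <- 40
n2 <- 50
lon1 <- runif(n1) - 76
lat1 <- runif(n1) + 37
lon2 <- runif(n2) - 76
lat2 <- runif(n2) + 37

# Index vectors covering every (i, j) pair, with j varying fastest,
# which matches the order produced by the double loop
i <- rep(seq_len(n1), each = n2)
j <- rep(seq_len(n2), times = n1)

# One vectorized call, no explicit loop
res_grid <- thedistance2(lon1[i], lat1[i], lon2[j], lat2[j])

# Reference: the half-vectorized loop from the answer
res_loop <- unlist(lapply(seq_len(n1), function(k) {
  thedistance2(lon1[k], lat1[k], lon2, lat2)
}))

same <- all.equal(res_grid, res_loop)
```

Whether this is faster than the loop over i depends on the machine, since the index vectors and intermediate results each hold n1 * n2 elements; timing both with proc.time() on the real sizes would settle it.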
