Double for-loop operation in R (with an example)
Problem description
Please look at the following small working example:
    #### Pseudo data
    nobs1 <- 4000
    nobs2 <- 5000
    mylon1 <- runif(nobs1, min=0, max=1)-76
    mylat1 <- runif(nobs1, min=0, max=1)+37
    mylon2 <- runif(nobs2, min=0, max=1)-76
    mylat2 <- runif(nobs2, min=0, max=1)+37

    #### define a distance function
    thedistance <- function(lon1, lat1, lon2, lat2) {
      R <- 6371 # Earth mean radius [km]
      delta.lon <- (lon2 - lon1)
      delta.lat <- (lat2 - lat1)
      a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon/2)^2
      c <- 2 * asin(min(1,sqrt(a)))
      d = R * c
      return(d)
    }

    ptm <- proc.time()

    #### Calculate distances between locations
    # Initiate the resulting distance vector
    ndistance <- nobs1*nobs2 # The number of distances
    mydistance <- vector(mode = "numeric", length = ndistance)

    k=1
    for (i in 1:nobs1) {
      for (j in 1:nobs2) {
        mydistance[k] = thedistance(mylon1[i],mylat1[i],mylon2[j],mylat2[j])
        k=k+1
      }
    }

    proc.time() - ptm
The computation time:
       user  system elapsed
     249.85    0.16  251.18
Here, my question is whether there is still room for speeding up the double for-loop calculation. Thank you very much.
Solution

Here's an option that decreases the runtime to ~2 seconds on my machine because part of it is vectorized.
A direct comparison with the original solution follows.
Test data:
    nobs1 <- 4000
    nobs2 <- 5000
    mylon1 <- runif(nobs1, min=0, max=1)-76
    mylat1 <- runif(nobs1, min=0, max=1)+37
    mylon2 <- runif(nobs2, min=0, max=1)-76
    mylat2 <- runif(nobs2, min=0, max=1)+37
Original solution:
    #### define a distance function
    thedistance <- function(lon1, lat1, lon2, lat2) {
      R <- 6371 # Earth mean radius [km]
      delta.lon <- (lon2 - lon1)
      delta.lat <- (lat2 - lat1)
      a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon/2)^2
      c <- 2 * asin(min(1,sqrt(a)))
      d = R * c
      return(d)
    }

    ptm <- proc.time()

    #### Calculate distances between locations
    # Initiate the resulting distance vector
    ndistance <- nobs1*nobs2 # The number of distances
    mydistance <- vector(mode = "numeric", length = ndistance)

    k=1
    for (i in 1:nobs1) {
      for (j in 1:nobs2) {
        mydistance[k] = thedistance(mylon1[i],mylat1[i],mylon2[j],mylat2[j])
        k=k+1
      }
    }

    proc.time() - ptm

       User  System elapsed
    148.243   0.681 148.901
My approach:
    # modified (vectorized) distance function:
    thedistance2 <- function(lon1, lat1, lon2, lat2) {
      R <- 6371 # Earth mean radius [km]
      delta.lon <- (lon2 - lon1)
      delta.lat <- (lat2 - lat1)
      a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon/2)^2
      c <- 2 * asin(pmin(1,sqrt(a))) # pmin instead of min
      d = R * c
      return(d)
    }

    ptm2 <- proc.time()

    lst <- vector("list", length = nobs1)

    for (i in seq_len(nobs1)) {
      lst[[i]] = thedistance2(mylon1[i],mylat1[i],mylon2,mylat2)
    }

    res <- unlist(lst)

    proc.time() - ptm2

       User  System elapsed
      1.988   0.331   2.319
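The switch from min to pmin is what allows the distance function to take whole vectors: min collapses all of its arguments into a single number, while pmin compares element by element. A small illustration, where the values in `a` are made up purely for the example:

```r
# Hypothetical values; the second one gives sqrt(a) > 1 and must be capped.
a <- c(0.25, 4, 0.81)

# min() returns ONE number: the smallest of 1 and every element of sqrt(a).
scalar_cap <- min(1, sqrt(a))

# pmin() returns one number PER element, each capped at 1.
elementwise_cap <- pmin(1, sqrt(a))

print(scalar_cap)       # 0.5
print(elementwise_cap)  # 0.5 1.0 0.9
```

With min, every distance in a vectorized call would silently become the same value, so the replacement is required for correctness and not only for speed.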
Are the results all equal?
    all.equal(mydistance, res)
    #[1] TRUE
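If memory allows, the remaining loop over i could probably be removed as well by expanding both point sets to all index pairs up front. The following is only a sketch and not part of the original answer: all_pairs_distance is a hypothetical helper name, and the sizes are reduced so the check runs quickly. At the full 4000 x 5000 size each intermediate vector would hold 20 million values. As in the code above, coordinates are passed straight to sin and cos, so they are treated as radians; coordinates in degrees would need multiplying by pi/180 first.

```r
# Same formula as the vectorized distance function in the answer.
thedistance2 <- function(lon1, lat1, lon2, lat2) {
  R <- 6371 # Earth mean radius [km]
  delta.lon <- lon2 - lon1
  delta.lat <- lat2 - lat1
  a <- sin(delta.lat / 2)^2 + cos(lat1) * cos(lat2) * sin(delta.lon / 2)^2
  R * 2 * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: distances for every (i, j) pair in one call.
# The index order matches the double loop: i varies slowest, j fastest.
all_pairs_distance <- function(lon1, lat1, lon2, lat2) {
  i <- rep(seq_along(lon1), each = length(lon2))
  j <- rep(seq_along(lon2), times = length(lon1))
  thedistance2(lon1[i], lat1[i], lon2[j], lat2[j])
}

# Small made-up test data, generated the same way as in the question.
set.seed(1)
n1 <- 40
n2 <- 50
lon_a <- runif(n1) - 76
lat_a <- runif(n1) + 37
lon_b <- runif(n2) - 76
lat_b <- runif(n2) + 37

res_vec <- all_pairs_distance(lon_a, lat_a, lon_b, lat_b)

# Reference result: one vectorized call per point of the first set.
res_loop <- unlist(lapply(seq_len(n1), function(i) {
  thedistance2(lon_a[i], lat_a[i], lon_b, lat_b)
}))

print(all.equal(res_vec, res_loop))
```

Whether this is faster than the single-loop version depends on available memory, so it is worth timing both on the real data before choosing one.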