Speeding up a function that uses which within a sapply call in R


Problem description

I have two vectors, e and g. For each element of e, I want to know the percentage of elements in g that are smaller. One way to implement this in R is:

set.seed(21)
e <- rnorm(1e4)
g <- rnorm(1e4)
mf <- function(p,v) {100*length(which(v<=p))/length(v)}
mf.out <- sapply(X=e, FUN=mf, v=g)

With large e or g, this takes a long time to run. How can I change or adapt this code to make it run faster?

Note: The mf function above is based on code from the mess function in the dismo package.

Recommended answer

This is slow because you're calling your function length(e) times, and each call compares its argument against every element of g. It doesn't make a large difference for small vectors, but the overhead from R function calls really starts to add up with larger vectors.
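To make that cost concrete, here is a minimal sketch that counts how often sapply invokes the function. The counter `n_calls` and the wrapper `mf_counted` are hypothetical names added for illustration; they are not part of the original code. Each of the counted calls also scans all of v, so the total work grows roughly with length(e) * length(g).

```r
set.seed(21)
e <- rnorm(100)
g <- rnorm(100)

n_calls <- 0  # hypothetical counter, for illustration only

# Same computation as mf in the question, plus a call counter
mf_counted <- function(p, v) {
  n_calls <<- n_calls + 1                    # record one more R-level call
  100 * length(which(v <= p)) / length(v)    # full pass over v on every call
}

out <- sapply(X = e, FUN = mf_counted, v = g)
n_calls == length(e)  # one call per element of e
```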

Normally, you would need to move this to compiled code, but luckily you can use findInterval on a sorted copy of g:

set.seed(21)
e <- rnorm(1e4)
g <- rnorm(1e4)
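# Fraction of g that is <= each element of e
# (multiply by 100 to get a percentage, as mf in the question does)
O <- findInterval(e,sort(g))/length(g)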

# Now for some timings:
f <- function(p,v) mean(v<=p)
system.time(o <- sapply(e, f, g))
#   user  system elapsed 
#   0.95    0.03    0.98
system.time(O <- findInterval(e,sort(g))/length(g))
#   user  system elapsed 
#      0       0       0 
identical(o,O)  # may be FALSE
all.equal(o,O)  # should be TRUE

# How fast is this on large vectors?
set.seed(21)
e <- rnorm(1e7)
g <- rnorm(1e7)
system.time(O <- findInterval(e,sort(g))/length(g))
#   user  system elapsed 
#  22.08    0.08   22.31
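A small worked example may help show why this works: for a sorted vector, findInterval(x, vec) returns the number of elements of vec that are <= x, which is the same count the original mf computes with which. The sketch below uses tiny made-up vectors chosen for illustration and multiplies by 100 to get a percentage as in the question. The ecdf call is not part of the original answer; it is included only as an independent cross-check.

```r
g <- c(3, 1, 2, 2)       # made-up data, includes a tie
e <- c(0, 2, 2.5, 10)    # made-up query points

# Original approach from the question: one R-level call per element of e
mf <- function(p, v) 100 * length(which(v <= p)) / length(v)
slow <- sapply(e, mf, v = g)

# findInterval needs a sorted (non-decreasing) vector
fast <- 100 * findInterval(e, sort(g)) / length(g)

# ecdf(g)(e) is the proportion of g that is <= each element of e
check <- 100 * ecdf(g)(e)

all.equal(slow, fast)
all.equal(slow, check)
```

Note that ties are handled the same way in all three versions, because each one counts elements that are less than or equal to the query value.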
