Faster way to find the first TRUE value in a vector
Question
In one function I very often need to use code like:
which(x==1)[1]
which(x>1)[1]
x[x>10][1]
where x is a numeric vector. summaryRprof() shows that I spend more than 80% of the time in relational operators. Is there a function that performs the comparison only until the first TRUE value is reached, so that I can speed up my code? A for loop is slower than the options shown above.
Recommended answer
Base R provides Position and Find, which return the index and the value, respectively, of the first element for which a predicate returns a true value. These higher-order functions return immediately upon the first hit.
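As a minimal illustration of the two functions described above, the sketch below applies them to a small made-up vector (the vector and the variable names are assumptions for illustration only). When nothing matches, Position returns NA and Find returns NULL by default.

```r
# Small illustrative vector (not from the original question)
x <- c(3, 1, 12, 1, 40)

# Position() returns the index of the first element satisfying the predicate
idx <- Position(function(v) v == 1, x)        # 2

# Find() returns the first matching element itself
val <- Find(function(v) v > 10, x)            # 12

# No match: Position() gives NA, Find() gives NULL (the default nomatch values)
miss_idx <- Position(function(v) v > 100, x)  # NA
miss_val <- Find(function(v) v > 100, x)      # NULL
```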
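# Baseline: fully vectorized comparisons, then take the first hit
f <- function(x) {
  r <- vector("list", 3)
  r[[1]] <- which(x == 1)[1]
  r[[2]] <- which(x > 1)[1]
  r[[3]] <- x[x > 10][1]
  return(r)
}
# Partial application: fix the second argument b of a binary function f
p <- function(f, b) function(a) f(a, b)
# Short-circuiting version: Position/Find stop at the first hit
g <- function(x) {
  r <- vector("list", 3)
  r[[1]] <- Position(p(`==`, 1), x)
  r[[2]] <- Position(p(`>`, 1), x)
  r[[3]] <- Find(p(`>`, 10), x)
  return(r)
}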
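The relative performance depends greatly on how likely a hit is to occur early in the vector, weighed against the cost of the predicate and the per-element overhead of Position/Find. In the first benchmark below, hits occur early and g is faster; in the second, there is no hit at all, so g has to call the predicate on every element and is far slower than f.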
library(microbenchmark)
set.seed(1)
x<-sample(1:100,1e5,replace=TRUE)
microbenchmark(f(x),g(x))
Unit: microseconds
expr min lq mean median uq max neval cld
f(x) 5034.283 5410.1205 6313.861 5798.4780 6948.5675 26735.52 100 b
g(x) 587.463 650.4795 1013.183 734.6375 950.9845 20285.33 100 a
y<-rep(0,1e5)
microbenchmark(f(y),g(y))
Unit: milliseconds
expr min lq mean median uq max neval cld
f(y) 3.470179 3.604831 3.791592 3.718752 3.866952 4.831073 100 a
g(y) 131.250981 133.687454 137.199230 134.846369 136.193307 177.082128 100 b
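One possible middle ground, which is not part of the original answer, is to run the vectorized comparison on one block of the vector at a time and stop at the first block that contains a hit. The sketch below assumes a hypothetical helper named first_true and an arbitrary default block size of 1024. Whether it beats either f or g depends on the data and would need to be benchmarked.

```r
# Hypothetical helper (not from the original answer): index of the first
# element for which the vectorized predicate pred() is TRUE, scanning x in
# blocks of `chunk` elements and stopping at the first block with a hit.
first_true <- function(x, pred, chunk = 1024L) {
  n <- length(x)
  start <- 1L
  while (start <= n) {
    end <- min(start + chunk - 1L, n)
    hit <- which(pred(x[start:end]))   # vectorized test on this block only
    if (length(hit) > 0L) {
      return(start + hit[1L] - 1L)     # convert block offset to global index
    }
    start <- end + 1L
  }
  NA_integer_                          # no element matched
}

# Illustrative data: the first value greater than 1 sits at index 5001
z <- c(rep(0, 5000), 7, 0, 7)
res_hit  <- first_true(z, function(v) v > 1)
res_none <- first_true(rep(0, 10), function(v) v > 1)
```

Here pred must be vectorized (it receives a whole block, not a single element), which is the main difference from the predicates passed to Position and Find.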