Functional way to loop over an ordered set in R
Question
I'm trying to optimize an algorithm in R that runs over an ordered set of values and determines, for each value, whether a lower value occurs "in the future" (further down the set).
For example:
+-------+--------------------------------+
| Value | RestOfSeriesContainsLowerValue |
+-------+--------------------------------+
| 5 | true |
| 4 | true |
| 2 | true |
| 1 | false |
| 3 | true |
| 4 | true |
| 4 | true |
| 3 | true |
| 3 | true |
| 2 | false |
| 2 | false |
| 2 | false |
| 7 | false |
| 8 | false |
| 9 | false |
| ... | ... |
+-------+--------------------------------+
The local minima are values 1 and 2. Therefore RestOfSeriesContainsLowerValue evaluates to true for the first items in this set, since there is a lower value (1) further down the set.
After the 1 value, the 3 and 4 values evaluate to true, since the new local minimum (value 2) comes up later in the set.
We're currently using a for loop that runs over the set. In pseudocode:
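RestOfSeriesContainsLowerValue <- logical(length(set))
for (i in seq_along(set)) {
  # TRUE when a strictly lower value occurs somewhere after position i
  if (set[i] > min(set[i:length(set)]))
    RestOfSeriesContainsLowerValue[i] <- TRUE
  else
    RestOfSeriesContainsLowerValue[i] <- FALSE
}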
However, this is not efficient enough. I'm looking for a set-based / functional way to write this in R but cannot get my head around it. Can I use lapply to do this?
Recommended answer
Your pseudocode as functional R code using lapply:
f <- function(value) unlist(lapply(seq_along(value), function(i) if (value[i] <= min(value[i:length(value)])) FALSE else TRUE))
Vectorized code that achieves the same result:
f1 <- function(value) value > rev(cummin(rev(value)))
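To see why this works, here is a small sketch that applies the same expression to the sample values from the table in the question. The names `x`, `suffix_min` and `rest_has_lower` are illustrative and not part of the original code; `rev(cummin(rev(x)))` gives, for each position, the minimum from that position to the end of the vector.

```r
# Sample values from the table in the question (the trailing "..." row is omitted)
x <- c(5, 4, 2, 1, 3, 4, 4, 3, 3, 2, 2, 2, 7, 8, 9)

# Minimum of x[i], x[i + 1], ..., x[length(x)] for every position i
suffix_min <- rev(cummin(rev(x)))

# TRUE where a strictly lower value occurs later in the series
rest_has_lower <- x > suffix_min
print(rest_has_lower)
```

The result should reproduce the RestOfSeriesContainsLowerValue column of the table for these 15 values.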
Depending on the sample size, the vectorized code can be arbitrarily faster. The lapply version recomputes min() over the remaining elements at every position, so its cost grows quadratically with n, whereas cummin() makes a single pass. For n = 100 it is about 10 times faster, for n = 1000 about 100 times faster, and for n = 10000 around 1000 times faster.
value <- sample(1:100, 1000, replace = TRUE)
microbenchmark::microbenchmark(f(value), f1(value), unit="relative")
#Unit: relative
# expr min lq mean median uq max neval
# f(value) 172.3758 174.2449 124.1607 107.5502 104.8017 96.85548 100
#f1(value) 1.0000 1.0000 1.0000 1.0000 1.0000 1.00000 100
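Before relying on the timings, it may be worth confirming that both versions return the same result. A minimal sketch, assuming the same kind of random input as in the benchmark; the seed is arbitrary and only there for reproducibility, and `f` is written here with a direct comparison instead of if/else, which gives the same values for input without NA:

```r
# lapply version: one min() call per position
f <- function(value) {
  unlist(lapply(seq_along(value), function(i) value[i] > min(value[i:length(value)])))
}

# Vectorized version: suffix minimum via a reversed cumulative minimum
f1 <- function(value) value > rev(cummin(rev(value)))

set.seed(1)  # arbitrary seed, an assumption for reproducibility only
value <- sample(1:100, 1000, replace = TRUE)

# Stops with an error if the two implementations ever disagree
stopifnot(identical(f(value), f1(value)))
```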