Fastest way to find second (third...) highest/lowest value in vector or column


Problem description

R offers max and min, but I do not see a really fast way to find another value in the order, apart from sorting the whole vector and then picking a value x from this vector.

Is there a faster way to get the second highest value, for example?

Recommended answer

Rfast has a function called nth_element (called as Rfast::nth in the examples below) that does exactly what you ask.

Furthermore, the methods discussed in the other answers that are based on partial sort do not support finding the k smallest values.

Update (28/FEB/21): the kit package offers a faster implementation (topn); see https://stackoverflow.com/a/66367996/4729755 and https://stackoverflow.com/a/53146559/4729755
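As a rough illustration of the topn route mentioned in the update, the sketch below picks the k-th largest value from the indices that kit::topn returns. It assumes the kit package is installed and that topn returns indices ordered by value, as its documentation describes; the sample vector and the names k, idx and kth_largest are made up for this example. If kit is not available, the sketch falls back to a full sort in base R.

```r
# Illustrative data; not from the original benchmarks.
x <- c(3.2, 9.1, 4.7, 1.5, 8.8, 6.0, 7.3)
k <- 3

if (requireNamespace("kit", quietly = TRUE)) {
  # topn returns the indices of the k largest values, largest first,
  # so the last index points at the k-th largest element.
  idx <- kit::topn(x, k, decreasing = TRUE)
  kth_largest <- x[idx[k]]
} else {
  # Fallback (assumption: kit is not installed): full sort in base R.
  kth_largest <- sort(x, decreasing = TRUE)[k]
}

print(kth_largest)
```

Passing decreasing = FALSE should give the k smallest values instead, which is the case the partial-sort answers do not cover.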

Disclaimer: An issue appears to occur when dealing with integers. It can be bypassed by using as.numeric (e.g. Rfast::nth(as.numeric(1:10), 2)) and will be addressed in the next update of Rfast.

Rfast::nth(x, 5, descending = TRUE)

will return the 5th largest element of x, while

Rfast::nth(x, 5, descending = FALSE)

will return the 5th smallest element of x.
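To make the meaning of "5th largest" and "5th smallest" concrete, here is a minimal base R sketch that computes both by sorting the whole vector. The helpers kth_largest and kth_smallest are hypothetical names introduced for this example and are not part of Rfast; they are slow but easy to verify. The comparison against Rfast::nth only runs if Rfast happens to be installed, and the input is coerced with as.numeric as the disclaimer suggests.

```r
# Hypothetical reference helpers: full sort, then index.
kth_largest  <- function(x, k) sort(x, decreasing = TRUE)[k]
kth_smallest <- function(x, k) sort(x)[k]

# Illustrative data; ascending order is 1, 3, 7, 12, 18, 26, 29, 33, 40, 45.
x <- c(12, 3, 45, 7, 29, 18, 33, 1, 26, 40)

print(kth_largest(x, 5))   # 5th largest
print(kth_smallest(x, 5))  # 5th smallest

if (requireNamespace("Rfast", quietly = TRUE)) {
  xn <- as.numeric(x)  # coerce to double, per the disclaimer
  cat("largest agrees: ",
      isTRUE(all.equal(Rfast::nth(xn, 5, descending = TRUE), kth_largest(x, 5))), "\n")
  cat("smallest agrees:",
      isTRUE(all.equal(Rfast::nth(xn, 5, descending = FALSE), kth_smallest(x, 5))), "\n")
}
```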

Benchmarks below against the most popular answers.

For 10 thousand numbers:

N = 10000
x = rnorm(N)

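# Return the N-th largest value of x using a partial sort.
maxN <- function(x, N = 2) {
    len <- length(x)
    if (N > len) {
        warning('N greater than length(x).  Setting N=length(x)')
        N <- length(x)
    }
    # After a partial sort, the element at index len-N+1 is in its final sorted position.
    sort(x, partial = len - N + 1)[len - N + 1]
}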

microbenchmark::microbenchmark(
    Rfast = Rfast::nth(x, 5, descending = TRUE),
    maxN = maxN(x, 5),
    order = x[order(x, decreasing = TRUE)[5]])

Unit: microseconds
  expr      min       lq      mean   median        uq       max neval
 Rfast  160.364  179.607  202.8024  194.575  210.1830   351.517   100
  maxN  396.419  423.360  559.2707  446.452  487.0775  4949.452   100
 order 1288.466 1343.417 1746.7627 1433.221 1500.7865 13768.148   100

For 1 million numbers:

N = 1e6
x = rnorm(N)

microbenchmark::microbenchmark(
    Rfast = Rfast::nth(x, 5, descending = TRUE),
    maxN = maxN(x, 5),
    order = x[order(x, decreasing = TRUE)[5]])

Unit: milliseconds
  expr      min        lq      mean   median        uq       max neval
 Rfast  89.7722  93.63674  114.9893 104.6325  120.5767  204.8839   100
  maxN 150.2822 207.03922  235.3037 241.7604  259.7476  336.7051   100
 order 930.8924 968.54785 1005.5487 991.7995 1031.0290 1164.9129   100
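Timings like these are only meaningful if the benchmarked expressions return the same value. The following sketch checks that for the two base R approaches, using a simplified copy of maxN (without the length check) and a seeded random vector; the seed and the variable names are choices made for this example, not part of the original benchmark.

```r
# Simplified copy of maxN from the benchmark, without the N > length(x) check.
maxN <- function(x, N = 2) {
  len <- length(x)
  sort(x, partial = len - N + 1)[len - N + 1]
}

set.seed(42)  # arbitrary seed so the check is reproducible
x <- rnorm(10000)

by_partial <- maxN(x, 5)                         # partial sort
by_order   <- x[order(x, decreasing = TRUE)[5]]  # full ordering
by_sort    <- sort(x, decreasing = TRUE)[5]      # full sort, as a reference

# All three select the same element of x, so the values should be identical.
print(identical(by_partial, by_order) && identical(by_order, by_sort))
```

The same check can be extended to Rfast::nth or kit::topn when those packages are installed.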
