Finding number of elements in one vector that are less than an element in another vector


Problem description

Say we have a couple of vectors:

a <- c(1, 2, 2, 4, 7)
b <- c(1, 2, 3, 5, 7)

For each element b[i] in b, I want to find the number of elements in a that are less than b[i] or, equivalently, I want to know the rank of b[i] in c(b[i], a).

There are a couple of naive ways I can think of, e.g. doing either of the following length(b) times:

min_rank(c(b[i], a))[1]  # dplyr; rank of b[i] within c(b[i], a), i.e. the count plus one
sum(a < b[i])            # the count itself
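For reference, the second naive approach can be wrapped up as below. This is only a minimal sketch on the example vectors; `naive_count` is a name made up here for illustration, and the approach still does length(a) * length(b) comparisons, so it serves purely as a baseline.

```r
a <- c(1, 2, 2, 4, 7)
b <- c(1, 2, 3, 5, 7)

# Hypothetical helper: for each b[i], count the elements of a strictly below it.
# Every call to sum(a < x) scans all of a, hence quadratic time overall.
naive_count <- function(a, b) {
  vapply(b, function(x) sum(a < x), integer(1))
}

naive_count(a, b)
# [1] 0 1 3 4 4
```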

What's the best way to do this if length(a) = length(b) = N where N is large?

EDIT:

To clarify, I'm wondering if there's a more computationally efficient way to do this, i.e. if I can do better than quadratic time in this case.

Vectorization is always cool though ;), thanks @Henrik!

Running time

a <- rpois(100000, 20)
b <- rpois(100000, 10)

system.time(
  result1 <- sapply(b, function(x) sum(a < x))
)
# user  system elapsed 
# 71.15    0.00   71.16

sw <- proc.time()
  bu <- sort(unique(b))
  ab <- sort(c(a, bu))
  ind <- match(bu, ab)
  nbelow <- ind - 1:length(bu)
  result2 <- sapply(b, function(x) nbelow[match(x, bu)])
proc.time() - sw

# user  system elapsed 
# 0.46    0.00    0.48 

sw <- proc.time()
  a1 <- sort(a)
  result3 <- findInterval(b - sqrt(.Machine$double.eps), a1)
proc.time() - sw

# user  system elapsed 
# 0.00    0.00    0.03 

identical(result1, result2) && identical(result2, result3)
# [1] TRUE
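The second timing block comes without an explanation, so here is a sketch of why it seems to work, traced on the small vectors from the question. The last step uses a vectorized `match()` lookup in place of `sapply()`, which is a substitution made here and not part of the timed code.

```r
a <- c(1, 2, 2, 4, 7)
b <- c(1, 2, 3, 5, 7)

bu  <- sort(unique(b))   # 1 2 3 5 7
ab  <- sort(c(a, bu))    # 1 1 2 2 2 3 4 5 7 7
ind <- match(bu, ab)     # first position of each bu[k] in ab: 1 3 6 8 9

# ind[k] - 1 elements of ab are strictly below bu[k]; k - 1 of those come
# from bu itself, so what remains is the number of elements of a below bu[k].
nbelow <- ind - seq_along(bu)   # 0 1 3 4 4

# Map every b[i] back to its unique value; match() handles the whole vector.
result <- nbelow[match(b, bu)]
result
# [1] 0 1 3 4 4
```

The cost is dominated by the two sorts, so this should run in roughly O(N log N) time instead of quadratic time.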

Solution

Assuming that a is sorted in non-decreasing order, use findInterval:

a <- sort(a)
## gives points less than or equal to b[i]
findInterval(b, a)
# [1] 1 3 3 4 5
## to do strictly less than, subtract a small bit from b
## uses sqrt(.Machine$double.eps), roughly 1.5e-8, as the offset
## (.Machine$double.eps itself is the smallest x for which 1 + x != 1)
findInterval(b - sqrt(.Machine$double.eps), a)
# [1] 0 1 3 4 4
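As a possible alternative to subtracting a small offset, `findInterval` also has a `left.open` argument. The sketch below assumes an R version recent enough to provide it. It sidesteps the offset, which could misbehave when values in a and b differ by less than about 1.5e-8, or when b is so large that subtracting the offset leaves it unchanged.

```r
a <- sort(c(1, 2, 2, 4, 7))
b <- c(1, 2, 3, 5, 7)

# With left.open = TRUE the intervals are (a[i], a[i + 1]], so the returned
# index is the number of elements of a strictly less than each b[i].
strictly_less <- findInterval(b, a, left.open = TRUE)
strictly_less
# [1] 0 1 3 4 4
```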
