All-to-all setdiff on two numeric vectors with a numeric threshold for accepting matches

Published: 2020/10/6 18:48:29. Tags: r, vector, compare, set-difference


Question

What I want to do is more or less a combination of the problems discussed in the two following threads:

Perform non-pairwise all-to-all comparisons between two unordered character vectors --- The opposite of intersect --- all-to-all setdiff
Merge data frames based on numeric rownames within a chosen threshold and keeping unmatched rows as well

I have two numeric vectors:

b_1 <- c(543.4591, 489.36325, 12.03, 896.158, 1002.5698, 301.569)
b_2 <- c(22.12, 53, 12.02, 543.4891, 5666.31, 100.1, 896.131, 489.37)

I want to compare all elements in b_1 against all elements in b_2 and vice versa.

If element_i in b_1 does not fall within element_j ± 0.045 for any element_j in b_2, then element_i must be reported.

Likewise, if element_j in b_2 does not fall within element_i ± 0.045 for any element_i in b_1, then element_j must be reported.

Given the vectors above, the expected result would therefore be:
### based on threshold = 0.045
in_b1_not_in_b2 <- c(1002.5698, 301.569)
in_b2_not_in_b1 <- c(22.12, 53, 5666.31, 100.1)

Is there an R function that would do this?
Answer
If you are happy to use a non-base package, data.table::inrange is a convenient function.
library(data.table)

# keep the elements of b_1 that lie outside every interval [b_2 - 0.045, b_2 + 0.045]
b_1[!inrange(b_1, b_2 - 0.045, b_2 + 0.045)]
# [1] 1002.570  301.569

# and vice versa for b_2
b_2[!inrange(b_2, b_1 - 0.045, b_1 + 0.045)]
# [1]   22.12   53.00 5666.31  100.10

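For small vectors, the same result can be reproduced in base R without data.table. The sketch below uses a hypothetical helper, `setdiff_tol`, which is not from the thread. It builds the full matrix of absolute differences with `outer()` and keeps an element only if no entry in its row is within the threshold. The test is inclusive, |x - y| <= threshold, matching the `<= 0.045` comparison in the benchmark below. Because it allocates an n-by-m matrix, treat it as a reference for small inputs only.

```r
# Hypothetical helper: all-to-all setdiff with a numeric tolerance.
# Returns the elements of x that are farther than `tol` from every element of y.
setdiff_tol <- function(x, y, tol) {
  # d[i, j] = |x[i] - y[j]|: an n-by-m matrix, so only use this on small vectors
  d <- abs(outer(x, y, "-"))
  # keep x[i] only when no element of y lies within tol of it
  x[rowSums(d <= tol) == 0]
}

b_1 <- c(543.4591, 489.36325, 12.03, 896.158, 1002.5698, 301.569)
b_2 <- c(22.12, 53, 12.02, 543.4891, 5666.31, 100.1, 896.131, 489.37)

in_b1_not_in_b2 <- setdiff_tol(b_1, b_2, 0.045)
in_b2_not_in_b1 <- setdiff_tol(b_2, b_1, 0.045)
in_b1_not_in_b2
in_b2_not_in_b1
```

Both results should match the expected answer given in the question.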
inrange is also efficient on larger data sets. On vectors of length 1e5, for example, inrange is more than 700 times faster than the two other alternatives:
library(data.table)
library(microbenchmark)

n <- 1e5
b1 <- runif(n, 0, 10000)
b2 <- b1 + runif(n, -1, 1)

microbenchmark(
  # f1: f() is not defined in this post; it is one of the other alternatives being compared
  f1 = f(b1, b2, 0.045, 5000),
  # f2: base R, scans the whole other vector once per element
  f2 = list(in_b1_not_in_b2 = b1[sapply(b1, function(x) !any(abs(x - b2) <= 0.045))],
       in_b2_not_in_b1 = b2[sapply(b2, function(x) !any(abs(x - b1) <= 0.045))]),
  # f3: data.table::inrange
  f3 = list(in_b1_not_in_b2 = b1[!inrange(b1, b2 - 0.045, b2 + 0.045)],
       in_b2_not_in_b1 = b2[!inrange(b2, b1 - 0.045, b1 + 0.045)]),
  unit = "relative", times = 10)
# Unit: relative
#  expr      min       lq     mean   median        uq       max neval
#    f1 1976.931 1481.324 1269.393 1103.567 1173.3017 1060.2435    10
#    f2 1347.114 1027.682  858.908  766.773  754.7606  700.0702    10
#    f3    1.000    1.000    1.000    1.000    1.0000    1.0000    10

And yes, they give the same result:
n <- 100
b1 <- runif(n, 0, 10000)
b2 <- b1 + runif(n, -1, 1)

# f() is not defined in this post; it is one of the other alternatives being compared
all.equal(f(b1, b2, 0.045, 5000),
          list(in_b1_not_in_b2 = b1[sapply(b1, function(x) !any(abs(x - b2) <= 0.045))],
               in_b2_not_in_b1 = b2[sapply(b2, function(x) !any(abs(x - b1) <= 0.045))]))
# [1] TRUE

all.equal(f(b1, b2, 0.045, 5000),
          list(in_b1_not_in_b2 = b1[!inrange(b1, b2 - 0.045, b2 + 0.045)],
               in_b2_not_in_b1 = b2[!inrange(b2, b1 - 0.045, b1 + 0.045)]))
# [1] TRUE

Search SO for inrange.
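If data.table is not available, you can still avoid the all-to-all comparison in base R. The sketch below substitutes a different technique and does not describe what inrange does internally. It sorts y once, then uses `findInterval()` to find each x's nearest neighbours on either side in the sorted y. Every interval [y - 0.045, y + 0.045] has the same width, so x falls in some interval exactly when its nearest neighbour is within the threshold. Two lookups per element therefore replace a full scan, for roughly O((n + m) log m) work instead of O(n·m). The helper names `setdiff_tol_sorted` and `brute` are hypothetical, and no timings were measured for this version. The block only checks that it agrees with the sapply-style brute force on random data.

```r
# Hypothetical helper (base R only): returns the elements of x with no element of y
# within `tol`, using one sort of y plus a binary search per element of x.
setdiff_tol_sorted <- function(x, y, tol) {
  if (length(y) == 0) return(x)              # nothing to match against
  ys <- sort(y)
  m <- length(ys)
  # idx[i] = number of ys values <= x[i]; 0 when x[i] is below all of them
  idx <- findInterval(x, ys)
  # gap to the nearest neighbour on each side; Inf where that neighbour does not exist
  left  <- ifelse(idx >= 1, x - ys[pmax(idx, 1)], Inf)
  right <- ifelse(idx < m, ys[pmin(idx + 1, m)] - x, Inf)
  # every interval has width 2 * tol, so x is covered by some [y - tol, y + tol]
  # exactly when its nearest neighbour is within tol
  x[pmin(left, right) > tol]
}

# Hypothetical brute-force reference, equivalent to the sapply() version above
brute <- function(x, y, tol) {
  x[vapply(x, function(v) !any(abs(v - y) <= tol), logical(1))]
}

set.seed(1)
n <- 2000
b1 <- runif(n, 0, 10000)
b2 <- b1 + runif(n, -1, 1)

res_sorted <- list(in_b1_not_in_b2 = setdiff_tol_sorted(b1, b2, 0.045),
                   in_b2_not_in_b1 = setdiff_tol_sorted(b2, b1, 0.045))
res_brute  <- list(in_b1_not_in_b2 = brute(b1, b2, 0.045),
                   in_b2_not_in_b1 = brute(b2, b1, 0.045))
identical(res_sorted, res_brute)
```

The last line compares the two result lists and should return TRUE. The sorted version also preserves the original order of x, as the subsetting approaches in the answer do.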
