r按行比较两个data.tables [英] r compare two data.tables by row

查看：122 发布时间：2020/10/15 21:06:41 r data.table

本文介绍了r按行比较两个data.tables的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有两个要比较的数据表。

I have two data.tables that I want to compare.

但是不知道为什么会有警告

But don't know why there is a warning

DT1 <- data.table(ID=c("F","A","E","B","C","D","C"),
                  num=c(59,3,108,11,22,54,241),
                  value=c(90,47,189,38,42,86,280),
                  Mark=c("Mary","Tom","Abner","Norman","Joanne",
                  "Bonnie","Trista"))

DT2 <- data.table(Mark=c("Mary","Abner","Bonnie","Trista","Norman"),
                  numA=c(48,20,88,237,20),
                  numB=c(60,326,54,268,89),
                  valueA=c(78,34,78,270,59),
                  valueB=c(90,190,90,385,75))

DToutput <- DT1[(num > DT2$numA & num < DT2$numB &
                value > DT2$valueA & value < DT2$valueB)]

我的目标：

我想找到 num 和 value 基于 Mark DT1 中的$ c>，范围为 numA 和 numB DT2 中的c $ c>。

I want to find num and value based on Mark in DT1, and there is a range of numA and numB in DT2.

例如：

对于 DT1 中的行 F ， num = 59 和 value = 90 ，并且 Mark = Mary 。因此，当使用 by = Mary 时，还必须匹配：

For row F in DT1, num = 59 and value = 90, and Mark = "Mary". So, when using by=Mary, you must also match:

num（59 ）> DT2 $ numA（48）& num（59）< DT2 $ numB（60）&值（90） DT2 $ valueA（78）&值（90）< DT2 $ valueB（90）

您可以看到 90< 90 不成立，因此我的结果将没有行 F

You can see that 90 < 90 does not hold, so my result will not have row F

此警告：

Warning messages:
 1: In num > DT2$numA : longer object length is not a multiple of shorter object lengt
 2: In num < DT2$numB : longer object length is not a multiple of shorter object lengt
 3: In value > DT2$valueA : longer object length is not a multiple of shorter object lengt
 4: In value < DT2$valueB : longer object length is not a multiple of shorter object lengt

如何将其修改为完成我想做的事？

How can I modify it to complete what I want to do?

谢谢

添加：DT2中可以使用多个相同的标记，但是值不在同一范围内。这会影响比较吗？

Added: Multiple identical Marks may be used in DT2, but the values are not the same range. Does this affect the comparison?

推荐答案

这通常是您要查找的内容吗？我加入了数据表，并使用之间进行了过滤。如果这不是您要查找的内容，是否可以发布预期输出的数据表？

Is this generally what you are looking for? I joined the datatables and filtered using between for your conditions. If this is not what you are looking for, can you post a datatable of your expected output?

library(data.table)

DT1[DT2, on = "Mark"][between(num, numA, numB, incbounds = F) & between(value, valueA, valueB, incbounds = F)]

   ID num value   Mark numA numB valueA valueB
1:  E 108   189  Abner   20  326     34    190
2:  C 241   280 Trista  237  268    270    385

编辑：
基准此方法与@ Chinsoon12中的非等值内部联接之间的比较表明，即使有更多数据，非等值内部联接也要快得多。这不是一个完美的基准（只是重复 data.table ），但是我仍然认为很明显，非等价内联接的效率要高得多。

EDIT: Benchmark comparison between this approach and the non-equi inner-join from @Chinsoon12 shows that the non-equi inner-join is much faster with even a little more data. It is not a perfect benchmark (just repeated the data.table), but I still think it is clear that the non-equi inner-join is much more efficient.

Unit: milliseconds
           expr      min       lq      mean    median       uq      max neval
        between 233.6378 265.4323 303.14039 301.82455 334.3225 373.2760    10
 non_equi_inner  71.6925  74.1547  96.96584  91.14375  97.6664 179.9907    10

基准代码：

DT1 <- data.table(sapply(DT1, rep, 1e3))[, c("num", "value") := lapply(.SD, as.integer), .SDcols = c("num", "value")]
DT2 <- data.table(sapply(DT2, rep, 1e3))[, c("numA", "numB", "valueA", "valueB") := lapply(.SD, as.integer), .SDcols = c("numA", "numB", "valueA", "valueB")]

microbenchmark::microbenchmark(
  between = {
    DT1[DT2, on = "Mark", allow.cartesian = T][between(num, numA, numB, incbounds = F) & between(value, valueA, valueB, incbounds = F)]

  },
  non_equi_inner = {
    DT1[DT2, on=.(Mark, num>numA, num<numB, value>valueA, value<valueB), nomatch=0L, 
        .(ID, num=x.num, value=x.value, Mark), allow.cartesian = T]
  },
  times = 10

)

这篇关于r按行比较两个data.tables的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

r按行比较两个data.tables [英] r compare two data.tables by row

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

r按行比较两个data.tables [英] r compare two data.tables by row

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭