R - Compare two matrices to find the rows that are not in both


Question


I have two large matrices in R of differing sizes, 371 x 1502 (A) and 371 x 1207 (B).


All of matrix B is included in A. A also contains many other rows mixed in. I am looking for a way to create a new matrix, C, which contains all the rows in A not found in B.


I am sure there is a way to do this using data.tables and keys but I can't for the life of me figure it out.

Example data:

a = t(matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3))  # rows: (1,2,3), (4,5,6), (7,8,9)
b = t(matrix(c(1,2,3,7,8,9), nrow = 3))        # rows: (1,2,3), (7,8,9)
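For this example the desired C is the single row (4, 5, 6). A minimal base R sketch that pins that down is given here; it is not from the thread. It assumes rows can be compared by collapsing each one into a string key, and `row_keys` is a hypothetical helper name. Because paste() prints doubles to 15 significant digits, two values that differ only beyond that precision would get the same key.

```r
a <- t(matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3))
b <- t(matrix(c(1, 2, 3, 7, 8, 9), nrow = 3))

# Hypothetical helper: collapse each row into one string key.
# Assumes values meant to be equal print identically via paste().
row_keys <- function(m) apply(m, 1, paste, collapse = "\r")

# The question's premise: every row of b also occurs in a.
stopifnot(all(row_keys(b) %in% row_keys(a)))

# C = rows of a whose key does not occur among b's keys.
# drop = FALSE keeps C a matrix even when only one row survives.
C <- a[!(row_keys(a) %in% row_keys(b)), , drop = FALSE]
print(C)  # C is the 1 x 3 matrix with the single row 4 5 6
```

Unlike a set difference, this filter keeps every copy of a row of a that is absent from b, including repeated ones.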

Thanks for your help.

Recommended answer


I would do it in base R:

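# b's rows go first, so duplicated() flags each row of a that also occurs in b; drop = FALSE keeps a matrix
a[!duplicated(rbind(b,a))[(nrow(b)+1):(nrow(a)+nrow(b))], , drop = FALSE]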
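To see why the indexing works, here is the same expression taken apart on the example data. The intermediate names `stacked`, `dup`, and `a_part` are only for illustration. Putting b first means that every row of a that also occurs in b has already been seen by the time duplicated() reaches it, so only a's slice of the flags is needed.

```r
a <- t(matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3))
b <- t(matrix(c(1, 2, 3, 7, 8, 9), nrow = 3))

stacked <- rbind(b, a)         # b's 2 rows first, then a's 3 rows
dup <- duplicated(stacked)     # row-wise: TRUE if the row appeared higher up
print(dup)                     # FALSE FALSE TRUE FALSE TRUE

# Keep only the flags belonging to a's rows (positions 3 to 5 here).
a_part <- dup[(nrow(b) + 1):(nrow(a) + nrow(b))]
print(a_part)                  # TRUE FALSE TRUE

# Negate and subset: rows of a never seen in b.
C <- a[!a_part, , drop = FALSE]
print(C)
```

One consequence worth knowing: a row that appears twice in a but not in b is also flagged on its second occurrence, so C keeps only one copy of it.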


... But a data.table solution might be more elegant and/or quicker.


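Thanks to @thelatemail, here is the data.table version. It assumes a and b have already been converted to data.tables (for example with as.data.table()), since a plain matrix has no names() to join on:

a[!b, on = names(a)]   # anti-join: rows of a with no matching row in b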
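For a self-contained version, this sketch converts the example matrices first. It assumes the data.table package is installed; the names `A`, `B`, and `C_dt` are illustrative. as.data.table() names the columns of an unnamed matrix V1, V2, V3, and those are the columns `on = names(A)` joins on.

```r
library(data.table)

a <- t(matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3))
b <- t(matrix(c(1, 2, 3, 7, 8, 9), nrow = 3))

# Convert to data.tables; unnamed matrix columns become V1, V2, V3.
A <- as.data.table(a)
B <- as.data.table(b)

# Not-join: `!` inside [ ] keeps the rows of A that have no match in B
# on all of the listed columns.
C_dt <- A[!B, on = names(A)]

# Convert back to a plain matrix if a matrix is what you need.
C <- as.matrix(C_dt)
print(C)
```

as.matrix() at the end returns a 1 x 3 matrix whose column names are V1 to V3.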



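For this problem, the plyr solution proposed in the other answer is the fastest (median about 54 ms over 100 runs). The two data.table solutions come next (medians about 92 and 97 ms), and the base R version is much slower (median about 1578 ms, roughly 30 times the plyr median).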

    Unit: milliseconds
        expr        min         lq       mean     median        uq       max neval cld
      BASE_R 1125.05968 1412.13170 1555.82674 1577.81665 1703.3674 1927.1632   100   c
  DATA.TABLE   54.68581   83.99182  117.90571   91.86808  123.8300  318.3788   100  b 
 DATA.TABLE2   58.44053   86.90981  127.11152   97.39086  138.8306  328.1396   100  b 
        PLYR   30.87235   49.32260   61.02968   53.66639   59.6925  278.6965   100 a  
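The benchmark code itself is not shown. A rough way to compare the base R and data.table approaches on your own machine is sketched here with system.time(). The data are random and hypothetical, with sizes loosely modeled on the question: treating 1502 and 1207 as row counts and 371 as the column count is an assumption. data.table must be installed. A single system.time() run is noisy; repeating each expression many times, as the table above does with 100 evaluations, gives steadier numbers.

```r
library(data.table)

set.seed(42)
# Hypothetical test data (not the question's): a 1502 x 371 integer matrix a,
# and b built from a random 1207 of a's rows, so every row of b is in a.
n_col <- 371
a <- matrix(sample(0:9, 1502 * n_col, replace = TRUE), ncol = n_col)
b <- a[sample(nrow(a), 1207), ]

# Base R: the rbind() + duplicated() approach.
t_base <- system.time(
  C_base <- a[!duplicated(rbind(b, a))[(nrow(b) + 1):(nrow(a) + nrow(b))], , drop = FALSE]
)

# data.table: not-join on every column (V1 ... V371 after conversion).
A <- as.data.table(a)
B <- as.data.table(b)
t_dt <- system.time(
  C_dt <- A[!B, on = names(A)]
)

# Elapsed seconds for one run each; timings vary by machine.
print(c(base_r = t_base[["elapsed"]], data.table = t_dt[["elapsed"]]))
```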
