R - Compare 2 matrices to find which rows aren't in both
Question
I have two large matrices in R of differing sizes, 371 x 1502 (A) and 371 x 1207 (B).
All of matrix B is included in A. A also contains many other rows mixed in. I am looking for a way to create a new matrix, C, which contains all the rows in A not found in B.
I am sure there is a way to do this using data.tables and keys, but I can't for the life of me figure it out.
Example data:
a = t(matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3))
b = t(matrix(c(1,2,3,7,8,9), nrow = 3))
Thanks for your help.
Recommended answer
I would do it in base R:
a[!duplicated(rbind(b,a))[(nrow(b)+1):(nrow(a)+nrow(b))], ]
... But a data.table solution might be more elegant and/or quicker.
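To make the base R one-liner easier to follow, here is a step-by-step sketch using the example data from the question. The intermediate names dup and dup_in_a, and the drop = FALSE argument, are additions for illustration and are not part of the original answer.

```r
# Example data from the question
a <- t(matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3))
b <- t(matrix(c(1, 2, 3, 7, 8, 9), nrow = 3))

# Stack b on top of a, so that any row of a that also occurs in b
# is flagged as a duplicate of an earlier row.
dup <- duplicated(rbind(b, a))

# Keep only the flags that belong to the rows of a
# (positions nrow(b) + 1 through nrow(b) + nrow(a)).
dup_in_a <- dup[nrow(b) + seq_len(nrow(a))]

# drop = FALSE keeps the result a matrix even when a single row remains.
C <- a[!dup_in_a, , drop = FALSE]
C
```

One caveat: duplicated() also flags rows that repeat within a itself, so if a contains the same row several times and that row is not in b, only its first occurrence is kept.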
Thanks to @thelatemail, here is the data.table version:
a[!b, on=names(a)]
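The line above assumes that a and b are already data.tables. A minimal sketch of the full round trip from the example matrices might look like the following; the names a_dt, b_dt and c_dt are illustrative, and the data.table package is assumed to be installed.

```r
library(data.table)

# Example data from the question
a <- t(matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3))
b <- t(matrix(c(1, 2, 3, 7, 8, 9), nrow = 3))

# as.data.table() on an unnamed matrix names the columns V1, V2, ...
a_dt <- as.data.table(a)
b_dt <- as.data.table(b)

# Anti-join: keep the rows of a_dt that have no match in b_dt
# when all columns are compared.
c_dt <- a_dt[!b_dt, on = names(a_dt)]

# Convert back to a plain matrix if one is needed.
C <- as.matrix(c_dt)
C
```

Unlike the duplicated() approach, the anti-join keeps repeated rows of a as long as they have no match in b.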
For this problem, the plyr solution proposed in the other answer is the fastest. The two data.table solutions trail it closely, and the base R version is much slower.
Unit: milliseconds
        expr        min         lq       mean     median        uq       max neval cld
      BASE_R 1125.05968 1412.13170 1555.82674 1577.81665 1703.3674 1927.1632   100   c
  DATA.TABLE   54.68581   83.99182  117.90571   91.86808  123.8300  318.3788   100  b
 DATA.TABLE2   58.44053   86.90981  127.11152   97.39086  138.8306  328.1396   100  b
        PLYR   30.87235   49.32260   61.02968   53.66639   59.6925  278.6965   100 a