Extract unique rows from a data.table with each row unsorted
Question
Suppose I have a data.table like this:
V1 V2
A B
C D
C A
B A
D C
I want each row to be treated as a set, which means that B A and A B are the same. So after processing, I want to get:
V1 V2
A B
C D
C A
To do that, I have to first sort the table row by row and then use unique to remove the duplicates. The sorting step is quite slow if I have millions of rows. Is there an easy way to remove the duplicates without sorting?
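For reference, the sort-then-unique baseline described in the question might look roughly like the sketch below. The names `dt` and `sorted` are illustrative, and the per-row `apply()` call is only one assumed way of sorting row by row; the asker's actual code is not shown.

```r
library(data.table)

# The example table from the question
dt <- data.table(V1 = c("A", "C", "C", "B", "D"),
                 V2 = c("B", "D", "A", "A", "C"))

# Sort the values within each row. apply() returns the sorted rows as
# columns of a matrix, so transpose before converting back.
sorted <- as.data.table(t(apply(dt, 1, sort)))

# Rows that hold the same set of values are now identical, so unique() drops them
unique(sorted)
```

Calling `sort` once per row is what makes this slow on millions of rows. It also rewrites the rows, so C A comes back as A C.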
Recommended answer
For just two columns you can use the following trick:
library(data.table)
dt = data.table(a = letters[1:5], b = letters[5:1])
# a b
#1: a e
#2: b d
#3: c c
#4: d b
#5: e a
dt[dt[, .I[1], by = list(pmin(a, b), pmax(a, b))]$V1]
# a b
#1: a e
#2: b d
#3: c c
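As a sketch of how the same trick applies to the table from the question (assuming its columns are named V1 and V2 as displayed; `first_rows` and `result` are illustrative names):

```r
library(data.table)

# The example table from the question
dt <- data.table(V1 = c("A", "C", "C", "B", "D"),
                 V2 = c("B", "D", "A", "A", "C"))

# pmin/pmax give an order-independent key for each row:
# both "A B" and "B A" map to the key ("A", "B").
# .I[1] is the original row number of the first row in each key group.
first_rows <- dt[, .I[1], by = list(pmin(V1, V2), pmax(V1, V2))]$V1

# Subset the original table by those row numbers, so each kept row
# appears exactly as it was entered.
result <- dt[first_rows]
print(result)
#    V1 V2
# 1:  A  B
# 2:  C  D
# 3:  C  A
```

Because `pmin` and `pmax` are vectorised over whole columns, no per-row sort is needed. The first row of each group is kept unchanged, which is why C A is returned rather than A C.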