Extract unique rows from a data.table with each row unsorted


Question


Suppose I have a data.table like this:


V1 V2
 A  B
 C  D
 C  A
 B  A
 D  C


I want each row to be treated as a set, so that B A and A B count as the same row. After processing, I want to get:

V1 V2
 A  B
 C  D
 C  A


To do that, I currently sort the table row by row and then use unique to remove the duplicates. The sorting step is quite slow when there are millions of rows. Is there an easy way to remove the duplicates without sorting?
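For reference, the sort-then-deduplicate approach described above might look like the minimal sketch below. It assumes character columns named V1 and V2 as in the example table; using apply() for the row-wise sort and duplicated() to keep the original rows are our assumptions, not the asker's exact code.

```r
library(data.table)

# The table from the question (assumed character columns V1 and V2)
dt <- data.table(V1 = c("A", "C", "C", "B", "D"),
                 V2 = c("B", "D", "A", "A", "C"))

# Sort the values within each row. apply() converts dt to a character
# matrix and calls sort() once per row at the R level, which is what
# makes this approach slow on millions of rows.
sorted_rows <- t(apply(dt, 1, sort))

# duplicated() on a matrix compares whole rows. Using it as a mask on dt
# keeps the first occurrence of each set but returns the original rows.
res <- dt[!duplicated(sorted_rows)]
print(res)
```

Indexing dt with the duplicated() mask, rather than calling unique() on the sorted matrix, keeps each surviving row in its original orientation (C A rather than A C), which matches the desired output.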

Answer


For exactly two columns you can use the following trick, which builds an order-independent key for each row with pmin and pmax instead of sorting:

library(data.table)

dt = data.table(a = letters[1:5], b = letters[5:1])
#   a b
#1: a e
#2: b d
#3: c c
#4: d b
#5: e a

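# Group by the unordered pair (pmin(a, b), pmax(a, b)), take the row
# index .I of the first row in each group (returned as column V1),
# then subset dt by those indices
dt[dt[, .I[1], by = list(pmin(a, b), pmax(a, b))]$V1]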
#   a b
#1: a e
#2: b d
#3: c c
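Applied to the table from the question, the same trick might be written as in the sketch below. The column names V1 and V2 come from the question; naming the index column `row` (to avoid clashing with the data column V1) and the alternative formulation with duplicated() on a table of (pmin, pmax) keys are our own variations, not part of the answer above.

```r
library(data.table)

# The table from the question (assumed character columns V1 and V2)
dt <- data.table(V1 = c("A", "C", "C", "B", "D"),
                 V2 = c("B", "D", "A", "A", "C"))

# The answer's approach: group by the unordered pair and keep the index
# of the first row in each group. The index column is named "row" so it
# does not collide with the existing column V1.
first_rows <- dt[, .(row = .I[1]), by = .(pmin(V1, V2), pmax(V1, V2))]$row
res_grouped <- dt[first_rows]

# Alternative (our variation): pmin()/pmax() are vectorised, so the key
# for every row is computed in one pass; duplicated() then masks repeats.
key_pairs <- data.table(lo = pmin(dt$V1, dt$V2), hi = pmax(dt$V1, dt$V2))
res <- dt[!duplicated(key_pairs)]

print(res_grouped)
print(res)
```

Both forms keep the first row of each unordered pair in its original orientation. pmin() and pmax() compare strings using the current locale's collation order, which is fine here because the key only needs a consistent ordering for each pair.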

