Find duplicated rows with original

Views: 92  Published: 2017/3/12 11:15:40  Tags: r data.table


Question

I can get duplicated rows in R on a data.table dt using
dt[duplicated(dt, by=someColumns)] 

However, I would like to get pairs of duplicated rows and the "non-duplicates". For example, consider dt:
col1, col2, col3 
   A     B    C1
   A     B    C2
   A    B1    C1

Now, dt[duplicated(dt, by = c('col1', 'col2'))] would give me something along the lines of
col1, col2, col3
   A     B    C2

I would like to get this together with the row that was not selected as the duplicate, that is
col1, col2, col3 
   A     B    C1
   A     B    C2

Speed comparison of the answers:
> system.time(dt[duplicated(dt, by = t) | duplicated(dt, by = t, fromLast = TRUE)])
   user  system elapsed 
  0.008   0.000   0.009 
> system.time(dt[, .SD[.N > 1], by = t])
   user  system elapsed 
 77.555   0.100  77.703 
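
The two timed expressions are meant to select the same rows. A minimal sketch to check that on the small example from the question, assuming `key_cols` (a hypothetical name standing in for `t` above) holds the column names to compare on:

```r
library(data.table)

# Example data from the question; key_cols is a hypothetical stand-in for `t`.
dt <- data.table(col1 = c("A", "A", "A"),
                 col2 = c("B", "B", "B1"),
                 col3 = c("C1", "C2", "C1"))
key_cols <- c("col1", "col2")

# Approach 1: flag a row if it repeats an earlier row (default direction)
# or is repeated by a later row (fromLast = TRUE).
res_dup <- dt[duplicated(dt, by = key_cols) |
              duplicated(dt, by = key_cols, fromLast = TRUE)]

# Approach 2: keep every group that has more than one row.
res_grp <- dt[, .SD[.N > 1], by = key_cols]

print(res_dup)
print(res_grp)
```

The grouped form evaluates `.SD[.N > 1]` once per group, which is plausibly why it is so much slower in the timings above when there are many groups. It also returns rows in group order, which can differ from the original row order on larger tables.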

 
 
Recommended answer

I believe this is essentially a duplicate of this question, though I can see how you may not have found it. Here's an answer building off the logic outlined in the referenced question:
dt <- read.table(text = "col1 col2 col3 
   A     B    C1
   A     B    C2
   A    B1    C1", header = TRUE, stringsAsFactors = FALSE)


idx <- duplicated(dt[, 1:2]) | duplicated(dt[, 1:2], fromLast = TRUE)

dt[idx, ]
#---
  col1 col2 col3
1    A    B   C1
2    A    B   C2

Since you are using data.table, this is probably what you want:
library(data.table)
dt <- data.table(dt)
dt[duplicated(dt, by = c("col1", "col2")) | duplicated(dt, by = c("col1", "col2"), fromLast = TRUE)]
#---
   col1 col2 col3
1:    A    B   C1
2:    A    B   C2
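
If this selection is needed in several places, the same logic can be wrapped in a small function. This is only a sketch: `rows_with_duplicates` is a hypothetical helper name, not part of data.table, and it assumes `cols` is a character vector of existing column names.

```r
library(data.table)

# Hypothetical helper (not part of data.table): return every row whose
# values in `cols` occur more than once, including the first occurrence.
rows_with_duplicates <- function(x, cols) {
  stopifnot(is.data.table(x), all(cols %in% names(x)))
  # TRUE for rows duplicated when scanning forward or backward.
  idx <- duplicated(x, by = cols) | duplicated(x, by = cols, fromLast = TRUE)
  x[idx]
}

dt <- data.table(col1 = c("A", "A", "A"),
                 col2 = c("B", "B", "B1"),
                 col3 = c("C1", "C2", "C1"))

res <- rows_with_duplicates(dt, c("col1", "col2"))
print(res)
```

Rows come back in their original order, so each duplicate stays next to the "original" it matches only if the table was already arranged that way; sorting on `cols` afterwards groups them together.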

