Identifying duplicated rows

Views: 93 | Published: 2017/4/2 12:24:46 | Tags: r, dataset, matching

Problem description

I have a large data frame (~50K rows and 50 to 75 columns) in which a small number of rows are duplicated on, say, 7 of the 75 columns. It's simple enough to locate rows that duplicate an earlier row using duplicated(...), but I want to pull out both the duplicate rows and the rows they duplicate. For example (data borrowed from an earlier post):

a <- c(rep("A", 3), rep("B", 3), rep("C",2))
b <- c(1,1,2,4,1,1,2,2)
d <- c('x','y','x','z','y','y','z','x')
df <- data.frame(a,b,d)
df
  a b d
1 A 1 x
2 A 1 y
3 A 2 x
4 B 4 z
5 B 1 y
6 B 1 y
7 C 2 z
8 C 2 x

duplicated(df[,c(1,2)]) flags rows 2, 6, and 8. On the basis of columns 1 and 2, row 2 duplicates row 1, row 6 duplicates row 5, and row 8 duplicates row 7. So I want to review rows 1 and 2 together to see what differences, if any, there are in column d. That's easy enough with 8 rows and 3 columns, but my problem is much bigger.

To sum up, I'm looking for a simple way to find the row indices of, say, rows 1 and 2, 5 and 6, and 7 and 8, based on a subset of the 50-75 columns, so I can visually compare the rows that are duplicated on that subset.

Thoughts?

Solution

# Flag a row if its columns 1:2 match an earlier row OR a later row
which(duplicated(df[, 1:2]) | duplicated(df[, 1:2], fromLast = TRUE))
#[1] 1 2 5 6 7 8
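
Since the goal is to compare the duplicated rows visually, the same logical mask can be used to subset the data frame directly rather than only returning indices. The following is a minimal sketch of that idea, not part of the original answer. The names key_cols, in_dup_group and dups are hypothetical, and sorting by the key columns is an added assumption about how one might want to view the groups.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  a = c(rep("A", 3), rep("B", 3), rep("C", 2)),
  b = c(1, 1, 2, 4, 1, 1, 2, 2),
  d = c("x", "y", "x", "z", "y", "y", "z", "x")
)

# Hypothetical name: the subset of columns that defines a "duplicate"
key_cols <- c("a", "b")

# TRUE for every row whose key matches an earlier or a later row
in_dup_group <- duplicated(df[, key_cols]) |
  duplicated(df[, key_cols], fromLast = TRUE)

# Keep only those rows; the original row names are preserved
dups <- df[in_dup_group, ]

# Sort by the key columns so members of each group sit next to each other
dups <- dups[do.call(order, unname(as.list(dups[key_cols]))), ]
dups
```

For the example data this keeps rows 1, 2, 5, 6, 7 and 8, with column d alongside for comparison. On the real data, key_cols would be replaced by the 7 or so columns of interest.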
