Extracting duplicate lines from a data frame


Question

I have a large data frame that I'm working with; the first few lines are as follows:

      Assay   Genotype   Sample    Result
1     001        G         1         0
2     001        A         2         1
3     001        G         3         0 
4     001        NA        1         NA
5     002        T         1         0
6     002        G         2         1
7     002        T         2         0 
8     002        T         4         0
9     003        NA        1         NA

In total I'll be working with 2000 samples and 168 assays for each sample.

I'd like to extract the lines where I have multiple entries with both the same Assay and Sample. I want the resulting data to be in a data frame containing all of the duplicate entries, sorted such that the duplicates are next to each other. From the example above, the result would look like this:

      Assay   Genotype   Sample    Result
1     001        G         1         0
4     001        NA        1         NA
6     002        G         2         1
7     002        T         2         0 


Recommended answer

Demo data for easy loading:

df <- structure(list(Assay = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L), Genotype = structure(c(2L, 1L, 2L, NA, 3L, 2L, 3L, 3L, NA), .Label = c("A", "G", "T"), class = "factor"), Sample = c(1L, 2L, 3L, 1L, 1L, 2L, 2L, 4L, 1L), Result = c(0L, 1L, 0L, NA, 0L, 1L, 0L, 0L, NA)), .Names = c("Assay", "Genotype", "Sample", "Result"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))

You can easily get the duplicated Assay/Sample pairs with duplicated():

vars <- c('Assay', 'Sample')
dup <- df[duplicated(df[, vars]), vars]

This results in:

> dup
  Assay Sample
4     1      1
7     2      2
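
Note that duplicated() flags only the second and later occurrences of a key, which is why dup holds one row per repeated pair rather than every repeated row. A minimal sketch on a made-up vector (keys, later and all_copies are hypothetical names used only for illustration):

```r
# Hypothetical keys standing in for Assay/Sample pairs
keys <- c("1-1", "1-2", "1-1", "2-2", "2-2", "2-2")

# TRUE only from the second occurrence of each key onward
later <- duplicated(keys)

# Scanning from the end as well also marks the first occurrences
all_copies <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
```

Here later is TRUE at positions 3, 5 and 6, while all_copies is TRUE everywhere except position 2, the only key that occurs once.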

A simple merge with the original data then gives the required result:

> merge(dup, df)
  Assay Sample Genotype Result
1     1      1     <NA>     NA
2     1      1        G      0
3     2      2        G      1
4     2      2        T      0
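
The merge moves the key columns to the front and discards the original row names. Also, if a pair occurred three or more times, that pair would appear in dup more than once and merge would repeat its rows unless dup were passed through unique() first. As an alternative sketch, not part of the original answer, the rows can be selected directly by combining duplicated() with its fromLast argument. The data frame below is a plain re-creation of the demo data (Genotype as character rather than factor), and is_dup and res are names introduced here for illustration:

```r
# Stand-in for the demo data, with the same rows as the question's example
df <- data.frame(
  Assay    = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L),
  Genotype = c("G", "A", "G", NA, "T", "G", "T", "T", NA),
  Sample   = c(1L, 2L, 3L, 1L, 1L, 2L, 2L, 4L, 1L),
  Result   = c(0L, 1L, 0L, NA, 0L, 1L, 0L, 0L, NA)
)
vars <- c("Assay", "Sample")

# Flag every row whose Assay/Sample pair occurs more than once,
# including the first occurrence of each pair
is_dup <- duplicated(df[, vars]) | duplicated(df[, vars], fromLast = TRUE)

# Keep the flagged rows, then sort so copies of a pair sit next to each other
res <- df[is_dup, ]
res <- res[order(res$Assay, res$Sample), ]
```

On this data res keeps rows 1, 4, 6 and 7 with the columns in their original order, which matches the layout requested in the question.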
