Combining duplicated rows in R and adding new column containing IDs of duplicates


Problem description

I have a data frame that looks like this:

Chr start   stop    ref alt Hom/het ID  
chr1    5179574 5183384 ref Del Het 719  
chr1    5179574 5184738 ref Del Het 915  
chr1    5179574 5184738 ref Del Het 951  
chr1    5336806 5358384 ref Del Het 376  
chr1    5347979 5358384 ref Del Het 228  

I would like to merge any duplicate rows, combining the last ID column so that all IDs are in one row/column, like this:

Chr start   stop    ref alt Hom/het ID  
chr1    5179574 5183384 ref Del Het 719  
chr1    5179574 5184738 ref Del Het 915, 951 
chr1    5336806 5358384 ref Del Het 376  
chr1    5347979 5358384 ref Del Het 228  

I have found examples of people removing duplicates and summing a column, but I just want to combine all IDs with duplicate regions in a list in a single column.

Solution

Some call to aggregate() should do the trick.

Here's an option that collects the IDs in a list column:

(df1 <- aggregate(df[7], df[-7], unique))
#   Chr   start    stop ref alt Hom.het       ID
# 1 chr1 5179574 5183384 ref Del     Het      719
# 2 chr1 5179574 5184738 ref Del     Het 915, 951
# 3 chr1 5336806 5358384 ref Del     Het      376
# 4 chr1 5347979 5358384 ref Del     Het      228

And here's one that collects them in a character vector:

df2 <- aggregate(df[7], df[-7], 
                 FUN = function(X) paste(unique(X), collapse=", "))
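
For anyone who wants to try this without the original file, here is a minimal reproducible sketch. The data frame is rebuilt by hand from the rows in the question, so the column types (integer positions and IDs, character for the rest) are an assumption, and Hom/het is written as Hom.het to match the name shown in the output above.

```r
# Rebuild the example data from the question.
# Assumption: start, stop and ID are integers; single values are recycled to 5 rows.
df <- data.frame(
  Chr     = "chr1",
  start   = c(5179574L, 5179574L, 5179574L, 5336806L, 5347979L),
  stop    = c(5183384L, 5184738L, 5184738L, 5358384L, 5358384L),
  ref     = "ref",
  alt     = "Del",
  Hom.het = "Het",
  ID      = c(719L, 915L, 951L, 376L, 228L)
)

# Group by every column except ID (column 7) and paste each group's
# unique IDs into a single comma-separated string.
df2 <- aggregate(df[7], df[-7],
                 FUN = function(x) paste(unique(x), collapse = ", "))
print(df2)
```

Indexing by position (df[7], df[-7]) only works while ID is the seventh column; if the layout differs, selecting by name is probably safer.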

Comparing the results of the two options:

str(df1$ID)
# List of 4
#  $ 0: int 719
#  $ 3: int [1:2] 915 951
#  $ 7: int 376
#  $ 8: int 228

str(df2$ID)
# chr [1:4] "719" "915, 951" "376" "228"
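
If you start with the list version and later need plain strings (for example, before writing the table to a text file), the list column can be flattened afterwards. This is only a sketch: the data frame is rebuilt by hand from the question, and n_ids and ID_flat are illustrative names, not part of the original answer.

```r
# Rebuild the example data from the question (types are an assumption).
df <- data.frame(
  Chr     = "chr1",
  start   = c(5179574L, 5179574L, 5179574L, 5336806L, 5347979L),
  stop    = c(5183384L, 5184738L, 5184738L, 5358384L, 5358384L),
  ref     = "ref",
  alt     = "Del",
  Hom.het = "Het",
  ID      = c(719L, 915L, 951L, 376L, 228L)
)

# List version: each element of df1$ID holds one group's unique IDs.
df1 <- aggregate(df[7], df[-7], unique)

# Number of IDs merged into each row (names dropped, since they vary by R version).
n_ids <- unname(lengths(df1$ID))
print(n_ids)

# Collapse each list element into one comma-separated string.
df1$ID_flat <- unname(vapply(df1$ID, toString, character(1)))
print(df1$ID_flat)
```

The list form keeps the IDs as integers, which is handy for further computation; the flattened column is easier to print or export.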
