Combining duplicated rows in R and adding new column containing IDs of duplicates
Problem description
I have a data frame that looks like this:
Chr start stop ref alt Hom/het ID
chr1 5179574 5183384 ref Del Het 719
chr1 5179574 5184738 ref Del Het 915
chr1 5179574 5184738 ref Del Het 951
chr1 5336806 5358384 ref Del Het 376
chr1 5347979 5358384 ref Del Het 228
I would like to merge any duplicate rows, combining the last ID column so that all IDs are in one row/column, like this:
Chr start stop ref alt Hom/het ID
chr1 5179574 5183384 ref Del Het 719
chr1 5179574 5184738 ref Del Het 915, 951
chr1 5336806 5358384 ref Del Het 376
chr1 5347979 5358384 ref Del Het 228
I have found examples of people removing duplicates and summing a column, but I just want to combine all IDs with duplicate regions in a list in a single column.
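For anyone who wants to reproduce this, here is a minimal sketch of one way to load the example rows into a data frame called df. It assumes the data is whitespace-separated text; the use of read.table() with its default check.names behaviour is my assumption about how the data was read, and it is what turns the header "Hom/het" into the column name Hom.het.

```r
# Rebuild the example data from the question as whitespace-separated text.
# With the default check.names = TRUE, the header "Hom/het" becomes "Hom.het".
df <- read.table(header = TRUE, text = "
Chr start stop ref alt Hom/het ID
chr1 5179574 5183384 ref Del Het 719
chr1 5179574 5184738 ref Del Het 915
chr1 5179574 5184738 ref Del Het 951
chr1 5336806 5358384 ref Del Het 376
chr1 5347979 5358384 ref Del Het 228
")

# Inspect column names and types; ID is the seventh column.
str(df)
```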
Solution
Some call to aggregate() should do the trick.
Here's an option that collects the IDs in a list object:
(df1 <- aggregate(df[7], df[-7], unique))
#    Chr   start    stop ref alt Hom.het       ID
# 1 chr1 5179574 5183384 ref Del     Het      719
# 2 chr1 5179574 5184738 ref Del     Het 915, 951
# 3 chr1 5336806 5358384 ref Del     Het      376
# 4 chr1 5347979 5358384 ref Del     Het      228
And here's one that collects them in a character vector:
df2 <- aggregate(df[7], df[-7], FUN = function(X) paste(unique(X), collapse=", "))
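The same grouping can probably also be written with the formula interface of aggregate(), which refers to the ID column by name instead of by position. This is only a sketch of that alternative spelling, not part of the original answer: the name df3 is made up for illustration, and the data frame is rebuilt here so the snippet runs on its own. Note that the formula method drops rows containing NA by default, which does not matter for this data.

```r
# Example data, rebuilt so the snippet is self-contained.
df <- data.frame(
  Chr     = "chr1",
  start   = c(5179574L, 5179574L, 5179574L, 5336806L, 5347979L),
  stop    = c(5183384L, 5184738L, 5184738L, 5358384L, 5358384L),
  ref     = "ref",
  alt     = "Del",
  Hom.het = "Het",
  ID      = c(719L, 915L, 951L, 376L, 228L)
)

# "ID ~ ." groups by every column other than ID;
# the IDs of each group are pasted into one comma-separated string.
df3 <- aggregate(ID ~ ., data = df,
                 FUN = function(x) paste(unique(x), collapse = ", "))
print(df3)
```

Naming the column can be a little more robust than df[7] and df[-7] if the column order ever changes.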
Comparing the results of the two options:
str(df1$ID)
# List of 4
#  $ 0: int 719
#  $ 3: int [1:2] 915 951
#  $ 7: int 376
#  $ 8: int 228
str(df2$ID)
#  chr [1:4] "719" "915, 951" "376" "228"
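Which form is more convenient depends on what happens next. As a rough sketch (the names n_ids and ids_chr are made up here, and the data frame is rebuilt so the snippet runs on its own): the list column keeps the IDs as integers, so they can be counted or indexed per row, while the character column is easier to print or export. The list form can be flattened into the character form afterwards.

```r
# Example data, rebuilt so the snippet is self-contained.
df <- data.frame(
  Chr     = "chr1",
  start   = c(5179574L, 5179574L, 5179574L, 5336806L, 5347979L),
  stop    = c(5183384L, 5184738L, 5184738L, 5358384L, 5358384L),
  ref     = "ref",
  alt     = "Del",
  Hom.het = "Het",
  ID      = c(719L, 915L, 951L, 376L, 228L)
)

# The two options from the answer.
df1 <- aggregate(df[7], df[-7], unique)
df2 <- aggregate(df[7], df[-7],
                 FUN = function(X) paste(unique(X), collapse = ", "))

# List column: count how many IDs were merged into each row.
n_ids <- lengths(df1$ID)

# Flatten the list column into comma-separated strings, as in df2.
ids_chr <- sapply(df1$ID, paste, collapse = ", ")

print(n_ids)
print(ids_chr)
```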