Consolidating duplicate rows in a dataframe
Question
This is a continuation of a past question I asked. Basically, I have a dataframe, df:
         Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway3          A        G       NA       NA          F
Pathway6          A        G       NA       NA          E
Pathway2          A        B        H       NA          F
Pathway5          A        B        H       NA          E
Pathway1          A        D        K       NA          F
Pathway7          A        B        C        D          F
Pathway4          A        B        C        D          E
And now I want to consolidate the rows to look like this:
dfnew
         Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway3          A        G       NA       NA       F, E
Pathway2          A        B        H       NA       F, E
Pathway7          A        D        K       NA          F
Pathway1          A        B        C        D       F, E
I've seen a lot of people consolidate identical rows in dataframes using aggregate, but I can't seem to get that function to work on non-numerical values. The closest question I have seen solved it like this: df1 <- aggregate(df[7], df[-7], unique)
That question can be found here: Combining duplicated rows in R and adding new column containing IDs of duplicates (https://stackoverflow.com/questions/14262741/).
Also, not every pathway has a matching pair, as can be seen in pathway 1.
Thank you very much for your help!
Recommended answer
The following solution using the dplyr and tidyr packages should do what you want:
library(dplyr)
library(tidyr)

df %>%
  group_by(Protein2, Protein3, Protein4) %>%
  nest() %>%
  # Pull each group's Biomarker1 values and join them into one string.
  mutate(Biomarker1 = lapply(data, `[[`, 'Biomarker1'),
         Biomarker1 = unlist(lapply(Biomarker1, paste, collapse = ', '))) %>%
  ungroup() %>%
  # Restoring the "Beginning1" column is a bit of work, unfortunately:
  # take the first value from each nested group.
  mutate(Beginning1 = lapply(data, `[[`, 'Beginning1'),
         Beginning1 = unlist(lapply(Beginning1, `[[`, 1))) %>%
  select(-data)
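
If nesting feels heavier than needed, the same consolidation can probably be written with a plain summarise(). The sketch below is not part of the original answer. It rebuilds the example data by hand and assumes the pathway labels live in an ordinary column (here given the hypothetical name Pathway) rather than in row names, since dplyr drops row names when grouping. It also assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Example data rebuilt from the question; "Pathway" is an assumed column name
# standing in for the row names shown in the printed dataframe.
df <- data.frame(
  Pathway    = c("Pathway3", "Pathway6", "Pathway2", "Pathway5",
                 "Pathway1", "Pathway7", "Pathway4"),
  Beginning1 = "A",
  Protein2   = c("G", "G", "B", "B", "D", "B", "B"),
  Protein3   = c(NA, NA, "H", "H", "K", "C", "C"),
  Protein4   = c(NA, NA, NA, NA, NA, "D", "D"),
  Biomarker1 = c("F", "E", "F", "E", "F", "F", "E"),
  stringsAsFactors = FALSE
)

dfnew <- df %>%
  # group_by() keeps NA as its own key value, so rows with NA proteins
  # are grouped rather than dropped.
  group_by(Beginning1, Protein2, Protein3, Protein4) %>%
  summarise(
    Pathway    = first(Pathway),                      # keep the first label per group
    Biomarker1 = paste(Biomarker1, collapse = ", "),  # join the biomarkers
    .groups    = "drop"
  )

print(dfnew)
```

Grouping on Beginning1 as well means it never has to be restored afterwards. Base R's aggregate() omits rows whose grouping variables contain NA, so it would likely need the NA values replaced with a placeholder before it could give the same result.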