Consolidating duplicate rows in a dataframe

Views: 106  Posted: 2020/6/2 20:32:51  Tags: r, dataframe, aggregate


Problem description


This is a continuation of a past question I asked. Basically, I have a dataframe, df:

         Beginning1 Protein2    Protein3    Protein4    Biomarker1
Pathway3    A         G           NA           NA           F
Pathway6    A         G           NA           NA           E
Pathway2    A         B           H            NA           F
Pathway5    A         B           H            NA           E
Pathway1    A         D           K            NA           F
Pathway7    A         B           C            D            F
Pathway4    A         B           C            D            E


And now I want to consolidate the rows to look like this:

dfnew 
         Beginning1 Protein2    Protein3    Protein4    Biomarker1
Pathway3    A         G           NA           NA           F, E
Pathway2    A         B           H            NA           F, E
Pathway1    A         D           K            NA           F
Pathway7    A         B           C            D            F, E

I've seen a lot of people consolidate identical rows in dataframes using aggregate, but I can't seem to get that function to work on non-numerical values. The closest question I have seen solved it like this: df1 <- aggregate(df[7], df[-7], unique) and can be found here: Combining duplicated rows in R and adding new column containing IDs of duplicates (https://stackoverflow.com/questions/14262741/combining-duplicated-rows-in-r-and-adding-new-column-contains-ids-of-duplicate).


Also, not every pathway has a matching pair, as can be seen in pathway 1.

Thanks a lot for your help!
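For reference, aggregate() itself does accept character columns as long as FUN returns a single value per group, e.g. paste(x, collapse = ", "). A likely stumbling block with this particular data is the NAs: aggregate() omits rows that have a missing value in any grouping variable, so the Pathway3/6, Pathway2/5 and Pathway1 rows silently disappear and only the Pathway7/Pathway4 group remains. The following is a minimal base-R sketch, assuming the data frame is rebuilt from the table above with character columns. The "<NA>" placeholder and the key_cols name are illustrative choices, not part of the original question, and the Pathway row names are not carried over.

```r
# Rebuild the example data from the question (character columns, NA = missing)
df <- data.frame(
  Beginning1 = rep("A", 7),
  Protein2   = c("G", "G", "B", "B", "D", "B", "B"),
  Protein3   = c(NA, NA, "H", "H", "K", "C", "C"),
  Protein4   = c(NA, NA, NA, NA, NA, "D", "D"),
  Biomarker1 = c("F", "E", "F", "E", "F", "F", "E"),
  row.names  = paste0("Pathway", c(3, 6, 2, 5, 1, 7, 4)),
  stringsAsFactors = FALSE
)
key_cols <- c("Beginning1", "Protein2", "Protein3", "Protein4")

# Naive call: paste() handles character data fine, but rows with NA in any
# grouping column are dropped, so only the Pathway7/Pathway4 group survives.
naive <- aggregate(df["Biomarker1"], by = df[key_cols],
                   FUN = paste, collapse = ", ")

# Workaround (hypothetical placeholder "<NA>"): temporarily replace NA with a
# string so incomplete pathways form groups of their own, then restore the NAs.
filled <- df
filled[key_cols] <- lapply(filled[key_cols],
                           function(x) replace(x, is.na(x), "<NA>"))
dfnew <- aggregate(filled["Biomarker1"], by = filled[key_cols],
                   FUN = paste, collapse = ", ")
dfnew[key_cols] <- lapply(dfnew[key_cols],
                          function(x) replace(x, x == "<NA>", NA))
dfnew
```

Unmatched pathways such as Pathway1 simply end up as a group of one, so their Biomarker1 stays a single value ("F").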

Recommended answer


The following solution using the dplyr and tidyr packages should do what you want:

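library(dplyr)
library(tidyr)

df %>%
    # One group per distinct protein path; nest() moves the remaining
    # columns (Beginning1, Biomarker1) into a list-column called "data"
    group_by(Protein2, Protein3, Protein4) %>%
    nest() %>%
    # Pull Biomarker1 out of each nested data frame and collapse it to "F, E"
    mutate(Biomarker1 = lapply(data, `[[`, 'Biomarker1'),
           Biomarker1 = unlist(lapply(Biomarker1, paste, collapse = ', '))) %>%
    ungroup() %>%
    # Restoring the "Beginning1" column is a bit of work, unfortunately:
    # take the first value from each nested data frame.
    mutate(Beginning1 = lapply(data, `[[`, 'Beginning1'),
           Beginning1 = unlist(lapply(Beginning1, `[[`, 1))) %>%
    select(-data)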
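A possibly simpler variant of the same idea, assuming dplyr 1.0.0 or later (for summarise(.groups = "drop")): if Beginning1 is included among the grouping columns, nothing has to be restored afterwards, and summarise() can collapse Biomarker1 directly. Unlike aggregate(), group_by() keeps NA as a group value, so incomplete pathways are not dropped. The extra Pathway column, which carries over the first row name of each group as in the desired output, is an illustrative addition and not part of the answer above.

```r
library(dplyr)

# Example data from the question (character columns, NA = missing)
df <- data.frame(
  Beginning1 = rep("A", 7),
  Protein2   = c("G", "G", "B", "B", "D", "B", "B"),
  Protein3   = c(NA, NA, "H", "H", "K", "C", "C"),
  Protein4   = c(NA, NA, NA, NA, NA, "D", "D"),
  Biomarker1 = c("F", "E", "F", "E", "F", "F", "E"),
  row.names  = paste0("Pathway", c(3, 6, 2, 5, 1, 7, 4)),
  stringsAsFactors = FALSE
)

dfnew <- df %>%
  # dplyr does not keep row names, so store them as an ordinary column first
  mutate(Pathway = rownames(df)) %>%
  # NA values in the grouping columns form their own groups here
  group_by(Beginning1, Protein2, Protein3, Protein4) %>%
  summarise(Pathway    = first(Pathway),
            Biomarker1 = paste(Biomarker1, collapse = ", "),
            .groups    = "drop")
dfnew
```

Within each group the rows keep their original order, so the collapsed values read "F, E" exactly as in the question's data.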
