Remove duplicates by multiple conditions

Views: 133 Published: 2020/10/26 5:31:59 Tags: r dplyr tidyr tidyverse

Problem description

I have data in which an individual (Name) appears multiple times within an Eggphase category. I would like only one sample per individual, but I don't just want to keep the first one that R finds. I would like to keep the one whose Group appears most across all the other categories. Hopefully my example makes this clear.

library(tidyverse)
myDF <- read.table(text="Tissue Food Eggphase Name Group
  wb fl after Kia a
  wb fl after Kia c
  wb wf before Kia b
  wb fl before Lucy c
  wb fl after Lucy b
  wb fl after Lucy c
  wb fl yolkdep Jess c
  wb fl yolkdep Betty a
  wb fl yolkdep Betty b", header = TRUE)

I would like to keep just one row per Name within each Tissue, Food and Eggphase grouping, BUT I want to select the row whose Group appears in most, if not all, of the different eggphases (with the same Tissue and Food combination).

   #results I want
  Tissue Food Eggphase  Name Group
1     wb   fl    after   Kia     c
2     wb   wf   before   Kia     b
3     wb   fl   before  Lucy     c
4     wb   fl    after  Lucy     c
5     wb   fl  yolkdep  Jess     c
6     wb   fl  yolkdep Betty     b

I tried:

one_bird <- myDF %>% 
  distinct(Tissue, Food, Eggphase, Name, .keep_all = TRUE)

but this only keeps the first entry for each combination:

  Tissue Food Eggphase  Name Group
1     wb   fl    after   Kia     a
2     wb   wf   before   Kia     b
3     wb   fl   before  Lucy     c
4     wb   fl    after  Lucy     b
5     wb   fl  yolkdep  Jess     c
6     wb   fl  yolkdep Betty     b

Any ideas on how to tell it to select the row where Group appears in most (if not all) of the eggphases within a Tissue and Food combination?

In my example, the groups that appear most within the Tissue and Food combination of wb and fl are c and b, but Kia doesn't appear in Group b, so c is the better option. As in this example, my data has duplicates that come from groups which are not the most common Group. How do I make it choose the next most common Group just for that row?

I hope I have made enough sense.
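
To make the counting described in the question concrete, here is a minimal sketch that tallies how many distinct eggphases each Group covers within a Tissue and Food combination. It is an illustration added for clarity, not part of the original post; the names group_counts and n_phases are introduced here and are only assumptions about how one might label the result.

```r
library(dplyr)

# Same example data as in the question
myDF <- read.table(text = "Tissue Food Eggphase Name Group
  wb fl after Kia a
  wb fl after Kia c
  wb wf before Kia b
  wb fl before Lucy c
  wb fl after Lucy b
  wb fl after Lucy c
  wb fl yolkdep Jess c
  wb fl yolkdep Betty a
  wb fl yolkdep Betty b", header = TRUE)

# Reduce to one row per Tissue/Food/Group/Eggphase, then count the
# eggphases each Group occurs in (n_phases is a hypothetical column name)
group_counts <- myDF %>%
  distinct(Tissue, Food, Group, Eggphase) %>%
  count(Tissue, Food, Group, name = "n_phases")

print(group_counts)
```

On this data, Group c covers three eggphases within wb and fl, while a and b each cover two, so a count like this separates c from the rest but does not by itself decide between a and b.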
Recommended answer
One option would be to create a frequency column grouped by 'Tissue', 'Food' and 'Group', then arrange in descending order of 'n' and use distinct, which keeps the first row of each 'Tissue', 'Food', 'Eggphase', 'Name' combination.
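library(dplyr)
myDF %>%
     # count the rows in each Tissue/Food/Group combination
     group_by(Tissue, Food, Group) %>%
     mutate(n = n()) %>%
     ungroup() %>%
     # within each Tissue/Food/Eggphase/Name, put the most frequent Group first
     arrange(Tissue, Food, Eggphase, Name, desc(n)) %>%
     # keep the first row per Tissue/Food/Eggphase/Name, then drop the helper column
     distinct(Tissue, Food, Eggphase, Name, .keep_all = TRUE) %>%
     select(-n)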
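
The answer above ranks each Group by how many rows it has within a Tissue and Food combination. The question asks for the Group that appears in the most eggphases, so a possible variant, sketched here under that reading, ranks by the number of distinct eggphases instead. The column name n_phases is hypothetical, and the rest follows the same arrange-then-distinct pattern.

```r
library(dplyr)

# Same example data as in the question
myDF <- read.table(text = "Tissue Food Eggphase Name Group
  wb fl after Kia a
  wb fl after Kia c
  wb wf before Kia b
  wb fl before Lucy c
  wb fl after Lucy b
  wb fl after Lucy c
  wb fl yolkdep Jess c
  wb fl yolkdep Betty a
  wb fl yolkdep Betty b", header = TRUE)

one_bird <- myDF %>%
  # score each Group by the number of distinct eggphases it covers
  # within its Tissue/Food combination (assumed ranking criterion)
  group_by(Tissue, Food, Group) %>%
  mutate(n_phases = n_distinct(Eggphase)) %>%
  ungroup() %>%
  # best-scoring Group first within each Tissue/Food/Eggphase/Name
  arrange(Tissue, Food, Eggphase, Name, desc(n_phases)) %>%
  # keep one row per Tissue/Food/Eggphase/Name
  distinct(Tissue, Food, Eggphase, Name, .keep_all = TRUE) %>%
  select(-n_phases)

print(one_bird)
```

With either score, Kia and Lucy keep Group c in the after eggphase. Betty's two rows (a and b) score the same, so the score alone does not pick between them and the row that survives depends on row order; an extra column in arrange would be needed if a specific tie-break is wanted. The result also comes back sorted by the arrange columns rather than in the original row order.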

