R - subset column based on condition on duplicate rows

Views: 110 · Posted: 2017/7/21 0:07:57 · Tags: r, duplicates, subset


Problem description

I have a data frame with a repeated ID column and a site count for each row. How can I remove duplicate ID records only when the Site_count is greater than 0?

Generate DF:

DF <- data.frame(
    'ID' = sample(100:300, 100, replace = TRUE),
    'Site_count' = sample(0:1, 100, replace = TRUE)
)

My attempt at the subset:

subset(DF[!duplicated(DF$ID),], Site_count > 0)

But this removes every row with a site count of 0. I want to remove a record only when the same ID has a duplicate record with a site count greater than 0.
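The attempt hinges on `duplicated()`, which flags only the second and later occurrences of a value, so the first row of each ID always survives the first step regardless of its count. To flag every row of a repeated ID instead, it can be combined with `fromLast = TRUE`. A small illustration on a hypothetical ID vector:

```r
# Hypothetical IDs: 1 and 2 appear once, 3 twice, 4 three times
ids <- c(1, 2, 3, 3, 4, 4, 4)

# TRUE only for the 2nd, 3rd, ... occurrence of each value
later_copies <- duplicated(ids)
print(later_copies)  # FALSE FALSE FALSE TRUE FALSE TRUE TRUE

# TRUE for every row whose value occurs more than once
any_copy <- duplicated(ids) | duplicated(ids, fromLast = TRUE)
print(any_copy)      # FALSE FALSE TRUE TRUE TRUE TRUE TRUE
```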

The desired result would look something like this (note that some IDs have a site count of 0, but no ID appears twice with both 0 and another site count):

ID    site count
--    ----------
1        0
2        1
3        1
4        0
5        5
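One reading of the desired result: drop a zero-count row only when the same ID also has a row with a positive count, then keep one row per ID. A minimal sketch of that reading, on hypothetical toy data (the question's own DF is random, so it is not used here):

```r
# Hypothetical toy data: ID 3 has one 0 and one 1, ID 4 has two 0s
DF <- data.frame(
  ID         = c(1, 2, 3, 3, 4, 4, 5),
  Site_count = c(0, 1, 0, 1, 0, 0, 1)
)

# Per row: does any row with the same ID have Site_count > 0?
# ave() returns a numeric 0/1 vector aligned with DF, so convert it back.
id_has_positive <- as.logical(
  ave(DF$Site_count, DF$ID, FUN = function(x) any(x > 0))
)

# Drop zero-count rows only when the same ID also has a positive count
result <- DF[!(DF$Site_count == 0 & id_has_positive), ]

# Collapse any remaining repeats (e.g. an ID with two zero rows) to one row
result <- result[!duplicated(result$ID), ]
print(result)
```

If an ID has several positive rows, the final `duplicated()` step keeps the first of them.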

Solution

The expected output is not very clear. Maybe this helps:

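 # indx is TRUE for every row whose ID has more than one Site_count of 0
 indx <- with(DF, ave(!Site_count, ID, FUN = function(x) sum(x) > 1))
 # drop the later copies of those IDs (duplicated() spares the first occurrence)
 DF[!(duplicated(DF$ID) & indx), ]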
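Running this first answer on a small hypothetical data frame makes its behaviour concrete: it removes only the later copies of IDs that have more than one zero count.

```r
# Hypothetical toy data: ID 3 has one 0 and one 1, ID 4 has two 0s
DF <- data.frame(
  ID         = c(1, 2, 3, 3, 4, 4, 5),
  Site_count = c(0, 1, 0, 1, 0, 0, 1)
)

# !Site_count is TRUE where the count is 0; sum(x) > 1 asks
# "does this ID have more than one zero-count row?"
indx <- with(DF, ave(!Site_count, ID, FUN = function(x) sum(x) > 1))

# Drop later copies of such IDs
res <- DF[!(duplicated(DF$ID) & indx), ]
print(res)
```

Only the second zero row of ID 4 is removed; ID 3 keeps both of its rows, so under this reading an ID with mixed counts stays duplicated.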

Update

After re-reading the description, your expected answer could also be:

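 # indx is 1 for every row whose ID has at least one Site_count > 0, else 0
 indx <- with(DF, ave(Site_count, ID, FUN = function(x) any(x > 0)))
 # drop the later copies of those IDs
 DF[!(duplicated(DF$ID) & indx), ]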
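Applied to a small hypothetical data frame, the updated version removes the later copies of every ID that has a positive count. Because `duplicated()` keeps the first occurrence, which row survives depends on row order: in the unsorted data below, ID 3 keeps its 0 row and loses its 1 row. Sorting by `ID` and decreasing `Site_count` first is one possible adjustment; it is an assumption, not part of the original answer.

```r
# Hypothetical toy data: ID 3 has one 0 and one 1, ID 4 has two 0s
DF <- data.frame(
  ID         = c(1, 2, 3, 3, 4, 4, 5),
  Site_count = c(0, 1, 0, 1, 0, 0, 1)
)

# As in the answer: indx is 1 for rows whose ID has any positive count
indx <- with(DF, ave(Site_count, ID, FUN = function(x) any(x > 0)))
res <- DF[!(duplicated(DF$ID) & indx), ]
print(res)  # ID 3 keeps its Site_count 0 row

# Variant (assumption): put each ID's largest Site_count first,
# so duplicated() keeps that row instead
DF_sorted <- DF[order(DF$ID, -DF$Site_count), ]
indx_sorted <- with(DF_sorted, ave(Site_count, ID, FUN = function(x) any(x > 0)))
res_sorted <- DF_sorted[!(duplicated(DF_sorted$ID) & indx_sorted), ]
print(res_sorted)  # ID 3 keeps its Site_count 1 row
```

Neither version collapses ID 4, whose two rows are both 0, because that ID has no positive count.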
