R - subset column based on condition on duplicate rows
I have a data frame with an ID column that contains repeated values, along with a site count for each row. How can I remove the duplicate ID records only when a Site_count record for that ID is more than 0?
Generate DF:
DF <- data.frame(
  ID = sample(100:300, 100, replace = TRUE),
  Site_count = sample(0:1, 100, replace = TRUE)
)
My attempt at the subset:
subset(DF[!duplicated(DF$ID), ], Site_count > 0)
But this removes every row with a site count of 0. I only want to remove a record when the same ID has a duplicate record with a site count greater than 0.
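A small sketch of the problem, using a made-up data frame (`toy` and its values are hypothetical, chosen only for illustration): dropping repeated IDs first and then filtering on the count loses ID 1 entirely, even though it has a row with a positive count, and also loses ID 2, which has no duplicate at all.

```r
# Hypothetical toy data: ID 1 has a 0 row and a 1 row; ID 2 only has a 0 row
toy <- data.frame(ID = c(1, 1, 2, 3), Site_count = c(0, 1, 0, 1))

# Same shape as the attempt: keep the first row per ID, then keep positive counts
attempt <- subset(toy[!duplicated(toy$ID), ], Site_count > 0)

# Only the row for ID 3 survives
attempt
```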
The desired result would look something like this (note that there are IDs with a site count of 0, but no ID appears with both a 0 and another site count):
ID  site count
--  ----------
1   0
2   1
3   1
4   0
5   5
The expected output is not very clear. Maybe this helps:
indx <- with(DF, ave(!Site_count, ID, FUN=function(x) sum(x)>1))
DF[!(duplicated(DF$ID) & indx),]
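As a rough illustration of what this does, here is the same snippet on a small, made-up data frame (the `toy` values are hypothetical, not from the question). `indx` is TRUE for IDs that have more than one row with a 0 count, so only repeated zero rows are collapsed; an ID with one 0 row and one positive row keeps both.

```r
# Hypothetical toy data: ID 1 has two 0 rows, ID 2 has a 0 and a 1, ID 3 has a 1
toy <- data.frame(ID = c(1, 1, 2, 2, 3), Site_count = c(0, 0, 0, 1, 1))

# TRUE for every row whose ID has more than one zero-count row
indx <- with(toy, ave(!Site_count, ID, FUN = function(x) sum(x) > 1))

# Drop later occurrences of an ID only within those flagged IDs
res1 <- toy[!(duplicated(toy$ID) & indx), ]

# Rows 1, 3, 4 and 5 remain: the second 0 row of ID 1 is dropped
res1
```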
Update
After re-reading the description, your expected answer could also be:
indx <- with(DF, ave(Site_count, ID, FUN=function(x) any(x>0)))
DF[!(duplicated(DF$ID) & indx),]
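On a small, made-up data frame (hypothetical values), this second version keeps the first row of any ID that has a positive count somewhere, so which row survives depends on row order. If the intent is instead to drop the 0 rows of such IDs and keep a positive one, the second half of the sketch below is one possible way to do it; that reading of the question is an assumption, and `has_pos`, `kept` and `alt` are names introduced here.

```r
# Hypothetical toy data: ID 1 has two 0 rows, ID 2 has a 0 and a 1, ID 3 has a 1
toy <- data.frame(ID = c(1, 1, 2, 2, 3), Site_count = c(0, 0, 0, 1, 1))

# Second snippet from the answer: indx is 1 where the ID has any count > 0
indx <- with(toy, ave(Site_count, ID, FUN = function(x) any(x > 0)))
res2 <- toy[!(duplicated(toy$ID) & indx), ]
# Rows 1, 2, 3 and 5 remain: for ID 2 the 0 row is kept and the 1 row is dropped
res2

# Alternative sketch (assumed intent): prefer the positive row for each ID
has_pos <- with(toy, ave(Site_count > 0, ID, FUN = any))
kept <- toy[!(toy$Site_count == 0 & has_pos), ]  # drop 0 rows shadowed by a positive row
alt <- kept[!duplicated(kept$ID), ]              # then keep one row per ID
# One row per ID: ID 1 with 0, ID 2 with 1, ID 3 with 1
alt
```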