R data.table filtering on group size
Question
I am trying to find all the records in my data.table for which there is more than one row with value v in field f.
For instance, we can use this data:
library(data.table)
dt <- data.table(f1=c(1,2,3,4,5), f2=c(1,1,2,3,3))
If we look for that property in field f2, we'd get the following (note the absence of the (3,2) tuple):
f1 f2
1: 1 1
2: 2 1
3: 4 3
4: 5 3
My first guess was dt[.N>2,list(.N),by=f2], but that actually keeps entries with .N==1.
dt[.N>2,list(.N),by=f2]
f2 N
1: 1 2
2: 2 1
3: 3 2
The other easy guess, dt[duplicated(dt$f2)], doesn't do the trick either, as it leaves one row of each set of 'duplicates' out of the results.
dt[duplicated(dt$f2)]
f1 f2
1: 2 1
2: 5 3
So how do I get this done?
Edited to add an example.
Recommended answer
The question is not clear. Based on the title, it looks like we want to extract all groups with number of rows (.N) greater than 1.
DT[, if(.N>1) .SD, by=f]
But the mention of "value v in field f" makes it confusing.
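As a minimal sketch, here is the grouped filter applied to the sample data from the question, assuming the goal is to keep every row whose f2 value occurs more than once. The names res, idx and res2 are only illustrative. As far as I understand it, the first guess fails because .N in the i position is the row count of the whole table (5 here), so .N>2 is simply TRUE and no rows are filtered; inside j with by, .N is the size of the current group.

```r
library(data.table)

# Sample data from the question
dt <- data.table(f1 = c(1, 2, 3, 4, 5), f2 = c(1, 1, 2, 3, 3))

# Keep all rows of each f2 group that has more than one row.
# Groups for which the `if` yields NULL are dropped from the result.
# Note: the grouping column f2 comes first in the output.
res <- dt[, if (.N > 1) .SD, by = f2]
print(res)

# Alternative that preserves the original column order: collect the
# row indices (.I) of the qualifying groups, then subset by them.
idx <- dt[, if (.N > 1) .I, by = f2]$V1
res2 <- dt[idx]
print(res2)
```

Both variants drop the (3,2) row. The second one returns the columns in the order f1, f2, which matches the layout of the desired output shown in the question.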