Get top k records per group, where k differs by group, in R data.table
Question
I have two data.tables:
- Values to extract the top k from, per group.
- A mapping from group to the k value to select for that group.
The question "How to find the top N values by group or within category (groupwise) in an R data.frame" addresses this when k does not vary by group. How can I do this when k differs by group? Here's sample data and the desired result:
Values:
library(data.table)
(dt <- data.table(id=1:10,
                  group=c(rep(1, 5), rep(2, 5))))
#     id group
#  1:  1     1
#  2:  2     1
#  3:  3     1
#  4:  4     1
#  5:  5     1
#  6:  6     2
#  7:  7     2
#  8:  8     2
#  9:  9     2
# 10: 10     2
Mapping from group to k:
(group.k <- data.table(group=1:2,
                       k=2:3))
#    group k
# 1:     1 2
# 2:     2 3
Desired result, which should include the first two records from group 1 and the first three records from group 2:
(result <- data.table(id=c(1:2, 6:8),
                      group=c(rep(1, 2), rep(2, 3))))
#    id group
# 1:  1     1
# 2:  2     1
# 3:  6     2
# 4:  7     2
# 5:  8     2
Applying the solution to the above-linked question after merging returns this error:
merged <- merge(dt, group.k, by="group")
(result <- merged[, head(.SD, k), by=group])
# Error: length(n) == 1L is not TRUE
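A likely cause of this error: after grouping the merged table `by=group`, `k` inside `j` is that group's slice of the `k` column (five identical values here), while `head()` on a data.table expects a single `n`. A minimal workaround, assuming `data.table` is installed and that `k` is constant within each group (which the merge guarantees), is to pass only the first value and then drop the `k` column that `.SD` carries along. The name `fixed` is just illustrative.

```r
library(data.table)

dt <- data.table(id = 1:10, group = c(rep(1, 5), rep(2, 5)))
group.k <- data.table(group = 1:2, k = 2:3)

merged <- merge(dt, group.k, by = "group")

# Within each group, k is a column slice with one (identical) value per row,
# so take its first element as the scalar row count for head()
fixed <- merged[, head(.SD, k[1L]), by = group]

# .SD included k, so remove that helper column from the result by reference
fixed[, k := NULL]
fixed
```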
Recommended answer
I'd rather do it this way:
dt[group.k, head(.SD, k), by=.EACHI, on="group"]
because it's quite clear to see what the intended operation is. j
can be .SD[1:k]
of course. Both these expressions will very likely be (further) optimised (for speed) in the next release.
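For reference, here is the answer's join applied to the question's sample data, as a runnable sketch assuming `data.table` is installed (`ans_head` and `ans_idx` are illustrative names). Because `by=.EACHI` evaluates `j` once per row of `group.k`, `k` is a single value (2, then 3) in each evaluation, so `head()` receives a scalar `n`. Both variants should return `group` and `id` with ids 1, 2, 6, 7, 8, matching the desired result apart from column order.

```r
library(data.table)

dt <- data.table(id = 1:10, group = c(rep(1, 5), rep(2, 5)))
group.k <- data.table(group = 1:2, k = 2:3)

# Join group.k onto dt by "group"; by = .EACHI runs j once per row of group.k,
# so k here is that row's single k value
ans_head <- dt[group.k, head(.SD, k), by = .EACHI, on = "group"]

# The alternative j from the answer: subset .SD by row position
ans_idx <- dt[group.k, .SD[1:k], by = .EACHI, on = "group"]

ans_head
```

One design difference: `head(.SD, k)` simply returns all rows when a group has fewer than `k` records, whereas `.SD[1:k]` indexes past the end and yields `NA`-filled rows, so `head()` is the safer choice when `k` can exceed the group size.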
See this post for a detailed explanation of by=.EACHI
until we wrap up those vignettes.
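As a rough check on what `by=.EACHI` changes (an illustrative comparison, not taken from the linked post), the sketch below measures how long `k` is inside `j` under each approach, assuming `data.table` is installed and the question's sample data. Grouping the merged table sees one `k` per row of the group, while the `.EACHI` join sees a single `k` per row of `group.k`, alongside `.N`, the number of `dt` rows that row matched. The names `len_by_group` and `len_eachi` are hypothetical.

```r
library(data.table)

dt <- data.table(id = 1:10, group = c(rep(1, 5), rep(2, 5)))
group.k <- data.table(group = 1:2, k = 2:3)
merged <- merge(dt, group.k, by = "group")

# Grouping the merged table: k is the group's slice of the k column
len_by_group <- merged[, .(k_len = length(k)), by = group]

# Joining with by = .EACHI: j runs once per row of group.k, so k is a scalar,
# and .N counts how many dt rows that row of group.k matched
len_eachi <- dt[group.k, .(n_matched = .N, k_len = length(k)),
                by = .EACHI, on = "group"]

len_by_group
len_eachi
```

With this sample data, `len_by_group$k_len` should be 5 for both groups, which is why `head(.SD, k)` fails there, while `len_eachi$k_len` should be 1.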