Get top k records per group, where k differs by group, in R data.table

Question

I have two data.tables:

  1. Values to extract the top k from, per group.
  2. A mapping from group to the k values to select for that group.

The question "How to find the top N values by group or within category (groupwise) in an R data.frame" covers this case when k does not vary by group. How can I do this when k differs by group? Here's sample data and the desired result:

Values:

library(data.table)

(dt <- data.table(id=1:10,
                  group=c(rep(1, 5), rep(2, 5))))
#     id group
#  1:  1     1
#  2:  2     1
#  3:  3     1
#  4:  4     1
#  5:  5     1
#  6:  6     2
#  7:  7     2
#  8:  8     2
#  9:  9     2
# 10: 10     2

Mapping from group to k:

(group.k <- data.table(group=1:2, 
                       k=2:3))
#    group k
# 1:     1 2
# 2:     2 3

Desired result, which should include the first two records from group 1 and the first three records from group 2:

(result <- data.table(id=c(1:2, 6:8),
                      group=c(rep(1, 2), rep(2, 3))))
#    id group
# 1:  1     1
# 2:  2     1
# 3:  6     2
# 4:  7     2
# 5:  8     2

Applying the solution from the above-linked question after merging returns this error:

merged <- merge(dt, group.k, by="group")
(result <- merged[, head(.SD, k), by=group])
# Error: length(n) == 1L is not TRUE
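The error appears to come from `k` being a column of `merged`: inside each group it is a vector with one entry per row, while `head()` expects a single number for `n`. A minimal sketch of one workaround, assuming the sample `dt` and `group.k` above and that `k` is constant within each group, is to pass only the first element:

```r
library(data.table)

dt <- data.table(id = 1:10, group = c(rep(1, 5), rep(2, 5)))
group.k <- data.table(group = 1:2, k = 2:3)

merged <- merge(dt, group.k, by = "group")

# Within each group, k is a vector such as c(2, 2, 2, 2, 2),
# so take its first element to hand head() a single number.
result <- merged[, head(.SD, k[1L]), by = group]

# Drop the helper column so only the original columns remain.
result[, k := NULL]
result  # ids 1, 2 from group 1 and ids 6, 7, 8 from group 2
```

This keeps the merge-then-group approach from the question; it is only one possible fix, not necessarily the fastest one.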


Recommended answer

I'd rather do it this way:

dt[group.k, head(.SD, k), by=.EACHI, on="group"]

because it's quite clear what the intended operation is. Of course, j can also be .SD[1:k]. Both of these expressions will very likely be (further) optimised for speed in the next release.

See this post for a detailed explanation of by=.EACHI until we wrap up those vignettes.
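To make the two forms of j concrete, here is a small self-contained sketch that runs both on the sample data from the question. The names `top_head` and `top_index` are made up for illustration, and `seq_len(k)` is swapped in for `1:k` as a defensive choice, since `1:0` would evaluate to `c(1, 0)` rather than an empty index if some group had k equal to 0.

```r
library(data.table)

dt <- data.table(id = 1:10, group = c(rep(1, 5), rep(2, 5)))
group.k <- data.table(group = 1:2, k = 2:3)

# With by = .EACHI, j is evaluated once per row of group.k,
# on the rows of dt matching that group, so k is a single value here.
top_head <- dt[group.k, head(.SD, k), by = .EACHI, on = "group"]

# Index-based variant of the same idea.
top_index <- dt[group.k, .SD[seq_len(k)], by = .EACHI, on = "group"]

top_head  # ids 1, 2 for group 1 and ids 6, 7, 8 for group 2
```

One difference worth keeping in mind: if k is larger than the number of rows in a group, `head()` simply returns the rows that exist, whereas row indexing past the end of `.SD` is expected to produce NA rows, so `head()` is probably the safer default.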
