Get top k records per group, where k differs by group, in R data.table

Published: 2017/3/12 13:09:11 · Tags: r, data.table


Question

I have two data.tables:

1. Values to extract the top k from, per group.
2. A mapping from group to the k values to select for that group.

The question "how to find the top N values by group or within category (groupwise) in an R data.frame" addresses this problem when k does not vary by group. How can I do this when k differs by group? Here's sample data and the desired result:
Values:
library(data.table)
(dt <- data.table(id=1:10,
                  group=c(rep(1, 5), rep(2, 5))))
#     id group
#  1:  1     1
#  2:  2     1
#  3:  3     1
#  4:  4     1
#  5:  5     1
#  6:  6     2
#  7:  7     2
#  8:  8     2
#  9:  9     2
# 10: 10     2

Mapping from group to k:
(group.k <- data.table(group=1:2, 
                       k=2:3))
#    group k
# 1:     1 2
# 2:     2 3

Desired result, which should include the first two records from group 1 and the first three records from group 2:
(result <- data.table(id=c(1:2, 6:8),
                      group=c(rep(1, 2), rep(2, 3))))
#    id group
# 1:  1     1
# 2:  2     1
# 3:  6     2
# 4:  7     2
# 5:  8     2

Applying the solution from the linked question after merging returns this error:
merged <- merge(dt, group.k, by="group")
(result <- merged[, head(.SD, k), by=group])
# Error: length(n) == 1L is not TRUE
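The error comes from data.table's `head()` method, which requires `n` to be a single number. After the merge, `k` is an ordinary column, so inside `j` it holds one value per row of the group (five values per group here). A minimal sketch of one workaround, assuming every row in a group carries the same `k` (true after this merge), is to pass only its first element:

```r
library(data.table)

dt <- data.table(id = 1:10, group = c(rep(1, 5), rep(2, 5)))
group.k <- data.table(group = 1:2, k = 2:3)

merged <- merge(dt, group.k, by = "group")

# Inside j, k is this group's slice of the k column (one value per row).
# All rows of a group share the same k after the merge, so k[1L] is the
# single number that head() expects.
result <- merged[, head(.SD, k[1L]), by = group]

# .SD excludes the grouping column but still contains the merged k column;
# drop it so only group and id remain, as in the desired result.
result[, k := NULL]
print(result)
```

This keeps the merge-then-group structure of the attempt; the join in the answer avoids the extra column altogether.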

 
 
Recommended answer
I'd rather do it this way:
dt[group.k, head(.SD, k), by=.EACHI, on="group"]

because it's quite clear what the intended operation is. Of course, j can also be .SD[1:k]. Both of these expressions will very likely be further optimised for speed in the next release.
See this post for a detailed explanation of by=.EACHI until we wrap up those vignettes.
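The join from the answer can be checked against the sample data. The sketch below is illustrative only (the names `res_head` and `res_idx` are hypothetical); it runs both forms of `j` mentioned above and compares them. Because `k` is not a column of `dt`, inside `j` it refers to the scalar `k` from the current row of `group.k`, evaluated once per row of `group.k` thanks to `by = .EACHI`.

```r
library(data.table)

dt <- data.table(id = 1:10, group = c(rep(1, 5), rep(2, 5)))
group.k <- data.table(group = 1:2, k = 2:3)

# For each row of group.k, .SD is the matching subset of dt and k is that
# row's single k value, so head() receives a length-1 n.
res_head <- dt[group.k, head(.SD, k), by = .EACHI, on = "group"]

# The same selection written as positional subsetting of .SD.
res_idx <- dt[group.k, .SD[1:k], by = .EACHI, on = "group"]

print(res_head)
stopifnot(identical(res_head$id, res_idx$id))
```

On this data both forms agree; `head(.SD, k)` has the advantage of never asking for rows past the end of a group when `k` exceeds the group size.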
