Subset by group with data.table

Published: 2017/3/12 9:47:09. Tags: r, data.table, greatest-n-per-group


Question

Assume I have a data table containing some baseball players:

library(plyr)
library(data.table)

bdt <- as.data.table(baseball)

For each player (given by id), I want to find the row corresponding to the year in which they played the most games. This is straightforward in plyr:

ddply(baseball, "id", subset, g == max(g))

What's the equivalent code for data.table?

I tried:

setkey(bdt, "id") 
bdt[g == max(g)]  # only one row
bdt[g == max(g), by = id]  # Error: 'by' or 'keyby' is supplied but not j
bdt[, .SD[g == max(g)]] # only one row

This works:

bdt[, .SD[g == max(g)], by = id]

But it's only 30% faster than plyr, suggesting it's probably not idiomatic.

Answer

Here's the fast data.table way:

bdt[bdt[, .I[g == max(g)], by = id]$V1]

This avoids constructing .SD, which is the bottleneck in your expressions.
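To make the two steps of that one-liner visible, here is a minimal sketch on a small made-up table (the `dt` table and its values are hypothetical, not the baseball data). The inner call collects, per group, the row numbers where `g` equals the group maximum; because the `j` expression is unnamed, data.table stores them in a column called `V1`. The outer call then subsets the original table once with those row numbers.

```r
library(data.table)

# Hypothetical toy data standing in for the baseball table
dt <- data.table(
  id   = c("a", "a", "a", "b", "b"),
  year = c(2001, 2002, 2003, 2001, 2002),
  g    = c(10, 30, 20, 5, 5)
)

# Step 1: per group, the row numbers (within dt) where g is the group maximum.
# .I holds the row numbers of the current group; the unnamed result becomes V1.
idx <- dt[, .I[g == max(g)], by = id]

# Step 2: subset the original table once using those row numbers.
result <- dt[idx$V1]
print(result)
```

Note that ties are kept: group "b" has two rows with the same maximum, so both are returned, which matches what `subset(g == max(g))` does in the plyr version.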

Edit: Actually, the main reason the OP's version is slow is not just that it uses .SD, but how it uses it: .SD[...] calls [.data.table, which at the moment has a huge overhead, so running it in a loop (which is what by does, once per group) accumulates a very large penalty.
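If you want to check this on your own setup, a rough timing sketch along these lines may help. The synthetic table, group count, and value range are assumptions chosen only to produce many small groups; actual timings depend on your machine and data.table version, and later versions may have narrowed the gap.

```r
library(data.table)

set.seed(1)
# Hypothetical synthetic data: many small groups make per-group overhead visible
n_groups <- 2000L
dt <- data.table(
  id = rep(seq_len(n_groups), each = 10L),
  g  = sample(0:162, n_groups * 10L, replace = TRUE)
)

# Variant from the question: subsets .SD inside every group
t_sd <- system.time(res_sd <- dt[, .SD[g == max(g)], by = id])

# Variant from the answer: collect row numbers per group, then subset once
t_i <- system.time(res_i <- dt[dt[, .I[g == max(g)], by = id]$V1])

# Elapsed seconds for each variant
print(c(SD = t_sd[["elapsed"]], I = t_i[["elapsed"]]))
```

Both variants should return the same rows, so the comparison is only about speed.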
