How to get length of current group in data.table grouping?


Question

I know this can be achieved with other packages, but I'm trying to do it in data.table (as it seems to be the fastest for grouping).

library(data.table)
dt = data.table(a=c(1,2,2,3))
dt[,length(a),by=a]

results in

   a V1
1: 1  1
2: 2  1
3: 3  1

whereas

library(plyr)  # provides ddply() and .()
df = data.frame(a=c(1,2,2,3))
ddply(df,.(a),summarise,V1=length(a))

produces

  a V1
1 1  1
2 2  2
3 3  1

which is a more sensible result. I'm wondering why data.table does not give the same result, and how this can be achieved.

thanks

Solution

The data.table way to do this is to use the special variable .N, which keeps track of the number of rows in the current group. (Other special variables include .SD, .BY (in version 1.8.2), and .I and .GRP (available from version 1.8.3). All are documented in ?data.table):

library(data.table)
dt = data.table(a=c(1,2,2,3))

dt[, .N, by = a]
#    a N
# 1: 1 1
# 2: 2 2
# 3: 3 1
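
As a small extension of the above, assuming you want the count under a name of your own or attached to every row, .N can be wrapped in .() or assigned by reference with :=. The names counts and n below are illustrative choices, not part of the original answer:

```r
library(data.table)
dt = data.table(a = c(1, 2, 2, 3))

# One row per group, with the count column named n instead of the default N
counts = dt[, .(n = .N), by = a]

# Keep every row and attach its group's size as a new column
# (:= modifies dt by reference, so no reassignment is needed)
dt[, n := .N, by = a]
```

The first form gives a summary table; the second keeps the original rows, which is handy when the group size is needed for a later per-row calculation.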

To see why what you tried didn't work, run the following, checking the value of a and length(a) at each browser prompt:

dt[, browser(), by = a]
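
A non-interactive way to see the same thing, offered as a sketch rather than as part of the original answer: inside j, a column named in by is reduced to a single value per group, so length(a) is always 1 there, whereas .N reports the group's row count. The column names len_a and n_rows are illustrative:

```r
library(data.table)
dt = data.table(a = c(1, 2, 2, 3))

# Within each group, the grouping column a is a length-1 vector,
# while .N is the number of rows in that group
res = dt[, .(len_a = length(a), n_rows = .N), by = a]
print(res)
```

Comparing the two columns side by side shows why the original attempt returned 1 for every group.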
