Create new column in data.table by group

Question

I have no experience with data.table, so I don't know if there is a solution to my question (30 minutes on Google gave no answer at least), but here it goes.

With data.frame I often use the following command to check the number of observations of a unique value:

df$Obs=with(df, ave(v1, ID-Date, FUN=function(x) length(unique(x))))  

Is there any corresponding method when working with data.table?

Cheers! :)

Recommended answer

Yes, there is. Happily, you've asked about one of the newest features of data.table, added in v1.8.2 :


:= by group is now implemented (FR#1491) and sub-assigning to a new column by reference now adds the column automatically (initialized with NA where the sub-assign doesn't touch) (FR#1997). := by group can be combined with all types of i, so := by group includes grouping by i as well as by by. Since := by group is by reference, it should be significantly faster than any method that (directly or indirectly) cbinds the grouped results to DT, since no copy of the (large) DT is made at all. It's a short and natural syntax that can be compounded with other queries.
DT[,newcol:=sum(colB),by=colA]
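
As a minimal sketch of the syntax quoted above, the following runs `:=` by group on a toy table. The column names `colA` and `colB` follow the NEWS example; the values and the `partial` column are made up for illustration.

```r
library(data.table)

# Toy data; the values are invented for this sketch.
DT <- data.table(colA = c("a", "a", "b", "b", "b"),
                 colB = c(1, 2, 3, 4, 5))

# Add newcol by reference: every row receives the sum of colB
# within its colA group. No copy of DT is made.
DT[, newcol := sum(colB), by = colA]

# Combine := by group with a filter in i. Rows that the filter does not
# select keep NA in the new (hypothetical) column 'partial'.
DT[colB > 1, partial := sum(colB), by = colA]

print(DT)
```

Because the assignment happens by reference, there is no need to write `DT <- DT[...]`; the table is updated in place.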

In your example, iiuc, it should be something like :

DT[, Obs:=.N, by=ID-Date]


instead of :

df$Obs=with(df, ave(v1, ID-Date, FUN=function(x) length(unique(x))))
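
The two calls can be compared side by side on a small example. This is only a sketch: it assumes the grouping written as `ID-Date` above is a single column, here given the hypothetical name `ID_Date`, and the data are invented. Note that `.N` counts rows per group, while the `ave()` call counts distinct values of `v1`; if distinct values are what is wanted, `length(unique(v1))` can be used inside `j` instead.

```r
library(data.table)

# Hypothetical data; ID_Date stands in for the grouping column.
df <- data.frame(ID_Date = c("x", "x", "x", "y", "y"),
                 v1      = c(1, 1, 2, 5, 5))

# data.frame approach from the question: distinct v1 values per group.
df$Obs <- with(df, ave(v1, ID_Date, FUN = function(x) length(unique(x))))

DT <- as.data.table(df[, c("ID_Date", "v1")])

# Number of rows per group, which is what .N returns.
DT[, n_rows := .N, by = ID_Date]

# Number of distinct v1 values per group, matching the ave() call.
DT[, Obs := length(unique(v1)), by = ID_Date]

print(DT)
```

On these data the two counts differ for group `x` (three rows, two distinct values), so it is worth checking which one the analysis actually needs.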

See ?":=" and Search data.table tag for "reference"
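
To illustrate what "by reference" means in practice, here is a small sketch with made-up data and hypothetical names (`g`, `x`, `total`). Plain assignment gives a second name for the same table, so a column added with `:=` shows up under both names, whereas `copy()` produces an independent table.

```r
library(data.table)

DT <- data.table(g = c("a", "a", "b"), x = 1:3)

alias <- DT        # second name for the same table, not a copy
snap  <- copy(DT)  # independent copy taken before the update

# Add a grouped column by reference.
DT[, total := sum(x), by = g]

print(names(alias))  # includes the column added through DT
print(names(snap))   # unchanged by the update
```

Use `copy()` whenever the original table must stay untouched.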
