Create new column in data.table by group
Question
I have no experience with data.table, so I don't know if there is a solution to my question (30 minutes on Google gave no answer at least), but here it goes.
With data.frame I often use the following command to check the number of observations of a unique value:
df$Obs=with(df, ave(v1, ID-Date, FUN=function(x) length(unique(x))))
Is there any corresponding method when working with data.table?
Cheers! :)
Recommended answer
Yes, there is. Happily, you've asked about one of the newest features of data.table, added in v1.8.2:
:= by group is now implemented (FR#1491) and sub-assigning to a new column by reference now adds the column automatically (initialized with NA where the sub-assign doesn't touch) (FR#1997). := by group can be combined with all types of i, so := by group includes grouping by i as well as by by. Since := by group is by reference, it should be significantly faster than any method that (directly or indirectly) cbinds the grouped results to DT, since no copy of the (large) DT is made at all. It's a short and natural syntax that can be compounded with other queries.
DT[,newcol:=sum(colB),by=colA]
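To make the quoted example concrete, here is a minimal runnable sketch. The toy data and the column name big are made up for illustration; only colA, colB and newcol come from the example above.

```r
library(data.table)

# Hypothetical toy data; colA and colB mirror the names used in the example.
DT <- data.table(colA = c("a", "a", "b", "b", "b"),
                 colB = c(1, 2, 3, 4, 5))

# Add newcol by reference: every row receives the sum of colB within its colA group.
DT[, newcol := sum(colB), by = colA]

# Combine := by group with i: only rows where colB > 2 are assigned,
# the untouched rows of the new column are initialized with NA.
DT[colB > 2, big := sum(colB), by = colA]

print(DT)
```

No assignment with <- is needed, since := modifies DT in place.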
In your example, iiuc, it should be something like :
DT[, Obs:=.N, by=ID-Date]
instead of :
df$Obs=with(df, ave(v1, ID-Date, FUN=function(x) length(unique(x))))
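One caveat: .N counts the rows in each group, while length(unique(x)) counts the distinct values, so the two agree only when v1 has no repeated values within a group. The sketch below shows both side by side. It assumes ID and Date are two separate grouping columns, which the original code does not make clear; the data and the Rows column are hypothetical.

```r
library(data.table)

# Hypothetical data; ID and Date are assumed to be separate grouping columns.
DT <- data.table(ID   = c(1, 1, 1, 2, 2),
                 Date = as.Date(c("2012-01-01", "2012-01-01", "2012-01-01",
                                  "2012-01-02", "2012-01-02")),
                 v1   = c(10, 10, 20, 30, 30))

# .N is the number of rows in each (ID, Date) group.
DT[, Rows := .N, by = list(ID, Date)]

# length(unique(v1)) is the number of distinct v1 values in each group,
# which is what the FUN passed to ave() computes.
DT[, Obs := length(unique(v1)), by = list(ID, Date)]

print(DT)
```

If the number of distinct values is what you are after, the second form is probably the closer translation of the ave() call.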
See ?":=" and search the data.table tag for "reference".