data.table grouping separately on numeric and text variables
Question
I'm trying to simplify this two-stage data.table process, which acts on both numeric and character variables: for example, take the first element of textvar and sum each of the numeric variables. Consider this small example:
library(data.table)
dt <- data.table(grpvar=letters[c(1,1,2)], textvar=c("one","two","one"),
numvar=1:3, othernum=2:4)
dt
# grpvar textvar numvar othernum
#1: a one 1 2
#2: a two 2 3
#3: b one 3 4
Now my first thought was to nest .SD to drop the one variable out of the lapply call, but I thought that was a bit complicated:
dt[, c(textvar=textvar[1], .SD[, lapply(.SD, sum), .SDcols=-c("textvar")]), by=grpvar]
# grpvar textvar numvar othernum
#1: a one 3 5
#2: b one 3 4
Then I thought maybe I could do each grouping separately and join them, but that seems even worse:
dt[, .(textvar=textvar[1]), by=grpvar][
dt[, lapply(.SD, sum), by=grpvar, .SDcols=-c("textvar")], on="grpvar"
]
# grpvar textvar numvar othernum
#1: a one 3 5
#2: b one 3 4
Is there a simpler construction that would get around the nesting of .SD or the joining? I feel like I'm overlooking something elementary.
Recommended answer
The j-argument in data.table is (deliberately) quite flexible. All we need to remember is that:
As long as j returns a list, each element of the list will become a column in the resulting data.table.
Using the fact that c(list, list) is a list, we can construct the expression as follows:
dt[, c(textvar = textvar[1L], lapply(.SD, sum)), # select/compute all cols necessary
.SDcols = numvar:othernum, # provide .SD's columns
by = grpvar] # group by 'grpvar'
# grpvar textvar numvar othernum
# 1: a one 3 5
# 2: b one 3 4
Here, I've not wrapped the first expression with list() since textvar[1L] returns a length-1 vector, i.e., identical(c(1, list(2, 3)), c(list(1), list(2, 3))) is TRUE.
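The c() behaviour relied on here can be checked in base R alone. A minimal sketch, where the names a, b, and named are only illustrative:

```r
# c() of an atomic vector and a list turns each element of the vector
# into a list element, so a length-1 vector x behaves like list(x).
a <- c(1, list(2, 3))
b <- c(list(1), list(2, 3))
identical(a, b)   # TRUE
is.list(a)        # TRUE

# A name given to the first argument is kept as the element name,
# which is what supplies the 'textvar' column name in the answer.
named <- c(textvar = "one", list(numvar = 3L, othernum = 5L))
names(named)      # "textvar" "numvar" "othernum"

# Caveat: a longer vector contributes one list element per value,
# so the shortcut only suits a length-1 value such as textvar[1L].
length(c(c("one", "two"), list(3L)))   # 3
```

When the first expression might return more than one value, wrapping it in list() keeps it as a single element.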
Note that this is only possible from v1.9.7. The bug was just recently fixed in the current development version.
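If the numeric columns are not adjacent, or not known in advance, one possible variant is to select them by type and to wrap the text column in list() explicitly, so that j is built purely from c(list, list). This is a sketch rather than part of the original answer; num_cols and res are illustrative names, and selecting columns with is.numeric is an assumption about what counts as a "numeric variable" here.

```r
library(data.table)

dt <- data.table(grpvar = letters[c(1, 1, 2)], textvar = c("one", "two", "one"),
                 numvar = 1:3, othernum = 2:4)

# Names of the columns to sum, chosen by type (assumed selection rule).
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1L))]

# j returns one list per group: the first text value, followed by
# one sum per numeric column taken from .SD.
res <- dt[, c(list(textvar = textvar[1L]), lapply(.SD, sum)),
          by = grpvar, .SDcols = num_cols]
res
```

On the example data this should give the same two-row result as the expression above, with columns grpvar, textvar, numvar, and othernum.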