data.table grouping separately on numeric and text variables

Published: 2017/3/12 12:12:30 · Tags: r, join, data.table, lapply

Problem description

I'm trying to simplify this data.table two-stage process which acts on both numeric and character variables. E.g. - take the first element of textvar and sum each of the numeric variables. Consider this small example:

library(data.table)
dt <- data.table(grpvar=letters[c(1,1,2)], textvar=c("one","two","one"),
                 numvar=1:3, othernum=2:4)
dt
#   grpvar textvar numvar othernum
#1:      a     one      1        2
#2:      a     two      2        3
#3:      b     one      3        4

Now my first thought was to nest .SD to drop the one variable out of the lapply call, but I thought that was a bit complicated:

dt[, c(textvar=textvar[1], .SD[, lapply(.SD, sum), .SDcols=-c("textvar")]), by=grpvar]
#   grpvar textvar numvar othernum
#1:      a     one      3        5
#2:      b     one      3        4

Then I thought maybe I could do each grouping separately and join them, but that seems even worse:

dt[, .(textvar=textvar[1]), by=grpvar][ 
  dt[, lapply(.SD, sum), by=grpvar, .SDcols=-c("textvar")], on="grpvar" 
]
#   grpvar textvar numvar othernum
#1:      a     one      3        5
#2:      b     one      3        4

Is there a simpler construction that would get around the nesting of .SD or the joining? I feel like I'm overlooking something elementary.

Recommended answer

The j-argument in data.table is (deliberately) quite flexible. All we need to remember is that:

As long as j returns a list, each element of the list will become a column in the resulting data.table.

Using the fact that c(list, list) is a list, we can construct the expression as follows:

dt[, c(textvar = textvar[1L], lapply(.SD, sum)), # select/compute all cols necessary
      .SDcols = numvar:othernum,                 # provide .SD's columns 
      by = grpvar]                               # group by 'grpvar'
#    grpvar textvar numvar othernum
# 1:      a     one      3        5
# 2:      b     one      3        4

Here, I've not wrapped the first expression in list(), since textvar[1L] returns a length-1 vector and c() turns it into a single list element when it is combined with a list; that is, identical(c(1, list(2, 3)), c(list(1), list(2, 3))) is TRUE.
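To make that point concrete, here is a small base R sketch (no data.table needed) of how c() behaves when an atomic vector is combined with a list. The variable names x, y and z are only illustrative and do not come from the original answer.

```r
# Combining an atomic vector with a list via c() yields a list.
x <- c(1, list(2, 3))
y <- c(list(1), list(2, 3))

is.list(x)       # TRUE
identical(x, y)  # TRUE

# A named length-1 vector keeps its name as the list element name,
# which is why `textvar = textvar[1L]` ends up as a column called "textvar".
z <- c(textvar = "one", list(numvar = 3L, othernum = 5L))
names(z)         # "textvar" "numvar" "othernum"
```

The shortcut relies on the vector having length 1. A longer vector would be split into one list element per value, so wrapping the expression in list() is the safer choice when the length is not guaranteed.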

Note that this is only possible from v1.9.7. The bug was just recently fixed in the current development version.
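As a possible variation on the same idea, the numeric columns can be picked by type instead of by name. This is only a sketch: it assumes every numeric column should be summed, that a reasonably recent data.table release is installed, and the helper name num_cols is made up for illustration. It also wraps the text column in list() explicitly rather than relying on the length-1 shortcut.

```r
library(data.table)

dt <- data.table(grpvar = letters[c(1, 1, 2)], textvar = c("one", "two", "one"),
                 numvar = 1:3, othernum = 2:4)

# Names of the numeric columns (assumption: all of them should be summed)
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1L))]

# First textvar per group plus the per-group sum of each numeric column
res <- dt[, c(list(textvar = textvar[1L]), lapply(.SD, sum)),
          by = grpvar, .SDcols = num_cols]
res
```

On the example data this should give the same two rows as the answer above, with numvar and othernum summed within each grpvar.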
