Dynamic column names in data.table


Question

I am trying to add columns to my data.table where the column names are dynamic. In addition, I need to use the by argument when adding these columns. For example:

test_dtb <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10,10))
cn <- parse(text = "blah")
test_dtb[ , eval(cn) := mean(a), by = id]

# Error in `[.data.table`(test_dtb, , `:=`(eval(cn), mean(a)), by = id) : 
#  LHS of := must be a single column name when with=TRUE. When with=FALSE the LHS may be a vector of column names or positions.

Another attempt:

cn <- "blah"
test_dtb[ , cn := mean(a), by = id, with = FALSE]
# Error in `[.data.table`(test_dtb, , `:=`(cn, mean(a)), by = id, with = FALSE) : 'with' must be TRUE when 'by' or 'keyby' is provided


Update from Matthew:

This now works in v1.8.3 on R-Forge. Thanks for highlighting!
See this similar question for new examples:

Assigning multiple columns by group with data.table

Recommended answer

As of data.table 1.9.4, you can do this:

## A parenthesized symbol, `(cn)`, gets evaluated to "blah" before `:=` is carried out
test_dtb[, (cn) := mean(a), by = id]
head(test_dtb, 4)
#     a  b id blah
# 1: 41 19  1 54.2
# 2:  4 99  2 50.0
# 3: 49 85  3 46.7
# 4: 61  4  4 57.1

See the Details section of ?`:=`:

DT[i, (colvector) := val]

[...] NOW PREFERRED [...] syntax. The parens are enough to stop the LHS being a symbol; same as c(colvector)
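
Because the parenthesized LHS may be a vector of names, the same form extends to several dynamic columns at once. The following is a minimal sketch, assuming data.table 1.9.4 or later; the table `dt` and the `mean_` naming scheme are made up for illustration and are not part of the original question.

```r
library(data.table)

# Small deterministic table so the group means are easy to check by hand
dt <- data.table(a = 1:6, b = 7:12, id = rep(1:2, 3))

# Hypothetical naming scheme: one "mean_<col>" column per source column
src_cols <- c("a", "b")
new_cols <- paste0("mean_", src_cols)

# (new_cols) is evaluated to a character vector of column names; the RHS
# list supplies one value per name, computed within each id group
dt[, (new_cols) := lapply(.SD, mean), by = id, .SDcols = src_cols]

print(dt)
```

Here `.SDcols` restricts `.SD` to the source columns, so the list returned by `lapply` lines up one-to-one with `new_cols`.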


Original answer:

You were on exactly the right track: constructing an expression to be evaluated within the call to [.data.table is the data.table way to do this sort of thing. Going just a bit further, why not construct an expression that evaluates to the entire j argument (rather than just its left-hand side)?

Something like this should do the trick:

## Your code so far
library(data.table)
test_dtb <- data.table(a=sample(1:100, 100),b=sample(1:100, 100),id=rep(1:10,10))
cn <- "blah"

## One solution
expr <- parse(text = paste0(cn, ":=mean(a)"))
test_dtb[,eval(expr), by=id]

## Checking the result
head(test_dtb, 4)
#     a  b id blah
# 1: 30 26  1 38.4
# 2: 83 82  2 47.4
# 3: 47 66  3 39.5
# 4: 87 23  4 65.2
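
The same idea of building the whole j expression can also be written without pasting strings together. The variant below is only a sketch of that alternative, not the answer's own method: it uses base R's `substitute()` and `as.name()` to build the call, and assumes, as the answer above does, that `eval()` in j passes the constructed call through to `[.data.table`. The small table `dt` is made up for illustration.

```r
library(data.table)

# Small deterministic table (illustrative only)
dt <- data.table(a = 1:6, id = rep(1:2, 3))
cn <- "blah"

# Build the call `blah := mean(a)` as a language object rather than
# parsing a pasted string; as.name() turns the string into a symbol
expr <- substitute(lhs := mean(a), list(lhs = as.name(cn)))

# eval() in j hands the constructed call to [.data.table as the j argument
dt[, eval(expr), by = id]

print(dt)
```

Working with symbols avoids quoting problems when a column name would not survive `parse(text = ...)` unchanged, for example one containing a space.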
