Select / assign to data.table variables whose names are stored in a character vector


Question

How do you refer to variables in a data.table if the variable names are stored in a character vector? For instance, this works for a data.frame:

df <- data.frame(col1 = 1:3)
colname <- "col1"
df[colname] <- 4:6
df
#   col1
# 1    4
# 2    5
# 3    6

How can I perform the same operation on a data.table, either with or without the := notation? The obvious approach, dt[, list(colname)], doesn't work (nor did I expect it to).

Recommended answer

Try:

library(data.table)

DT = data.table(col1 = 1:3)
colname = "col1"

DT[, colname, with=FALSE]    # select
#    col1
# 1:    1
# 2:    2
# 3:    3

DT[, (colname) := 4:6][]    # assign; := returns DT invisibly, so the trailing [] prints it
#    col1
# 1:    4
# 2:    5
# 3:    6
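For selection, with=FALSE is not the only option. The sketch below compares it with two other ways of selecting columns whose names are held in a character vector. It assumes a data.table version recent enough to support the `..` prefix (v1.10.2 or later). The table and the vector `cols` are made up for illustration.

```r
library(data.table)

DT <- data.table(col1 = 1:3, col2 = 4:6, col3 = 7:9)
cols <- c("col1", "col3")   # hypothetical: names held in a character vector

a <- DT[, cols, with = FALSE]   # the form used in the answer
b <- DT[, ..cols]               # ".." prefix: look `cols` up in the calling scope
d <- DT[, .SD, .SDcols = cols]  # .SD restricted to the named columns

print(a)
print(b)
print(d)
```

All three return a new data.table holding only col1 and col3. The `.SDcols` form becomes useful once you also want to apply a function to each selected column.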

The latter is known as a column plonk, because you replace the whole column vector by reference. If a subset i were present, it would instead subassign by reference. The parentheses around (colname) are a shorthand introduced in v1.9.4, released on CRAN in October 2014. Here is the NEWS item:

Using with=FALSE with := is now deprecated in all cases, given that wrapping
the LHS of := with parentheses has been preferred for some time.

colVar = "col1"
DT[, colVar:=1, with=FALSE]                   # deprecated, still works silently
DT[, (colVar):=1]                             # please change to this
DT[, c("col1","col2"):=1]                     # no change
DT[, 2:4 := 1]                                # no change
DT[, c("col1","col2"):=list(sum(a),mean(b))]  # no change
DT[, `:=`(...), by=...]                       # no change
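The following is a minimal sketch of the plonk/subassign distinction described above, together with the multi-column form from the NEWS item. The column names `total` and `avg` are hypothetical and are only there for illustration.

```r
library(data.table)

DT <- data.table(col1 = 1:3, col2 = c(2, 4, 6))
colname <- "col1"

# Plonk: no i, so the whole col1 vector is replaced by reference
DT[, (colname) := 4:6]

# Subassign: with i present, only the matching rows (col2 > 3) are updated
DT[col2 > 3, (colname) := 0L]

# Several target columns named in a character vector, one value per column;
# length-1 values are recycled to every row
newcols <- c("total", "avg")
DT[, (newcols) := list(sum(col1), mean(col2))]
print(DT)
```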

See also the Details section in ?`:=`:

DT[i,(colnamevector):=value]
# [...] The parens are enough to stop the LHS being a symbol
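The following small sketch shows what "stop the LHS being a symbol" means in practice. A bare symbol on the left of := is taken literally as a column name, while the parenthesised form evaluates the variable. The names `col1` and `colVar` follow the NEWS item.

```r
library(data.table)

DT <- data.table(col1 = 1:3)
colVar <- "col1"

DT[, colVar := 0L]     # bare symbol: adds a new column literally named "colVar"
DT[, (colVar) := 9L]   # parenthesised: evaluates colVar, so assigns to "col1"

print(names(DT))
print(DT)
```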

To answer the further question in the comments, here is one way (as usual, there are many):

DT[, (colname) := cumsum(get(colname))][]   # (colname) replaces the deprecated with=FALSE form
#    col1
# 1:    4
# 2:    9
# 3:   15 
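The get() form handles one column at a time. When several columns are named in a character vector, a common idiom combines the parenthesised LHS with lapply() over .SD restricted by .SDcols. This is a swapped-in technique, not part of the original answer. The column names are made up for the sketch.

```r
library(data.table)

DT <- data.table(col1 = 1:3, col2 = 4:6, other = c("x", "y", "z"))
cols <- c("col1", "col2")   # hypothetical: the columns to transform

# .SD holds only the columns listed in .SDcols; lapply returns one
# cumulative-sum vector per column, assigned back to the same names
DT[, (cols) := lapply(.SD, cumsum), .SDcols = cols]
print(DT)
```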

Or you might find it easier to read, write and debug if you simply eval a paste, similar to constructing a dynamic SQL statement to send to a server:

expr = paste0("DT[,",colname,":=cumsum(",colname,")]")
expr
# [1] "DT[,col1:=cumsum(col1)]"
eval(parse(text=expr))
#    col1
# 1:    4
# 2:   13
# 3:   28

If you do that a lot, you can define a helper function EVAL:

# parent.frame() evaluates the pasted code in the frame that called EVAL
EVAL = function(...) eval(parse(text = paste0(...)), envir = parent.frame())

EVAL("DT[,",colname,":=cumsum(",colname,")]")
#    col1
# 1:    4
# 2:   17
# 3:   45
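The helper is most useful inside other functions. The sketch below calls it from a function where DT is a local variable. It assumes `envir = parent.frame()`, so that the pasted code runs in the frame that called EVAL. The wrapper `cumsum_col` is hypothetical.

```r
library(data.table)

# The EVAL helper, evaluating in the caller's frame
EVAL <- function(...) eval(parse(text = paste0(...)), envir = parent.frame())

# Hypothetical wrapper: DT exists only inside this function's frame
cumsum_col <- function(colname) {
  DT <- data.table(col1 = 4:6)
  EVAL("DT[, ", colname, " := cumsum(", colname, ")]")
  DT   # modified by reference inside EVAL's eval()
}

res <- cumsum_col("col1")
print(res)
```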

Now that data.table 1.8.2 automatically optimizes j for efficiency, the eval method may be preferable: using get() in j prevents some of those optimizations, for example.
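If you want the eval route but would rather avoid string pasting, one alternative is to build the call as a language object. Base R's substitute() swaps a placeholder symbol for `as.name(colname)`, and the result is then evaluated. This is a swapped-in technique rather than the answer's method. `COL` is just a hypothetical placeholder name.

```r
library(data.table)

DT <- data.table(col1 = 4:6)
colname <- "col1"

# Replace every occurrence of the placeholder symbol COL with the symbol
# named in `colname`, giving the same call as typing DT[, col1 := cumsum(col1)]
expr <- substitute(DT[, COL := cumsum(COL)], list(COL = as.name(colname)))
print(expr)   # inspect the constructed call before running it
eval(expr)
print(DT)
```

Because the column name ends up as a literal symbol in j, the call looks to data.table as if it had been typed by hand. That is the same property the paste-and-parse approach relies on.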

Alternatively, there is set(), a low-overhead functional form of :=, which would be fine here. See ?set.

set(DT,j=colname,value=cumsum(DT[[colname]]))
DT
#    col1
# 1:    4
# 2:   21
# 3:   66
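set() is often used in a plain for loop over a character vector of column names, since each call has very little overhead. The following is a minimal sketch with made-up columns.

```r
library(data.table)

DT <- data.table(col1 = 1:3, col2 = c(10, 20, 30))
cols <- c("col1", "col2")   # hypothetical: columns to update

# Each set() call replaces one whole column by reference
for (col in cols) {
  set(DT, j = col, value = cumsum(DT[[col]]))
}
print(DT)
```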
