Select / assign to data.table when variable names are stored in a character vector
Question
How do you refer to variables in a data.table if the variable names are stored in a character vector? For instance, this works for a data.frame:
df <- data.frame(col1 = 1:3)
colname <- "col1"
df[colname] <- 4:6
df
# col1
# 1 4
# 2 5
# 3 6
How can I perform the same operation on a data.table, either with or without := notation? The obvious attempt, dt[ , list(colname)], doesn't work (nor did I expect it to).
Recommended answer
Two ways to programmatically select variable(s):
- with = FALSE:
DT = data.table(col1 = 1:3)
colname = "col1"
DT[, colname, with = FALSE]
# col1
# 1: 1
# 2: 2
# 3: 3
- the 'dot dot' (..) prefix:
DT[, ..colname]
# col1
# 1: 1
# 2: 2
# 3: 3
For further description of the 'dot dot' (..) notation, see New Features in 1.10.2 (it is currently not described in help text).
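Both selection forms also accept a character vector holding several names. The following is a minimal sketch, assuming a small made-up table; the column names and the variable `cols` are hypothetical and chosen only for illustration.

```r
library(data.table)

# Hypothetical example data, not from the original question
DT <- data.table(col1 = 1:3, col2 = letters[1:3], col3 = c(0.5, 1.5, 2.5))
cols <- c("col1", "col3")          # names held in a character vector

sel_with   <- DT[, cols, with = FALSE]   # select via with = FALSE
sel_dotdot <- DT[, ..cols]               # select via the .. prefix

# Both results are data.tables containing only the requested columns
names(sel_with)
names(sel_dotdot)

# To extract a single column as a plain vector, [[ ]] also takes a name string
one_col <- DT[[cols[1]]]
```

Note that `DT[, ..cols]` returns a data.table even when `cols` has length one, whereas `DT[[name]]` returns the underlying vector.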
To assign to variable(s), wrap the LHS of := in parentheses:
DT[, (colname) := 4:6]
# col1
# 1: 4
# 2: 5
# 3: 6
The latter is known as a column plonk, because you replace the whole column vector by reference. If a subset i were present, it would subassign by reference. The parentheses around (colname) are a shorthand introduced in v1.9.4, released on CRAN in Oct 2014. Here is the news item:
Using with = FALSE with := is now deprecated in all cases, given that wrapping the LHS of := with parentheses has been preferred for some time.
colVar = "col1"
DT[, colVar := 1, with = FALSE] # deprecated, still works silently
DT[, (colVar) := 1] # please change to this
DT[, c("col1", "col2") := 1] # no change
DT[, 2:4 := 1] # no change
DT[, c("col1","col2") := list(sum(a), mean(b))] # no change
DT[, `:=`(...), by = ...] # no change
See also the Details section in ?`:=`:
DT[i, (colnamevector) := value]
# [...] The parens are enough to stop the LHS being a symbol
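The difference between a column plonk (no i) and a sub-assignment (with i) described above can be sketched as follows. The table, the `id` column, and the names in `cols` are assumptions made up for this illustration.

```r
library(data.table)

# Hypothetical example data
DT <- data.table(id = 1:4, col1 = c(10, 20, 30, 40))
colname <- "col1"

# No i: the whole column vector is replaced by reference (a "plonk")
DT[, (colname) := c(1, 2, 3, 4)]

# With i: only the matching rows are updated in place (sub-assignment)
DT[id > 2L, (colname) := 0]
DT$col1

# The parenthesised LHS may also hold several names; the RHS is then a list
cols <- c("x", "y")                # hypothetical new column names
DT[, (cols) := list(id * 2, id * 3)]
names(DT)
```

Without the parentheses, `DT[, colname := 0]` would create or overwrite a column literally named `colname`, which is why the parentheses matter.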
And to answer the further question in the comments, here's one way (as usual, there are many ways):
DT[, (colname) := cumsum(get(colname))]
# col1
# 1: 4
# 2: 9
# 3: 15
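As a variation on the get() approach above, when several column names are stored in the vector, a commonly used idiom is to combine the parenthesised LHS with `.SD` and `.SDcols`. This is only a sketch with made-up data, not part of the original answer; `cols` and the second column are hypothetical.

```r
library(data.table)

# Hypothetical example data
DT <- data.table(col1 = 4:6, col2 = c(1, 1, 1))
cols <- c("col1", "col2")

# .SDcols restricts .SD to the named columns; lapply() returns one
# transformed vector per column, assigned back by reference
DT[, (cols) := lapply(.SD, cumsum), .SDcols = cols]
DT
```

This avoids calling get() once per column and keeps the column names entirely in the character vector.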
Or, you might find it easier to read, write and debug just to eval a paste, similar to constructing a dynamic SQL statement to send to a server:
expr = paste0("DT[,",colname,":=cumsum(",colname,")]")
expr
# [1] "DT[,col1:=cumsum(col1)]"
eval(parse(text=expr))
# col1
# 1: 4
# 2: 13
# 3: 28
If you do that a lot, you can define a helper function EVAL:
EVAL = function(...)eval(parse(text=paste0(...)),envir=parent.frame(2))
EVAL("DT[,",colname,":=cumsum(",colname,")]")
# col1
# 1: 4
# 2: 17
# 3: 45
Now that data.table 1.8.2 automatically optimizes j for efficiency, it may be preferable to use the eval method. The get() in j prevents some optimizations, for example.
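If pasting strings feels fragile, the same call can be built as a language object instead. The sketch below uses base R's substitute() and as.name(); it is a swapped-in alternative to the paste approach rather than something the answer itself proposes, and the placeholder names `nm` and `sym` are arbitrary.

```r
library(data.table)

DT <- data.table(col1 = 4:6)
colname <- "col1"

# Build the call DT[, ("col1") := cumsum(col1)] without pasting strings:
# nm is replaced by the name as a string, sym by the name as a symbol
expr <- substitute(
  DT[, (nm) := cumsum(sym)],
  list(nm = colname, sym = as.name(colname))
)

eval(expr)   # modifies DT by reference
DT
```

Because j then contains the bare column symbol rather than get(), the expression that data.table sees is the same one you would have typed by hand.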
Or, there is set(), a low-overhead, functional form of :=, which would be fine here. See ?set.
set(DT, j = colname, value = cumsum(DT[[colname]]))
DT
# col1
# 1: 4
# 2: 21
# 3: 66
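Because set() takes the column name as an ordinary argument, it fits naturally in a loop over a character vector of names. A minimal sketch, assuming made-up data and a hypothetical vector `cols`:

```r
library(data.table)

# Hypothetical example data
DT <- data.table(a = 1:3, b = 4:6, label = c("x", "y", "z"))
cols <- c("a", "b")                # columns to transform

# set() updates each named column by reference, without the overhead
# of a [.data.table call on every iteration
for (col in cols) {
  set(DT, j = col, value = cumsum(DT[[col]]))
}
DT
```

Columns not listed in `cols` (here `label`) are left untouched.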