Select / assign to data.table when variable names are stored in a character vector
Problem description
How do you refer to variables in a data.table if the variable names are stored in a character vector? For instance, this works for a data.frame:
df <- data.frame(col1 = 1:3)
colname <- "col1"
df[colname] <- 4:6
df
# col1
# 1 4
# 2 5
# 3 6
How can I perform this same operation for a data.table, either with or without := notation? The obvious thing, dt[ , list(colname)], doesn't work (nor did I expect it to).
Two ways to programmatically select variable(s):
Using with = FALSE:
library(data.table)
DT = data.table(col1 = 1:3)
colname = "col1"
DT[, colname, with = FALSE]
# col1
# 1: 1
# 2: 2
# 3: 3
Using the 'dot dot' (..) prefix:
DT[, ..colname]
# col1
# 1: 1
# 2: 2
# 3: 3
For further description of the 'dot dot' (..) notation, see New Features in 1.10.2 (when this answer was written, it was not yet described in the help text).
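Both forms also accept a character vector holding several names. A minimal sketch, assuming a hypothetical table DT with columns a, b and c and a hypothetical name vector cols (neither is from the original answer); the two calls should return the same two-column data.table:

```r
library(data.table)

# Hypothetical example table and vector of column names
DT <- data.table(a = 1:3, b = letters[1:3], c = c(2.5, 3.5, 4.5))
cols <- c("a", "c")

# Both calls take the names from the variable 'cols', not from a column called "cols"
by_with <- DT[, cols, with = FALSE]
by_dots <- DT[, ..cols]

names(by_dots)  # "a" "c"
```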
To assign to variable(s), wrap the LHS of := in parentheses:
DT[, (colname) := 4:6]
# col1
# 1: 4
# 2: 5
# 3: 6
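The same parentheses work when the LHS is a vector of several names, with or without an i subset. A sketch using a hypothetical table with columns x and y and hypothetical name vectors cols and new_cols (not from the original answer):

```r
library(data.table)

DT <- data.table(x = 1:4, y = 5:8)
cols <- c("x", "y")  # hypothetical: names of existing columns to update

# With an i subset, := sub-assigns by reference: only rows 3 and 4 (x > 2) change
DT[x > 2, (cols) := list(0L, 0L)]

# Several new columns at once, named through another character vector
new_cols <- c("x2", "y2")
DT[, (new_cols) := list(x * 2L, y * 2L)]

DT$x   # 1 2 0 0
DT$x2  # 2 4 0 0
```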
The assignment DT[, (colname) := 4:6] is known as a column plonk, because you replace the whole column vector by reference. If a subset i were present, it would sub-assign by reference instead. The parentheses around (colname) are a shorthand introduced in v1.9.4, released on CRAN in October 2014. Here is the news item:
Using with = FALSE with := is now deprecated in all cases, given that wrapping the LHS of := with parentheses has been preferred for some time.
colVar = "col1"
DT[, (colVar) := 1] # please change to this
DT[, c("col1", "col2") := 1] # no change
DT[, 2:4 := 1] # no change
DT[, c("col1","col2") := list(sum(a), mean(b))] # no change
DT[, `:=`(...), by = ...] # no change
See also the Details section in ?`:=`:
DT[i, (colnamevector) := value]
# [...] The parens are enough to stop the LHS being a symbol
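To see what the parentheses change, here is a small sketch (the table and the 0L value are only for illustration): the parenthesized LHS is evaluated and updates the column whose name is stored in colname, while the bare symbol is taken literally and creates a new column called colname.

```r
library(data.table)

DT <- data.table(col1 = 1:3)
colname <- "col1"

# Parenthesized LHS is evaluated to the string "col1", so column col1 is replaced
DT[, (colname) := 4:6]

# Bare symbol LHS is taken literally, so a NEW column named "colname" is added
DT[, colname := 0L]

names(DT)  # "col1" "colname"
DT$col1    # 4 5 6
```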
To answer a further question from the comments, here's one way (as usual, there are many):
DT[, (colname) := cumsum(get(colname))]  # parentheses replace the deprecated with = FALSE form
# col1
# 1: 4
# 2: 9
# 3: 15
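Another option, not from the original answer, that avoids get() in j: restrict .SD to the named column(s) with .SDcols and lapply() over it. The same call works unchanged if colname holds several names. This sketch rebuilds DT with the values it holds just before the step above:

```r
library(data.table)

DT <- data.table(col1 = 4:6)  # the values DT holds before the cumsum step
colname <- "col1"

# .SD contains only the columns listed in .SDcols; lapply() returns a list,
# which := assigns to the column(s) named in colname
DT[, (colname) := lapply(.SD, cumsum), .SDcols = colname]

DT$col1  # 4 9 15
```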
Or you might find it easier to read, write and debug if you just eval a paste, similar to constructing a dynamic SQL statement to send to a server:
expr = paste0("DT[,",colname,":=cumsum(",colname,")]")
expr
# [1] "DT[,col1:=cumsum(col1)]"
eval(parse(text=expr))
# col1
# 1: 4
# 2: 13
# 3: 28
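A related base R technique, swapped in here rather than taken from the original answer, is to build the call as a language object with substitute() and as.name() instead of pasting and parsing a string. This is a sketch; target_col and src_col are arbitrary placeholder names chosen for the example:

```r
library(data.table)

DT <- data.table(col1 = 4:6)
colname <- "col1"

# substitute() replaces the placeholder symbols with the values in the list:
# a string for the parenthesized LHS and a symbol (column reference) for the RHS
expr <- substitute(
  DT[, (target_col) := cumsum(src_col)],
  list(target_col = colname, src_col = as.name(colname))
)
eval(expr)

DT$col1  # 4 9 15
```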
If you do that a lot, you can define a helper function, EVAL:
EVAL = function(...)eval(parse(text=paste0(...)),envir=parent.frame(2))
EVAL("DT[,",colname,":=cumsum(",colname,")]")
# col1
# 1: 4
# 2: 17
# 3: 45
Since data.table 1.8.2 automatically optimizes j for efficiency, it may be preferable to use the eval method; for example, get() in j prevents some optimizations.
Or there is set(), a low-overhead, functional form of := that would be fine here. See ?set.
set(DT, j = colname, value = cumsum(DT[[colname]]))
DT
# col1
# 1: 4
# 2: 21
# 3: 66
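Because set() takes the column name as an ordinary value, it fits naturally in a plain for loop over a vector of names. A minimal sketch with a hypothetical two-column table and name vector (not from the original answer):

```r
library(data.table)

DT <- data.table(a = 1:3, b = c(10L, 20L, 30L))
cols <- c("a", "b")  # hypothetical names held in a character vector

# Each iteration replaces one column by reference with its cumulative sum
for (cn in cols) {
  set(DT, j = cn, value = cumsum(DT[[cn]]))
}

DT$a  # 1 3 6
DT$b  # 10 30 60
```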