Select / assign to data.table when variable names are stored in a character vector

Problem description

How do you refer to variables in a data.table if the variable names are stored in a character vector? For instance, this works for a data.frame:

df <- data.frame(col1 = 1:3)
colname <- "col1"
df[colname] <- 4:6
df
#   col1
# 1    4
# 2    5
# 3    6

How can I perform the same operation on a data.table, either with or without := notation? The obvious attempt, dt[ , list(colname)], doesn't work (nor did I expect it to).

Solution

Two ways to programmatically select variable(s):

  1. with = FALSE:

     library(data.table)
     DT = data.table(col1 = 1:3)
     colname = "col1"
     DT[, colname, with = FALSE]
     #    col1
     # 1:    1
     # 2:    2
     # 3:    3
    

  2. 'dot dot' (..) prefix:

     DT[, ..colname]    
     #    col1
     # 1:    1
     # 2:    2
     # 3:    3
    

For further description of the 'dot dot' (..) notation, see New Features in 1.10.2 (at the time this answer was written, it was not yet described in the help text).
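Both selection forms also accept a character vector holding more than one name. The following is a minimal sketch, assuming data.table 1.10.2 or later for the `..` prefix. The table contents and the `cols` variable are made up for illustration, and the `.SDcols` line is an additional idiom not mentioned in the answer above.

```r
library(data.table)

# Hypothetical example table with three columns
DT <- data.table(col1 = 1:3, col2 = letters[1:3], col3 = c(0.5, 1.5, 2.5))
cols <- c("col1", "col3")   # column names stored in a character vector

res_with   <- DT[, cols, with = FALSE]   # treat `cols` as a vector of column names
res_dotdot <- DT[, ..cols]               # look `cols` up in the calling scope
res_sd     <- DT[, .SD, .SDcols = cols]  # .SDcols also takes a character vector

names(res_with)
# [1] "col1" "col3"
all.equal(res_with, res_dotdot)
# [1] TRUE
```

All three calls return a new data.table containing only the named columns, so which one to use is mostly a matter of taste.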

To assign to variable(s), wrap the LHS of := in parentheses:

DT[, (colname) := 4:6]    
#    col1
# 1:    4
# 2:    5
# 3:    6

The latter is known as a column plonk, because you replace the whole column vector by reference. If a subset i were present, it would subassign by reference. The parentheses around (colname) are a shorthand introduced in v1.9.4, released on CRAN in October 2014. Here is the news item:

Using with = FALSE with := is now deprecated in all cases, given that wrapping the LHS of := with parentheses has been preferred for some time.

colVar = "col1"

DT[, (colVar) := 1]                             # please change to this
DT[, c("col1", "col2") := 1]                    # no change
DT[, 2:4 := 1]                                  # no change
DT[, c("col1","col2") := list(sum(a), mean(b))]  # no change
DT[, `:=`(...), by = ...]                       # no change

See also the Details section in ?`:=`:

DT[i, (colnamevector) := value]
# [...] The parens are enough to stop the LHS being a symbol
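To make the difference between a column plonk, a subassignment, and an unparenthesized LHS concrete, here is a small sketch. It reuses the `DT` and `colname` names from the example above; the value `100L` and the row index are arbitrary choices for illustration.

```r
library(data.table)

DT <- data.table(col1 = 1:3)
colname <- "col1"            # the column name lives in a variable

DT[, (colname) := 4:6]       # no i: the whole column is replaced (plonk)
DT[2L, (colname) := 100L]    # with i: only row 2 is overwritten (subassign)
DT[[colname]]
# [1]   4 100   6

DT[, colname := 0L]          # no parentheses: the LHS is taken as a symbol,
                             # so a column literally named "colname" is added
names(DT)
# [1] "col1"    "colname"
```

The last call shows why the parentheses matter: without them, data.table never looks at the value stored in `colname`.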


And to answer the further question in the comments, here's one way (as usual, there are many):

DT[, (colname) := cumsum(get(colname))]
#    col1
# 1:    4
# 2:    9
# 3:   15 

Or you might find it easier to read, write and debug if you simply eval a pasted string, similar to constructing a dynamic SQL statement to send to a server:

expr = paste0("DT[,",colname,":=cumsum(",colname,")]")
expr
# [1] "DT[,col1:=cumsum(col1)]"

eval(parse(text=expr))
#    col1
# 1:    4
# 2:   13
# 3:   28

If you do that a lot, you can define a helper function EVAL:

# Paste the arguments into one string, then parse and evaluate it two frames up
EVAL = function(...) eval(parse(text = paste0(...)), envir = parent.frame(2))

EVAL("DT[,",colname,":=cumsum(",colname,")]")
#    col1
# 1:    4
# 2:   17
# 3:   45

Because data.table (since v1.8.2) automatically optimizes j for efficiency, the eval method may be preferable; a get() in j, for example, prevents some optimizations.
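Another commonly used idiom, not part of the answer above, avoids both get() and string pasting by combining the parenthesized LHS with `.SD` and `.SDcols`. This is only a sketch under assumed data: the second column and the `cols` vector are invented for illustration, and no claim is made here about how it compares with the eval method in speed.

```r
library(data.table)

# Hypothetical table; col1 matches the values used earlier in the answer
DT <- data.table(col1 = 4:6, col2 = c(10, 20, 30))
cols <- c("col1", "col2")   # names of the columns to transform

# .SD holds only the columns listed in .SDcols; lapply returns one
# transformed vector per column, assigned back by reference to (cols)
DT[, (cols) := lapply(.SD, cumsum), .SDcols = cols]

DT$col1
# [1]  4  9 15
DT$col2
# [1] 10 30 60
```

With a single name in `cols` this gives the same result as the get() version, and it extends to several columns without any change to the call.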

Or there is set(), a low-overhead functional form of :=, which would be fine here. See ?set.

set(DT, j = colname, value = cumsum(DT[[colname]]))
DT
#    col1
# 1:    4
# 2:   21
# 3:   66
