Select / assign to data.table when variable names are stored in a character vector


Question

How do you refer to variables in a data.table if the variable names are stored in a character vector? For instance, this works for a data.frame:

df <- data.frame(col1 = 1:3)
colname <- "col1"
df[colname] <- 4:6
df
#   col1
# 1    4
# 2    5
# 3    6

How can I perform the same operation on a data.table, either with or without := notation? The obvious attempt, dt[, list(colname)], doesn't work (nor did I expect it to).

Recommended answer

Two ways to programmatically select variable(s):


  1. with = FALSE

DT = data.table(col1 = 1:3)
colname = "col1"
DT[, colname, with = FALSE] 
#    col1
# 1:    1
# 2:    2
# 3:    3


  2. The 'dot dot' (..) prefix:

    DT[, ..colname]    
    #    col1
    # 1:    1
    # 2:    2
    # 3:    3
    


    For further description of the 'dot dot' (..) notation, see New Features in 1.10.2 (it is currently not described in the help text).
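Both forms also accept a character vector that names several columns. A minimal sketch, assuming a hypothetical three-column table `DT2` and name vector `cols` that are not part of the original answer; `.SDcols` is shown as a third option for comparison:

```r
library(data.table)

# Hypothetical example table and column-name vector (not from the answer)
DT2 <- data.table(a = 1:3, b = 4:6, c = letters[1:3])
cols <- c("a", "c")

res_with   <- DT2[, cols, with = FALSE]    # treat `cols` as column names, not an expression
res_dotdot <- DT2[, ..cols]                # look up `cols` in the calling scope
res_sd     <- DT2[, .SD, .SDcols = cols]   # .SD restricted to the columns in `cols`
```

All three calls should return a data.table containing only columns `a` and `c`.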

    To assign to variable(s), wrap the LHS of := in parentheses:

    DT[, (colname) := 4:6]    
    #    col1
    # 1:    4
    # 2:    5
    # 3:    6
    

    The latter is known as a column plonk, because you replace the whole column vector by reference. If a subset i were present, it would sub-assign by reference. The parentheses around (colname) are a shorthand introduced in v1.9.4, released on CRAN in October 2014. Here is the news item:


    Using with = FALSE with := is now deprecated in all cases, given that wrapping the LHS of := with parentheses has been preferred for some time.

    colVar = "col1"
    DT[, colVar := 1, with = FALSE]                 # deprecated, still works silently
    DT[, (colVar) := 1]                             # please change to this
    DT[, c("col1", "col2") := 1]                    # no change
    DT[, 2:4 := 1]                                  # no change
    DT[, c("col1","col2") := list(sum(a), mean(b))] # no change
    DT[, `:=`(...), by = ...]                       # no change
    


    See also the Details section in ?`:=`:

    DT[i, (colnamevector) := value]
    # [...] The parens are enough to stop the LHS being a symbol
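The difference between a column plonk and sub-assignment with i can be sketched with a vector of several column names. This is only an illustration: the table `DT2` and the vector `cols` are hypothetical and not part of the original answer.

```r
library(data.table)

DT2 <- data.table(a = 1:3, b = 4:6)   # hypothetical example table
cols <- c("a", "b")

# No i: each column named in `cols` is replaced whole (a "plonk")
DT2[, (cols) := lapply(.SD, cumsum), .SDcols = cols]

# With i: only rows 2 and 3 of the named columns are modified, in place
DT2[2:3, (cols) := list(0L, 0L)]
```

Without the parentheses, data.table would instead create or overwrite a column literally named `cols`.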
    






    To answer the further question in the comments, here's one way (as usual, there are many):

    DT[, (colname) := cumsum(get(colname))]
    #    col1
    # 1:    4
    # 2:    9
    # 3:   15 
    

    Or, you might find it easier to read, write and debug to simply eval a paste, similar to constructing a dynamic SQL statement to send to a server:

    expr = paste0("DT[,",colname,":=cumsum(",colname,")]")
    expr
    # [1] "DT[,col1:=cumsum(col1)]"
    
    eval(parse(text=expr))
    #    col1
    # 1:    4
    # 2:   13
    # 3:   28
    

    If you do that a lot, you can define a helper function EVAL:

    EVAL = function(...)eval(parse(text=paste0(...)),envir=parent.frame(2))
    
    EVAL("DT[,",colname,":=cumsum(",colname,")]")
    #    col1
    # 1:    4
    # 2:   17
    # 3:   45
    

    Now that data.table (since 1.8.2) automatically optimizes j for efficiency, the eval method may be preferable. For example, get() in j prevents some optimizations.
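If pasting strings feels fragile, a related approach builds the same call from symbols with base R's `substitute()` and `as.name()` instead of parsing text. This is a swapped-in technique, not one the original answer uses, and the table `DT4` and the placeholder symbol `col` are hypothetical. A sketch under those assumptions:

```r
library(data.table)

DT4 <- data.table(col1 = 4:6)   # hypothetical example table
colname <- "col1"

# Swap the placeholder symbol `col` for the real column name,
# producing the unevaluated call DT4[, col1 := cumsum(col1)]
expr <- substitute(
  DT4[, col := cumsum(col)],
  list(col = as.name(colname))
)

# Evaluate the constructed call; DT4 is updated by reference
eval(expr)
```

Because j then contains a plain column symbol rather than get(), this should behave like the eval-a-paste version while avoiding string quoting issues.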

    Or there is set(), a low-overhead functional form of :=, which would be fine here. See ?set.

    set(DT, j = colname, value = cumsum(DT[[colname]]))
    DT
    #    col1
    # 1:    4
    # 2:   21
    # 3:   66
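Because set() takes the column name as an ordinary argument, it fits naturally in a loop over a character vector of names. A minimal sketch, assuming a hypothetical table `DT5` and vector `cols` that are not part of the original answer:

```r
library(data.table)

# Hypothetical example table (not from the answer)
DT5 <- data.table(a = 1:3, b = 4:6, id = c("x", "y", "z"))
cols <- c("a", "b")

# Overwrite each named column with its cumulative sum, by reference
for (j in cols) {
  set(DT5, j = j, value = cumsum(DT5[[j]]))
}
```

Columns not listed in `cols` (here `id`) are left untouched.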
    
