data.table `:=` assignment expressions with dynamic inputs (existing columns) and outputs (new column names)

Problem description



Note: The precise problem I hit in this question does not apply to recent versions of data.table. If you want to do something like what is described in the title, see the corresponding question in the package FAQ: 1.6 "OK, but I don't know the expressions in advance. How do I programatically pass them in?"

I have seen an answer that illustrates how to construct an expression to be evaluated in

DT[,j=eval(expr)]

I am using this with an assignment, `:=`(mycol=my_calculation), and I'm wondering...

  • How can I assign the name "mycol" dynamically?
  • What is the correct way to let "my_calculation" take a dynamically-determined set of columns?

By "dynamically", I mean "determined after I write the code for my expr".

New example

EDIT: To better illustrate the issue, here is a different example. Look in the edit history to see the original.

require(data.table)
require(plyr)
options(datatable.verbose=TRUE)
DT <- CJ(a=0:1,b=0:1,y=2)

# setup:
expr  <- as.quoted(paste(expression(get(col_in_one)+get(col_in_two))))[[1]]

# usage: 
col_in_one <- 'a'
col_in_two <- 'b'
col_out    <- 'bah'
DT[,(col_out):=eval(expr)] # fails, should take the form j=eval(expr)

I want to keep the setup and usage stages separate, so my code is easier to maintain. My real expression is messier than this example (where it just chooses one column).

Questions

First question: How can I make the assigned-to column, "col_out", dynamic? I mean: I want to specify both "cols_in_*" and "col_out" on the fly.

I have tried creating various expressions in "expr", but as.quoted throws an error about not putting certain stuff to the left of the = symbol.

Second question: How can I avoid the warnings against using get?

The warnings suggest using .SDcols, to let [.data.table know which columns I am using. However, if I use the .SDcols argument, another warning says there's no point doing that unless .SD is being used.

Tentative solution

The solutions I have so far are...

# Ricardo + eddi:
expr2 <- as.quoted(paste(expression(`:=`(
  Vtmp=.SD[[col_in_one]]+.SD[[col_in_two]]))))[[1]]

# usage
col_in_one <- 'a'
col_in_two <- 'b'
col_out    <- 'bah'
DT[,eval(expr2),.SDcols=c(col_in_one,col_in_two)]
setnames(DT,'Vtmp',col_out)

This still involves the minor annoyance of doing the operation in two steps and keeping track of "Vtmp", so the first question is still partly open.

Solution

Maybe I don't understand the problem well, but does this suffice:

DT[, (col_out) := .SD[[col_in_one]]+.SD[[col_in_two]],
     .SDcols = c(col_in_one,col_in_two)]
DT
#   a b y bah
#1: 0 0 2   0
#2: 0 1 2   1
#3: 1 0 2   1
#4: 1 1 2   2
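
If the goal is to keep the setup and usage stages separate, the same call can be wrapped in a small helper. The following is a minimal sketch rather than part of the original answer; the function name add_sum_col is hypothetical and chosen only for illustration.

```r
library(data.table)

# setup: hypothetical helper that adds col_out = col_in_one + col_in_two
# by reference, using the .SD / .SDcols form shown above
add_sum_col <- function(DT, col_in_one, col_in_two, col_out) {
  DT[, (col_out) := .SD[[col_in_one]] + .SD[[col_in_two]],
     .SDcols = c(col_in_one, col_in_two)]
  invisible(DT)
}

# usage: column names are supplied only at call time
DT <- CJ(a = 0:1, b = 0:1, y = 2)
add_sum_col(DT, "a", "b", "bah")
print(DT)
```

Because `:=` modifies DT by reference, the helper does not need to return a copy; it returns DT invisibly only so that calls can be chained.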

To answer the edited question, to get the eval to work, use .SD as environment:

DT[, (col_out) := eval(expr, .SD)]
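
As an alternative sketch that avoids both get() and plyr's as.quoted, the expression can be built with base R's bquote() once the column names are known. This assumes it is acceptable to construct the expression at usage time; make_expr is a hypothetical name, and the evaluation step uses .SD as the environment, as in the line above.

```r
library(data.table)

DT <- CJ(a = 0:1, b = 0:1, y = 2)

# setup: hypothetical builder returning an unevaluated call such as `a + b`
make_expr <- function(one, two) {
  bquote(.(as.name(one)) + .(as.name(two)))
}

# usage: names are chosen after the builder was written
col_in_one <- "a"
col_in_two <- "b"
col_out    <- "bah"
expr <- make_expr(col_in_one, col_in_two)

# evaluate the call with .SD (restricted to the two input columns) as environment
DT[, (col_out) := eval(expr, .SD), .SDcols = c(col_in_one, col_in_two)]
print(DT)
```

Since the built expression refers to the columns by symbol, there are no get() calls left for data.table to warn about.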

Also, see the question "eval and quote in data.table" and the update there.
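
For completeness, when no grouping is involved, the same result can be sketched without any quoted expression by using data.table's set(), which also assigns by reference and takes the target column name as a character string. This is a different technique from the eval-based approach above, shown only as an illustration under that assumption.

```r
library(data.table)

DT <- CJ(a = 0:1, b = 0:1, y = 2)

col_in_one <- "a"
col_in_two <- "b"
col_out    <- "bah"

# set() adds (or overwrites) the column named by col_out by reference;
# the inputs are extracted with [[ ]] using their names as strings
set(DT, j = col_out, value = DT[[col_in_one]] + DT[[col_in_two]])
print(DT)
```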
