Supply arguments to data.table as (1) vector of strings AND (2) variable names
Question
Imagine you want to apply a function row-wise on a data.table. The function's arguments correspond to fixed data.table columns as well as dynamically generated column names.
Is there a way to supply both fixed and dynamic column names as arguments to a function while using data.table?
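The problems are:
- Both variable names and dynamically generated strings are arguments to the function applied over the data.table
- The dynamic column names are stored as strings in a vector with more than one entry (so get() does not work)
- The values of the dynamic columns need to be supplied to the function as a vector
This illustrates it: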
library('data.table')
# Sample dataframe
D <- data.table(id=1:3, fix=1:3, dyn1=1:3, dyn2=1:3) #fixed and dynamic column names
setkey(D, id)
# Sample function
foo <-function(fix, dynvector){ rep(fix,length(dynvector)) %*% dynvector}
# It does not matter what this function does.
# The result when passing column names not dynamically
D[, "new" := foo(fix,c(dyn1,dyn2)), by=id]
# id fix dyn1 dyn2 new
# 1: 1 1 1 1 2
# 2: 2 2 2 2 8
# 3: 3 3 3 3 18
I want to get rid of the c(dyn1,dyn2). I need to get the column names dyn1, dyn2 from another vector which holds them as strings.
This is how far I got:
# Now we try it dynamically
cn <-paste("dyn",1:2,sep="") #vector holding column names "dyn1", "dyn2"
# Approaches that don't work
D[, "new" := foo(fix,c(cn)), by=id] #wrong as using a mere string
D[, "new" := foo(fix,c(cn)), by=id, with=F] #does not work
D[, "new" := foo(fix,c(get(cn))), by=id] #uses only the first element "dyn1"
D[, "new" := foo(fix,c(mget(cn, .GlobalEnv, inherits=T))), by=id] #does not work
D[, "new" := foo(fix,c(.SD)), by=id, .SDcols=cn] #does not work
I suppose mget() is the solution, but I know too little about scoping to figure it out.
Thanks! JBJ
Update: Solution
Based on the answer by BondedDust:
D[, "new" := foo(fix,sapply(cn, function(x) {get(x)})), by=id]
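A possible alternative to the sapply()/get() idiom, sketched here as an assumption and not taken from the answer: restrict .SD to the dynamic columns with .SDcols and flatten the one-row .SD of each group with unlist(). The as.vector() wrapper is an addition of this sketch; it only drops the 1x1 matrix that %*% returns.

```r
library(data.table)

# Same sample data and function as in the question
D  <- data.table(id = 1:3, fix = 1:3, dyn1 = 1:3, dyn2 = 1:3)
cn <- paste0("dyn", 1:2)  # "dyn1", "dyn2"

foo <- function(fix, dynvector) rep(fix, length(dynvector)) %*% dynvector

# .SDcols limits .SD to the columns named in cn; with by = id each group is
# a single row, so unlist(.SD) yields the vector c(dyn1, dyn2) for that row.
# fix is still visible in j because it is referenced by name.
D[, new := as.vector(foo(fix, unlist(.SD))), by = id, .SDcols = cn]
```

Because each id occurs once, every group is exactly one row; with repeated ids, unlist(.SD) would concatenate all rows of the group, which may or may not be what the function expects.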
Recommended answer
I wasn't able to figure out what you were trying to do with the matrix multiplication, but this shows how to create new variables with varying and fixed inputs to a function:
D <- data.table(id=1:3, fix=1:3, dyn1=1:3, dyn2=1:3)
setkey(D, id)
foo <-function(fix, dynvector){ fix* dynvector}
D[, paste("new",1:2,sep="_") := lapply( c(dyn1,dyn2), foo, fix=fix), by=id]
#----------
> D
id fix dyn1 dyn2 new_1 new_2
1: 1 1 1 1 1 1
2: 2 2 2 2 4 4
3: 3 3 3 3 9 9
So you need to use a vector of character values to get columns. This is a bit of an extension of this question: Why do I need to wrap `get` in a dummy function within a J `lapply` call?
> D <- data.table(id=1:3, fix=1:3, dyn1=1:3, dyn2=1:3)
> setkey(D, id)
> cn <- paste("dyn",1:2,sep="")
> foo <-function( fix, dynvector){ fix*dynvector}
> # by=id matters: without it, sapply() returns a 3x2 matrix, lapply() walks its 6 cells,
> # and data.table warns that 2 columns were supplied a list of length 6
> D[, paste("new",1:2,sep="_") := lapply( sapply( cn, function(x) {get(x)}) , foo, fix=fix), by=id]
> D
id fix dyn1 dyn2 new_1 new_2
1: 1 1 1 1 1 1
2: 2 2 2 2 4 4
3: 3 3 3 3 9 9
You could probably also use the methods in "create an expression from a function for data.table to eval".
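The expression-building route can be sketched as follows. This is one way to do it under stated assumptions, not the method from the linked question: the call c(dyn1, dyn2) is assembled from the strings in cn, spliced into the full data.table call with base R's bquote(), and the finished call is evaluated. The name dyn_call and the as.vector() wrapper are additions of this sketch.

```r
library(data.table)

D  <- data.table(id = 1:3, fix = 1:3, dyn1 = 1:3, dyn2 = 1:3)
cn <- paste0("dyn", 1:2)  # "dyn1", "dyn2"

foo <- function(fix, dynvector) rep(fix, length(dynvector)) %*% dynvector

# Build the unevaluated call c(dyn1, dyn2) from the character vector
dyn_call <- as.call(c(as.name("c"), lapply(cn, as.name)))

# bquote() replaces .(dyn_call) with the call built above, so data.table
# sees exactly the hand-written form foo(fix, c(dyn1, dyn2));
# as.vector() drops the 1x1 matrix returned by %*%
eval(bquote(D[, new := as.vector(foo(fix, .(dyn_call))), by = id]))
```

Since the whole `[` call is constructed before evaluation, no get() or mget() lookup happens inside j, which sidesteps the scoping question entirely.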