Assigning multiple columns in data.table() with conditional function
Problem description
In a previous question (Return a list in dplyr mutate()) it was clarified that although dplyr cannot, as of release 0.2, create new variables from a vector returned by a function, data.table() can, with the syntax:

    it[, c(paste0("V", 4:5)) := myfun(V2, V3)]

If the function myfun from that question is altered to:

    myfun = function(arg1, arg2) {
      if (arg1 > arg2) {
        temp1 = arg1 + arg2
        temp2 = arg1 - arg2
      } else {
        temp1 = arg1 * arg2
        temp2 = arg1 / arg2
      }
      list(temp1, temp2)
    }

the solution posted above returns a warning:

    it = data.table(c("a","a","b","b","c"), c(1,2,3,4,5), c(2,3,4,2,2))
    it[, c(paste0("V", 4:5)) := myfun(V2, V3)]
    Warning message:
    In if (arg1 > arg2) { :
      the condition has length > 1 and only the first element will be used
This implies that somehow data.table() is passing more than a single row to the function. Why is this occurring?
Solution
Ron, this is expected behavior. data.table always passes the full columns (unless you use by, in which case you get the part of the column that corresponds to each subgroup). To get around this, you need to vectorize your function:

    myfun2 = function(arg1, arg2) {
      temp1 <- ifelse(arg1 > arg2, arg1 + arg2, arg1 * arg2)
      temp2 <- ifelse(arg1 > arg2, arg1 - arg2, arg1 / arg2)
      list(temp1, temp2)
    }
I do this here using ifelse instead of if/else. Then it works:

    it = data.table(c("a","a","b","b","c"), c(1,2,3,4,5), c(2,3,4,2,2))
    it[, c(paste0("V", 4:5)) := myfun2(V2, V3)]
    it
    #    V1 V2 V3 V4        V5
    # 1:  a  1  2  2 0.5000000
    # 2:  a  2  3  6 0.6666667
    # 3:  b  3  4 12 0.7500000
    # 4:  b  4  2  6 2.0000000
    # 5:  c  5  2  7 3.0000000
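To see why the original myfun ran into trouble, it can help to check how many values the j expression actually receives. The sketch below reuses the example table from the question; the names n_all, n_by_group and picked are only illustrative. Note also that in R 4.2.0 and later a condition of length greater than one in if is an error rather than a warning.

```r
library(data.table)

it <- data.table(c("a", "a", "b", "b", "c"), c(1, 2, 3, 4, 5), c(2, 3, 4, 2, 2))

# Without by, j is evaluated once and V2 is the entire column.
n_all <- it[, length(V2)]

# With by = V1, j is evaluated once per group and V2 is that group's slice.
n_by_group <- it[, .(n = length(V2)), by = V1]

# ifelse() tests every element, so it copes with whole columns,
# whereas if () expects a single TRUE or FALSE.
picked <- ifelse(it$V2 > it$V3, "sum/difference", "product/quotient")
```

Here n_all is the number of rows in the table, while n_by_group holds one count per distinct value of V1, which is the "part of the column" behavior described in the answer.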
Another alternative, if you don't want to modify your function, is to break up the data.table into one-row groups. We do this by passing a vector to by that has a distinct value for each row in the data.table (so that each row is a group):

    it[, c(paste0("V", 4:5)) := myfun(V2, V3), by=1:nrow(it)]
Notice the by argument. This also works, but is slower. Generally, if you can vectorize, you should.
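As a quick sanity check, the two approaches can be run side by side on copies of the same table to confirm they produce the same columns. This is a minimal sketch: it_vec and it_row are illustrative names, and myfun and myfun2 are restated from the question and answer so the block is self-contained.

```r
library(data.table)

# Scalar version from the question: only valid for one pair of values at a time.
myfun <- function(arg1, arg2) {
  if (arg1 > arg2) {
    temp1 <- arg1 + arg2
    temp2 <- arg1 - arg2
  } else {
    temp1 <- arg1 * arg2
    temp2 <- arg1 / arg2
  }
  list(temp1, temp2)
}

# Vectorized version from the answer: works on whole columns.
myfun2 <- function(arg1, arg2) {
  temp1 <- ifelse(arg1 > arg2, arg1 + arg2, arg1 * arg2)
  temp2 <- ifelse(arg1 > arg2, arg1 - arg2, arg1 / arg2)
  list(temp1, temp2)
}

it_vec <- data.table(c("a", "a", "b", "b", "c"), c(1, 2, 3, 4, 5), c(2, 3, 4, 2, 2))
it_row <- copy(it_vec)  # copy() so the two tables are modified independently

# Vectorized: a single call over the full columns.
it_vec[, c("V4", "V5") := myfun2(V2, V3)]

# Row-wise: one call to myfun per single-row group.
it_row[, c("V4", "V5") := myfun(V2, V3), by = 1:nrow(it_row)]

same_V4 <- isTRUE(all.equal(it_vec$V4, it_row$V4))
same_V5 <- isTRUE(all.equal(it_vec$V5, it_row$V5))
```

The row-wise version calls myfun once per row, which is where the extra cost comes from on larger tables.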