Assigning multiple columns in data.table() with conditional function


Problem description

In a previous question, "Return a list in dplyr mutate()", it was clarified that although dplyr cannot, as of release 0.2, create new variables from a vector returned by a function, data.table() can, with the syntax:

it[, c(paste0("V", 4:5)) := myfun(V2, V3)]

If the function myfun from that question is altered to:

myfun = function(arg1, arg2) {
  if (arg1 > arg2) {
    temp1 = arg1 + arg2
    temp2 = arg1 - arg2
  } else {
    temp1 = arg1 * arg2
    temp2 = arg1 / arg2
  }
  list(temp1, temp2)
}

then the syntax shown above produces a warning:

it = data.table(c("a","a","b","b","c"),c(1,2,3,4,5), c(2,3,4,2,2))
it[, c(paste0("V", 4:5)) := myfun(V2, V3)]

Warning message:
In if (arg1 > arg2) { :
  the condition has length > 1 and only the first element will be used

This implies that somehow data.table() is passing more than a single row to the function. Why is this occurring?

Solution

Ron, this is expected behavior. data.table always passes the full columns to the function (unless you use by, in which case you get the part of each column that corresponds to each subgroup). To get around this, you need to vectorize your function:

myfun2 = function(arg1,arg2) {
  temp1 <- ifelse(arg1 > arg2, arg1 + arg2, arg1 * arg2)
  temp2 <- ifelse(arg1 > arg2, arg1 - arg2, arg1 / arg2)
  list(temp1,temp2)
}

I do this here using ifelse instead of if/else. Then it works:

it = data.table(c("a","a","b","b","c"),c(1,2,3,4,5), c(2,3,4,2,2))
it[, c(paste0("V", 4:5)) := myfun2(V2, V3)]
it
#    V1 V2 V3 V4        V5
# 1:  a  1  2  2 0.5000000
# 2:  a  2  3  6 0.6666667
# 3:  b  3  4 12 0.7500000
# 4:  b  4  2  6 2.0000000
# 5:  c  5  2  7 3.0000000
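
To see for yourself that whole columns are being passed, you can check how many elements the function receives on each call. The sketch below is only an illustration: arg_length is a hypothetical helper that is not part of the original question or answer. Also note that in R 4.2.0 and later, an if condition of length greater than one is an error rather than a warning, so the original myfun would stop instead of warning there.

```r
library(data.table)

it <- data.table(c("a", "a", "b", "b", "c"), c(1, 2, 3, 4, 5), c(2, 3, 4, 2, 2))

# Hypothetical helper (not from the original post): reports how many
# elements the function receives on each call.
arg_length <- function(arg1, arg2) length(arg1)

# Without by, the expression is evaluated once and sees the whole column.
n_all <- it[, arg_length(V2, V3)]

# With by, it is evaluated once per group and sees only that group's rows.
n_by_group <- it[, .(n = arg_length(V2, V3)), by = V1]

print(n_all)
print(n_by_group)
```

Without by, the function gets all five values at once, which is why a scalar if cannot handle it. Grouping by V1 gives it two, two, and one values respectively.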

Another alternative, if you don't want to modify your function, is to break up the data.table into one-row groups. We do this by passing a vector to by that has a distinct value for each row in the data.table (so that each row is its own group):

it[, c(paste0("V", 4:5)) := myfun(V2, V3), by=1:nrow(it)]

Notice the by argument. This also works, but is slower, because the function is called once per row. Generally, if you can vectorize, you should.
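
As a rough check that the two approaches agree, the sketch below runs both on the same data and compares the new columns. The helper make_it and the variable names by_row and vectorized are made up for this illustration. It uses seq_len(nrow(...)) in place of 1:nrow(...), which is a small defensive choice for the zero-row case, not something the original answer requires.

```r
library(data.table)

# Scalar version, same logic as myfun in the question.
myfun <- function(arg1, arg2) {
  if (arg1 > arg2) {
    list(arg1 + arg2, arg1 - arg2)
  } else {
    list(arg1 * arg2, arg1 / arg2)
  }
}

# Vectorized version, same logic as myfun2 in the answer.
myfun2 <- function(arg1, arg2) {
  list(ifelse(arg1 > arg2, arg1 + arg2, arg1 * arg2),
       ifelse(arg1 > arg2, arg1 - arg2, arg1 / arg2))
}

# Hypothetical helper: builds a fresh copy of the example table.
make_it <- function() {
  data.table(c("a", "a", "b", "b", "c"), c(1, 2, 3, 4, 5), c(2, 3, 4, 2, 2))
}

# One group per row, so the scalar function sees one value at a time.
by_row <- make_it()
by_row[, c("V4", "V5") := myfun(V2, V3), by = seq_len(nrow(by_row))]

# A single call on the full columns.
vectorized <- make_it()
vectorized[, c("V4", "V5") := myfun2(V2, V3)]

same_V4 <- isTRUE(all.equal(by_row$V4, vectorized$V4))
same_V5 <- isTRUE(all.equal(by_row$V5, vectorized$V5))
print(c(same_V4, same_V5))
```

On a five-row table the speed difference is negligible. It only starts to matter as the row count grows, since the by-row version makes one function call per row.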
