R data.table user defined function

Problem description

I am moving from data.frame to data.table in R for better performance. One of the main steps in converting my code is replacing custom functions that were run with apply on a data.frame with an equivalent that works in data.table.

Say I have a simple data table, dt1.

x  y  z
1  9  j
4  1  n
7  1  n

I am trying to calculate a new column in dt1 based on the values of x, y, and z. I tried two ways. Both seem to give the correct result, but the faster one emits a warning, so I want to make sure the warning is nothing serious before I use the faster version to convert my existing code.

(1) dt1[, a := {if ((x < 1) & (y > 3) & (z == "n")) {6} else {7}}]

(2) dt1[, a := {if ((x < 1) & (y > 3) & (z == "n")) {6} else {7}}, by = 1:nrow(dt1)]

Version 1 runs faster than version 2, but it emits the warning "the condition has length > 1 and only the first element will be used". The result still looks right. Version 2 is slightly slower but does not give that warning. I want to make sure version 1 will not give erratic results once I start writing more complicated functions.

Please treat this as a generic question: how do I run a user defined function that reads several column values in a given row and computes the new column value for that row?

Thanks for the help.

Recommended answer

If 'x', 'y', and 'z' are columns of 'dt1', try the vectorized ifelse:

dt1[, a:=ifelse(x<1 & y >3 & z=='n', 6, 7)] 
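The warning in the question comes from `if`, which expects a single TRUE or FALSE, while the expression inside `dt1[...]` sees whole columns. The sketch below uses plain vectors (copied from the Data section of this answer) to show what each form computes. It avoids passing the full vector to `if` directly, because from R 4.2.0 onward that is an error rather than a warning.

```r
# Columns from the Data section, as plain vectors.
x <- c(0L, 1L, 4L, 7L, -2L)
y <- c(4L, 9L, 1L, 1L, 5L)
z <- c("n", "j", "n", "n", "n")

cond <- x < 1 & y > 3 & z == "n"   # one logical value per row
length(cond)                        # 5, not 1

# What version (1) effectively did under the old warning behaviour:
# only cond[1] is inspected and the single result is recycled to all rows.
scalar_result <- rep(if (cond[1]) 6 else 7, length(cond))

# The elementwise alternative: ifelse() evaluates the condition per row.
vector_result <- ifelse(cond, 6, 7)

scalar_result   # 6 6 6 6 6
vector_result   # 6 7 7 7 6
```

So version (1) only looks correct when every row happens to agree with the first row; on this data it would silently assign 6 everywhere.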

Or create 'a' with the default value 7, then assign 6 to the rows selected by the logical index:

dt1[, a := 7][x<1 & y >3 & z=='n', a:=6][]
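As a further option not mentioned in the original answer, data.table ships its own `fifelse()`, a faster and stricter counterpart to base `ifelse()`. The sketch below assumes a data.table version that includes it (1.12.4 or later, as far as I know) and rebuilds the sample data so it runs on its own.

```r
library(data.table)

# Same sample data as in the Data section of this answer.
dt1 <- data.table(x = c(0L, 1L, 4L, 7L, -2L),
                  y = c(4L, 9L, 1L, 1L, 5L),
                  z = c("n", "j", "n", "n", "n"))

# fifelse() requires `yes` and `no` to have the same type (both double here).
dt1[, a := fifelse(x < 1 & y > 3 & z == "n", 6, 7)]

dt1$a   # 6 7 7 7 6
```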

Using a function

getnewvariable <- function(v1, v2, v3){
   ifelse(v1 <1 & v2 >3 & v3=='n', 6, 7)
}

 dt1[, a:=getnewvariable(x,y,z)][]
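For the generic case in the question, where the user defined function is written for a single row and cannot easily be vectorized, one possible approach is to let `mapply()` call it once per row. This is a minimal sketch; `score_row` is a hypothetical name, and this route will usually be slower than a vectorized function such as `getnewvariable`.

```r
library(data.table)

# Same sample data as in the Data section of this answer.
dt1 <- data.table(x = c(0L, 1L, 4L, 7L, -2L),
                  y = c(4L, 9L, 1L, 1L, 5L),
                  z = c("n", "j", "n", "n", "n"))

# Hypothetical scalar function: receives ONE row's values at a time,
# so the scalar `if` and `&&` are appropriate here.
score_row <- function(v1, v2, v3) {
  if (v1 < 1 && v2 > 3 && v3 == "n") 6 else 7
}

# mapply() pairs up the i-th elements of x, y, z and calls score_row on each.
dt1[, a := mapply(score_row, x, y, z)]

dt1$a   # 6 7 7 7 6
```

This gives the same per-row semantics as `by = 1:nrow(dt1)` in the question without creating one group per row.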

Data

df1 <- structure(list(x = c(0L, 1L, 4L, 7L, -2L), y = c(4L, 9L, 1L, 
1L, 5L), z = c("n", "j", "n", "n", "n")), .Names = c("x", "y", 
"z"), class = "data.frame", row.names = c(NA, -5L))

library(data.table)

dt1 <- as.data.table(df1)
