Wrapper functions for data.table

Problem description

I have a project that was written around data.frame. To improve computation time, I'm trying to leverage the speed of data.table instead. My approach has been to write wrapper functions that take data frames, convert them to data.tables, do the calculations, and then convert the results back to data frames. Here's one simple example:

FastAgg<-function(x, FUN, aggFields, byFields = NULL, ...){
  require('data.table')
  y<-setDT(x)
  y<-y[,lapply(X=.SD,FUN=FUN,...),.SDcols = aggFields,by=byFields]
  y<-data.frame(y)
  y
}

The problem is that after running this function, x has been converted to a data.table, and code I have written using data.frame notation then fails. How do I make sure that the data.frame I pass in is left unchanged by the function?

Recommended answer

For your case, I'd (of course) recommend using data.table throughout, and not just inside a function :-).

But if that's not likely to happen, then I'd recommend the setDT + setDF setup. I'd recommend using setDT outside the function (and providing the data.table as input) to convert your data.frame to a data.table by reference. Then, after finishing the operations you'd like, you can use setDF to convert the result back to a data.frame and return that from the function. Note, however, that setDT(x) changes x itself into a data.table, because it operates by reference.
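A minimal sketch of that setup, assuming the caller is happy to convert once up front. The name fast_agg_dt and the sample data are hypothetical and only for illustration:

```r
library(data.table)

# Hypothetical wrapper: expects a data.table, returns a plain data.frame.
fast_agg_dt <- function(dt, FUN, aggFields, byFields = NULL, ...) {
  stopifnot(is.data.table(dt))
  res <- dt[, lapply(.SD, FUN, ...), .SDcols = aggFields, by = byFields]
  setDF(res)  # res is a new object, so converting it in place affects nothing else
  res
}

df <- data.frame(g = c("a", "a", "b"), v = c(1, 2, 3))
setDT(df)                              # explicit in-place conversion by the caller
out <- fast_agg_dt(df, sum, "v", "g")  # out is a data.frame
setDF(df)                              # convert df back in place when finished
```

The conversion of df is visible at the call site here, so it is not a hidden side effect of the wrapper.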

If that is not ideal, then use as.data.table(.) inside your function, as it operates on a copy. Then, you can still use setDF() to convert the resulting data.table to data.frame and return that data.frame from your function.
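Applied to the wrapper from the question, this might look like the sketch below. It keeps the question's argument names; the function name fast_agg and the sample data are assumptions made for illustration:

```r
library(data.table)

fast_agg <- function(x, FUN, aggFields, byFields = NULL, ...) {
  y <- as.data.table(x)  # works on a copy, so the caller's x is not modified
  y <- y[, lapply(.SD, FUN, ...), .SDcols = aggFields, by = byFields]
  setDF(y)               # y is local to the function; in-place conversion is safe
  y
}

df  <- data.frame(g = c("a", "a", "b"), v = c(1, 2, 3))
res <- fast_agg(df, mean, "v", "g")
class(df)   # df is still a plain data.frame
```

The trade-off is one copy of the input per call, which may matter for very large frames.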

These functions were introduced recently (mostly due to user requests). The idea for avoiding this confusion is to export the shallow() function, keep track of objects that require columns to be copied, and do it all internally (and automatically). It's all at a very early stage right now. When we've managed it, I'll update this post.

Also have a look at ?copy, ?setDT and ?setDF. The first paragraph of these functions' help pages is:


In data.table parlance, all set* functions change their input by reference. That is, no copy is made at all, other than temporary working memory, which is as large as one column. The only other data.table operator that modifies input by reference is :=. Check out the See Also section below for other set* functions data.table provides.
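To make the by-reference behaviour concrete, here is a small sketch (the object names are made up for illustration) contrasting plain assignment with copy():

```r
library(data.table)

dt  <- data.table(a = 1:3)
dt2 <- dt          # plain assignment: dt2 refers to the same object as dt
dt3 <- copy(dt)    # copy() makes an independent deep copy

dt[, b := a * 2L]  # := adds column b by reference

names(dt2)         # "a" "b": dt2 sees the new column
names(dt3)         # "a":     the copy is unaffected
```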

The example given in ?setDT:

set.seed(45L)
X = data.frame(A=sample(3, 10, TRUE), 
         B=sample(letters[1:3], 10, TRUE), 
         C=sample(10), stringsAsFactors=FALSE)

# get the frequency of each "A,B" combination
setDT(X)[, .N, by="A,B"][]

with no assignment (although I admit …)

In ?setDF:

X = data.table(x=1:5, y=6:10)
## convert 'X' to data.frame, without any copy.
setDF(X)

I think this is pretty clear. But I'll try to provide more clarity. Also, I'll try and add how best to use these functions in the documentation as well.
