match.fun slower than actual function in R

Published: 2021/4/28 19:40:57 · Tags: r, data.table

Question

I have large data sets in which many rows measure the same thing (essentially duplicates with some noise). As part of a larger function I am writing, I want the user to be able to collapse these rows with a function of their choosing (e.g. mean, median).

My problem is that calling the function directly is much faster than going through match.fun (which is what I need). MWE:

require(data.table)

rows <- 100000
cols <- 1000
dat <- data.table(id=sample(LETTERS, rows, replace=TRUE), 
                  matrix(rnorm(rows*cols), nrow=rows))

aggFn <- "median"

# direct call to median
system.time(dat[, lapply(.SD, median), by=id])
# same aggregation, with the function looked up by name
system.time(dat[, lapply(.SD, match.fun(aggFn)), by=id])

On my system, the two system.time() calls give these timings (direct call first, match.fun second):

   user  system elapsed 
  1.112   0.027   1.141 
   user  system elapsed 
  2.854   0.265   3.121

The gap becomes quite dramatic with larger data sets.

As a final point, I realize aggregate() can do this (and doesn't seem to suffer from this behavior), but I need to work with data.table objects because of the size of the data.

Recommended answer

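The reason is the GForce optimization that data.table applies to median. When j is written as lapply(.SD, median), data.table recognizes the function by name and replaces it with an optimized grouped implementation; when j contains match.fun(aggFn), it cannot do that and evaluates the function once per group. You can see this if you set options(datatable.verbose=TRUE). See help("GForce") for details.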
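To see the difference yourself, a minimal sketch on a small table follows. The table `small` and its columns are made up for illustration, and the exact wording of the verbose messages depends on the data.table version, so read the output rather than relying on specific text. The first call should report that j was optimized with GForce; the second should not.

```r
library(data.table)

set.seed(1)
# Small illustrative table (hypothetical data, not the original benchmark)
small <- data.table(id = sample(letters[1:3], 30, replace = TRUE),
                    x = rnorm(30),
                    y = rnorm(30))

options(datatable.verbose = TRUE)

# j names median directly, so data.table can rewrite it for GForce
res_direct <- small[, lapply(.SD, median), by = id]

# j contains a call to match.fun(), so the function is applied group by group
res_match <- small[, lapply(.SD, match.fun("median")), by = id]

options(datatable.verbose = FALSE)

# Both routes compute the same values; only the execution path differs
all.equal(res_direct, res_match)
```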

If you make the same comparison with a function that GForce does not recognize by name (here median hidden behind the alias fun), the timings are much closer:

fun <- median
aggFn <- "fun"
system.time(dat[, lapply(.SD, fun), by=id])
system.time(dat[, lapply(.SD, match.fun(aggFn)), by=id])

A possible workaround that still uses the optimization when the function happens to be supported is to evaluate an expression built with the function name, e.g., using the dreaded eval(parse()):

dat[, eval(parse(text = sprintf("lapply(.SD, %s)", aggFn))), by=id]

However, you would lose the small amount of safety that match.fun adds, because arbitrary text in aggFn would be parsed and evaluated as code.

If you have a fixed list of functions the users can choose from, you could do this instead:

funs <- list(quote(mean), quote(median))
fun <- funs[[1]] #select
expr <- bquote(lapply(.SD, .(fun)))
a <- dat[, eval(expr), by=id]
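The same idea can be wrapped in a function that takes the function name as a string, which is closer to what the question asks for. This is only a sketch: `collapse_rows`, its arguments, and the default list of allowed names are hypothetical and not part of data.table. The name is checked against a whitelist before it is turned into a symbol, so nothing is parsed from user text, and data.table still sees a plain function name in j.

```r
library(data.table)

# Hypothetical helper: collapse rows per group with a function chosen by name.
# `allowed` is an illustrative whitelist; adjust it to the functions you support.
collapse_rows <- function(dt, fn_name, by_cols = "id",
                          allowed = c("mean", "median", "sum", "min", "max")) {
  if (!is.character(fn_name) || length(fn_name) != 1L || !(fn_name %in% allowed)) {
    stop("fn_name must be one of: ", paste(allowed, collapse = ", "))
  }
  # Build lapply(.SD, <name>) with the function as a symbol, e.g. lapply(.SD, median)
  expr <- bquote(lapply(.SD, .(as.name(fn_name))))
  dt[, eval(expr), by = by_cols]
}

# Usage on a tiny illustrative table
dt <- data.table(id = c("a", "a", "b", "b"),
                 v1 = c(1, 3, 5, 7),
                 v2 = c(2, 4, 6, 10))
res <- collapse_rows(dt, "median")
print(res)
```

Whether a given name is actually accelerated depends on what GForce supports in your data.table version; functions outside that set still work, just without the speed-up.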
