match.fun slower than actual function in R


Problem description

I have large data sets with rows that measure the same thing (essentially duplicates with some noise). As part of a larger function I am writing, I want the user to be able to collapse these rows with a function of their choosing (e.g. mean, median).

My problem is that calling the function directly is much faster than going through match.fun (which is what I need). A minimal working example:

library(data.table)

rows <- 100000
cols <- 1000
dat <- data.table(id=sample(LETTERS, rows, replace=TRUE), 
                  matrix(rnorm(rows*cols), nrow=rows))

aggFn <- "median"

system.time(dat[, lapply(.SD, median), by=id])
system.time(dat[, lapply(.SD, match.fun(aggFn)), by=id])

On my system, the timings for the last two lines are:

   user  system elapsed 
  1.112   0.027   1.141 
   user  system elapsed 
  2.854   0.265   3.121 

The gap becomes quite dramatic with larger data sets.

As a final point, I realize aggregate() can do this (and doesn't seem to suffer from this behavior), but I need to work with data.table objects because of the size of the data.

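Recommended answer

The reason is the GForce optimization that data.table applies to median. When j is written as lapply(.SD, median), data.table recognizes the function name and substitutes an optimized internal implementation; when j contains match.fun(aggFn), it only sees an arbitrary call and cannot apply that optimization. You can see this by setting options(datatable.verbose=TRUE). See help("GForce") for details.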
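As a rough way to check this yourself, the sketch below runs both forms on a small table with verbose = TRUE for just those two queries, then confirms that the results agree. The table name small and its dimensions are made up for illustration. The exact wording of the verbose messages depends on the data.table version, so look for whether j is reported as GForce-optimized rather than for a specific string.

```r
library(data.table)

# Small illustrative table (sizes chosen arbitrarily for this sketch)
set.seed(1)
small <- data.table(id = sample(LETTERS[1:3], 30, replace = TRUE),
                    matrix(rnorm(30 * 2), nrow = 30))

aggFn <- "median"

# verbose = TRUE prints data.table's optimization notes for this one query
direct   <- small[, lapply(.SD, median), by = id, verbose = TRUE]
indirect <- small[, lapply(.SD, match.fun(aggFn)), by = id, verbose = TRUE]

# Both routes compute the same values; only the execution path differs
all.equal(direct, indirect)
```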

If you compare using a function that GForce does not recognize (here the same median, bound to a different name), the timings are much closer:

fun <- median
aggFn <- "fun"
system.time(dat[, lapply(.SD, fun), by=id])
system.time(dat[, lapply(.SD, match.fun(aggFn)), by=id])

A possible workaround that takes advantage of the optimization when the function happens to be supported is to evaluate an expression built from the function name, e.g., using the dreaded eval(parse()):

aggFn <- "median"  # aggFn was reassigned to "fun" above; restore the real function name
dat[, eval(parse(text = sprintf("lapply(.SD, %s)", aggFn))), by=id]

However, you would lose the small amount of safety that match.fun adds.

If you have a list of functions the users can choose from, you could do this:

funs <- list(quote(mean), quote(median))
fun <- funs[[1]] #select
expr <- bquote(lapply(.SD, .(fun)))
a <- dat[, eval(expr), by=id]
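Since the question wants the user to pass the function as a string, the same idea can be wrapped in a small helper that checks the name against an allowed set before building the expression. This is only a sketch: collapse_rows and its allowed argument are hypothetical names, the grouping column is assumed to be id as in the example data, and whether each allowed function is actually GForce-optimized depends on the data.table version.

```r
library(data.table)

# Hypothetical helper: collapse rows of `dt` by its `id` column using a
# function chosen by name from a fixed set of allowed names.
collapse_rows <- function(dt, agg_name,
                          allowed = c("mean", "median", "sum", "min", "max")) {
  # Reject anything outside the allowed set before building any code
  if (!is.character(agg_name) || length(agg_name) != 1L ||
      !(agg_name %in% allowed)) {
    stop("agg_name must be one of: ", paste(allowed, collapse = ", "))
  }
  # Build lapply(.SD, <name>) with the function given as a bare symbol,
  # so data.table sees the same expression as if it had been typed directly
  expr <- bquote(lapply(.SD, .(as.name(agg_name))))
  dt[, eval(expr), by = id]
}

# Usage on a tiny table
dt <- data.table(id = c("a", "a", "b"), x = c(1, 3, 5))
res <- collapse_rows(dt, "mean")
```

Compared with eval(parse()), the name is never parsed as arbitrary code, and an unknown name fails early with a clear error.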
