Check of microbenchmark results fails with data.table changed by reference

Question

There are some answers on SO where timings are compared without checking the results. However, I prefer to see whether an expression is correct as well as fast.

The microbenchmark package supports this with the check parameter. Unfortunately, the check does not work for expressions which change a data.table by reference, i.e., it does not recognize that the results are different.

library(data.table)
library(microbenchmark)

# minimal data.table 1 col, 3 rows
dt <- data.table(x = c(1, 1, 10))

# define check function as in example section of help(microbenchmark)
my_check <- function(values) {
  all(sapply(values[-1], function(x) identical(values[[1]], x)))
}

The benchmark cases are designed to return different results. Thus,

microbenchmark(
  f1 = dt[, mean(x)],
  f2 = dt[, median(x)],
  check = my_check
)

returns the expected error message:

Error: Input expressions are not equivalent.

Case 2: data.table expressions where check fails

Now, the expressions are modified to change dt by reference. Note that the same check function is used.

microbenchmark(
  f1 = dt[, y := mean(x)],
  f2 = dt[, y := median(x)],
  check = my_check
)

now returns

 expr     min      lq     mean   median       uq     max neval cld
   f1 576.947 625.174 642.9820 640.7110 661.1870 732.391   100  a 
   f2 602.022 658.384 684.7076 678.9975 694.0825 978.600   100   b

So the check does not catch the difference here, although the two expressions produce different results. (The timings are irrelevant.)

I understand that the check is bound to fail because dt is changed by reference. When the results of the expressions are compared, they all refer to the same object in the state left by the last change.
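
A minimal sketch of this effect outside of microbenchmark may make it easier to see. The names r1 and r2 are only illustrative; the sketch assumes that `:=` returns the modified data.table itself rather than a copy:

```r
library(data.table)

dt <- data.table(x = c(1, 1, 10))

# `:=` updates dt in place and (invisibly) returns that same object
r1 <- dt[, y := mean(x)]    # y is 4 at this point
r2 <- dt[, y := median(x)]  # y is overwritten with 1

# r1 and r2 are both references to dt, so they compare as identical
identical(r1, r2)  # TRUE

# the earlier "result" r1 now shows the last update as well
r1$y
```

Since every stored value is the same object, a check based on identical() has nothing left to tell apart.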

How can I modify the check function and/or the expressions so that the check reliably detects differing results even when a data.table is changed by reference?

Answer

The simplest way is to use copy():

microbenchmark(
    f1 = copy(dt)[, y := mean(x)],
    f2 = copy(dt)[, y := median(x)],
    check = my_check, times=1L
)
# Error: Input expressions are not equivalent.
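
To see why copy() helps, the two expressions can also be run once by hand. This is only a sketch; res_mean and res_median are illustrative names, and all.equal() is used here instead of the question's my_check:

```r
library(data.table)

dt <- data.table(x = c(1, 1, 10))

# each expression works on its own copy, so each result is a separate object
res_mean   <- copy(dt)[, y := mean(x)]
res_median <- copy(dt)[, y := median(x)]

# the two results keep their own y column and can be told apart
isTRUE(all.equal(res_mean, res_median))  # FALSE

# the original dt is left untouched: no y column was added
"y" %in% names(dt)  # FALSE
```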

Adding copy(dt) to the mix gives an idea of the time spent on copying (if necessary, one could always subtract that from the runtimes of f1 and f2).

microbenchmark(
    f1 = copy(dt)[, y := mean(x)],
    f2 = copy(dt)[, y := median(x)],
    f3 = copy(dt),
    times=10L
)
# Unit: microseconds
#  expr     min      lq     mean   median      uq     max neval cld
#    f1 298.690 306.508 331.6364 315.1400 347.788 414.264    10   b
#    f2 319.075 322.475 373.3873 329.3895 336.268 746.134    10   b
#    f3  19.180  19.750  28.3504  25.1745  26.111  70.016    10   a 
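
One rough way to do that subtraction is sketched below, assuming the median is a reasonable summary and that the copy cost is roughly the same in all three expressions. The names mb, med, and adjusted are illustrative:

```r
library(data.table)
library(microbenchmark)

dt <- data.table(x = c(1, 1, 10))

mb <- microbenchmark(
  f1 = copy(dt)[, y := mean(x)],
  f2 = copy(dt)[, y := median(x)],
  f3 = copy(dt),
  times = 10L
)

# summary() returns a data.frame with one row per expression
s <- summary(mb, unit = "us")
med <- setNames(s$median, as.character(s$expr))

# approximate runtimes of f1 and f2 without the copy overhead (microseconds)
adjusted <- med[c("f1", "f2")] - med[["f3"]]
adjusted
```

The adjusted figures are only an estimate, since the timings vary from run to run.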
