Check of microbenchmark results fails with data.table changed by reference
Question
There are some answers on SO where timings are compared without checking the results. However, I prefer to see whether an expression is both correct and fast.
The microbenchmark package supports this with the check parameter. Unfortunately, the check fails on expressions which change a data.table by reference, i.e., the check does not recognize that the results are different.
library(data.table)
library(microbenchmark)
# minimal data.table 1 col, 3 rows
dt <- data.table(x = c(1, 1, 10))
# define check function as in example section of help(microbenchmark)
my_check <- function(values) {
  all(sapply(values[-1], function(x) identical(values[[1]], x)))
}
The benchmark cases are designed to return different results. Thus,
microbenchmark(
  f1 = dt[, mean(x)],
  f2 = dt[, median(x)],
  check = my_check
)
returns the expected error message:
Error: Input expressions are not equivalent.
Case 2: data.table expressions where check fails
Now, the expressions are modified to change dt by reference. Note that the same check function is used.
microbenchmark(
  f1 = dt[, y := mean(x)],
  f2 = dt[, y := median(x)],
  check = my_check
)
now returns
expr min lq mean median uq max neval cld
f1 576.947 625.174 642.9820 640.7110 661.1870 732.391 100 a
f2 602.022 658.384 684.7076 678.9975 694.0825 978.600 100 b
So the check did not detect the difference here, although the two expressions produce different results. (The timings are irrelevant.)
I understand that the check is bound to fail because dt is changed by reference. When the results of the expressions are compared, they all refer to the same object, in the state left by the last change.
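The effect described above can be reproduced outside of microbenchmark. The following is a minimal sketch, assuming the same dt as in the question; the names r1 and r2 are only for illustration. Because := modifies dt in place and returns it, both names end up pointing at the same object, so identical() cannot see a difference.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 10))

# `:=` updates dt in place and (invisibly) returns dt itself,
# so r1 and r2 are just two more names for the same object
r1 <- dt[, y := mean(x)]
r2 <- dt[, y := median(x)]

identical(r1, r2)  # TRUE: both show the state after the last update
r1$y               # 1 1 1, i.e. the median, although r1 was assigned the mean
```

This is the situation the check function is in: every element of values refers to dt after the final assignment.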
How can I modify the check function and/or the expressions so that the check reliably detects differing results even when a data.table is changed by reference?
Answer
The simplest way is to use copy(). Each expression then modifies its own copy, so the values passed to the check function are distinct objects:
microbenchmark(
  f1 = copy(dt)[, y := mean(x)],
  f2 = copy(dt)[, y := median(x)],
  check = my_check,
  times = 1L
)
# Error: Input expressions are not equivalent.
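To see why copy() makes the check work, the two expressions can be evaluated by hand. This is only an illustrative sketch with the dt from the question; r1 and r2 are names introduced here, not part of the original answer.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 10))

# each expression updates its own private copy of dt
r1 <- copy(dt)[, y := mean(x)]
r2 <- copy(dt)[, y := median(x)]

identical(r1, r2)   # FALSE: the results are now separate objects
r1$y                # 4 4 4
r2$y                # 1 1 1
"y" %in% names(dt)  # FALSE: the original dt is left untouched
```

A side effect worth noting is that dt itself is never modified, so every benchmark iteration starts from the same input.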
Adding copy(dt) to the mix gives an idea of the time spent on copying (if necessary, one could always subtract that from the runtimes of f1 and f2).
microbenchmark(
  f1 = copy(dt)[, y := mean(x)],
  f2 = copy(dt)[, y := median(x)],
  f3 = copy(dt),
  times = 10L
)
# Unit: microseconds
# expr min lq mean median uq max neval cld
# f1 298.690 306.508 331.6364 315.1400 347.788 414.264 10 b
# f2 319.075 322.475 373.3873 329.3895 336.268 746.134 10 b
# f3 19.180 19.750 28.3504 25.1745 26.111 70.016 10 a
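The subtraction mentioned above could be done on the summary of the benchmark. The sketch below is one possible way, not part of the original answer: it assumes that the median is a reasonable statistic to compare and uses the hypothetical names copy_cost and net. The result is only a rough estimate, since timings at this scale are noisy.

```r
library(data.table)
library(microbenchmark)

dt <- data.table(x = c(1, 1, 10))

mb <- microbenchmark(
  f1 = copy(dt)[, y := mean(x)],
  f2 = copy(dt)[, y := median(x)],
  f3 = copy(dt),
  times = 10L
)

# summary() returns a data.frame with one row per expression
s <- summary(mb, unit = "us")

# median cost of the bare copy, in microseconds
copy_cost <- s$median[s$expr == "f3"]

# approximate runtime of f1 and f2 without the copying overhead
net <- s$median[s$expr != "f3"] - copy_cost
names(net) <- as.character(s$expr[s$expr != "f3"])
net
```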