Why is it faster to evaluate in `j` than with `$` in a `data.table`?

Views: 124 | Posted: 2017/3/12 11:09:07 | Tags: r, data.table


Question

Perhaps this is already answered and I missed it, but it's hard to search.

A very simple question: Why is dt[,x] generally a tiny bit faster than dt$x?

Example:
library(data.table)
library(microbenchmark)

dt<-data.table(id=1:1e7,var=rnorm(1e6))

test<-microbenchmark(times=100L,
                     dt[sample(1e7,size=200000),var],
                     dt[sample(1e7,size=200000),]$var)

test[,"expr"]<-c("in j","$")

Unit: milliseconds
 expr      min       lq     mean   median       uq      max neval
    $ 14.28863 15.88779 18.84229 17.23109 18.41577 53.63473   100
 in j 14.35916 15.97063 18.87265 17.99266 18.37939 54.19944   100

I might not have chosen the best example, so feel free to suggest something perhaps more poignant.

Anyway, evaluating in j is faster at least 75% of the time (though there appears to be a fat upper tail as the mean is higher; side note, it would be nice if microbenchmark could spit me out some histograms).

Why is this the case?

Solution

With j, you are subsetting and selecting within a call to [.data.table.

With $ (and your call), you are subsetting within [.data.table and then selecting with $

You are in essence calling two functions rather than one, so there is a negligible difference in timing.

In your current example you are calling `sample(1e7, size=200000)` separately in each expression, so the two expressions are timed on different random rows.
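
A minimal sketch of the two points above, assuming the data.table package is installed. The sizes are reduced for speed, and the names `n`, `ii`, `tmp`, `in_j` and `via_dollar` are illustrative, not from the original post. Drawing the index once removes the sampling difference, and splitting the `$` form into two steps makes the second function call visible.

```r
library(data.table)

set.seed(1)                      # assumption: fixed seed so the run is reproducible
n  <- 1e5                        # much smaller than the 1e7 rows in the question
dt <- data.table(id = 1:n, var = rnorm(n))

ii <- sample(n, size = 2000)     # draw the row index once, outside any timed code

# One call: `[.data.table` subsets the rows and evaluates `var` in j.
in_j <- dt[ii, var]

# Two calls: `[.data.table` first builds a two-column subset,
# then `$` extracts one column from that intermediate table.
tmp        <- dt[ii]
via_dollar <- tmp$var

identical(in_j, via_dollar)
```

Both forms should yield the same vector; the only difference is the extra `$` call on the intermediate table.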

For a comparison that returns identical results, draw the sample once and reuse it:
dt<-data.table(id=1:1e7,var=rnorm(1e6))
setkey(dt, id)
ii <- sample(1e7,size=200000)


microbenchmark("in j" = dt[.(ii),var],
               "$" = dt[.(ii)]$var,
               '[[' = dt[.(ii)][['var']],
               .subset2(dt[.(ii)],'var'),
               dt[.(ii)][[2]],
               dt[['var']][ii],
               dt$var[ii],
               .subset2(dt,'var')[ii])
Unit: milliseconds
                       expr       min        lq      mean    median        uq       max neval cld
                       in j 39.491156 40.358669 41.570057 40.860342 41.485622 70.202441   100   b
                          $ 39.957211 40.561965 41.587420 41.136836 41.634584 69.928363   100   b
                         [[ 40.046558 40.515480 42.388432 41.244444 41.750946 72.224827   100   b
 .subset2(dt[.(ii)], "var") 39.772781 40.564077 41.561271 41.111630 41.635489 69.252222   100   b
             dt[.(ii)][[2]] 40.004300 40.513669 41.682526 40.927503 41.492866 72.986995   100   b
            dt[["var"]][ii]  4.432346  4.546898  4.946219  4.623416  4.755777 31.761115   100  a 
                 dt$var[ii]  4.440496  4.539502  4.668361  4.597457  4.729214  5.425125   100  a 
    .subset2(dt, "var")[ii]  4.365939  4.508261  4.660435  4.598815  4.703858  6.072289   100  a 
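
As a sanity check on the benchmark above, the following sketch confirms that all eight expressions return the same values when the index is fixed. It assumes data.table is installed and uses a smaller table; the list element names and `all_same` are hypothetical labels added for illustration.

```r
library(data.table)

set.seed(42)                     # assumption: fixed seed for reproducibility
n  <- 1e5                        # reduced from the 1e7 rows used above
dt <- data.table(id = 1:n, var = rnorm(n))
setkey(dt, id)
ii <- sample(n, size = 2000)     # one shared index for every expression

results <- list(
  in_j        = dt[.(ii), var],             # keyed join, select in j
  dollar      = dt[.(ii)]$var,              # keyed join, then `$`
  bracket2    = dt[.(ii)][["var"]],         # keyed join, then `[[` by name
  subset2     = .subset2(dt[.(ii)], "var"), # keyed join, then .subset2
  by_position = dt[.(ii)][[2]],             # keyed join, then `[[` by position
  col_first_1 = dt[["var"]][ii],            # extract column, then index the vector
  col_first_2 = dt$var[ii],
  col_first_3 = .subset2(dt, "var")[ii]
)

# Compare every result against the first one.
all_same <- all(vapply(results, identical, logical(1), results[[1]]))
all_same
```

The last three expressions extract the column first and then index a plain vector, so they never go through `[.data.table` for the row subset. That is consistent with their landing in the faster `a` group in the table above.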


                        