Faster i, j matrix cell fill

Posted: 2017/3/12 11:10:10. Tags: r, performance, data.table, dplyr


Question

I want to take the columns of a data.frame/matrix and apply a function to each pair of columns, storing the result in cell [i, j] of a matrix, where i and j run along the columns of the data.frame.  Basically I want to fill a matrix cell by cell in the same way that the cor function works with a data.frame.

This is a related question: Create a matrix from a function and two numeric data frames.  However, I use this in randomization tests and repeat the operation many times (making many matrices), so I'm looking for the fastest way to do this operation.  I have sped things up a bit using parallel processing but I'm still not happy with the speed.  Nor can it be assumed that the output matrix is symmetric in the way the matrix produced by cor is (my example reflects this).

I saw on the data.table web page today (http://datatable.r-forge.r-project.org/) the following:

  500+ times faster than DF[i,j]<-value
This got me thinking that perhaps data.table, dplyr or some other means may speed things up a bit.  I have been fixated on filling cells one at a time, but maybe there's a better way that involves reshaping, applying the function and reshaping back to a matrix, or something along those lines.  I can achieve this in base R using outer or a for loop as follows.
## Arbitrary function
FUN <- function(x, y) round(sqrt(sum(x)) - sum(y), digits=1)

## outer approach
outer(
  names(mtcars), 
  names(mtcars), 
  Vectorize(function(i,j) FUN(mtcars[,i],mtcars[,j]))
)

## for approach
mat <- matrix(rep(NA, ncol(mtcars)^2), ncol(mtcars))
for (i in 1:ncol(mtcars)) {
    for (j in 1:ncol(mtcars)) {
        mat[i, j] <- FUN(mtcars[, i], mtcars[, j])
    }
}
mat
Here are the microbenchmark timings, with the for loop getting a slight edge:
Unit: milliseconds
    expr      min       lq   median       uq      max neval
 OUTER() 4.450410 4.691124 4.774394 4.877724 55.77333  1000
   FOR() 4.309527 4.521785 4.588728 4.694156  7.04275  1000
What is the fastest approach to this in R (add-on packages welcome)?
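The post shows the timing table but not the call that produced it. A minimal sketch of how such a comparison might be set up is below. The wrapper names `OUTER` and `FOR` are inferred from the `expr` column of the table, and the harness itself is an assumption, not the author's code. The `microbenchmark` package is optional here, and `times = 100` is chosen only to keep the run short (the table reports 1000 evaluations).

```r
## Arbitrary function from the question
FUN <- function(x, y) round(sqrt(sum(x)) - sum(y), digits = 1)

## Assumed wrapper around the outer approach
OUTER <- function() {
  outer(
    names(mtcars),
    names(mtcars),
    Vectorize(function(i, j) FUN(mtcars[, i], mtcars[, j]))
  )
}

## Assumed wrapper around the for approach
FOR <- function() {
  n <- ncol(mtcars)
  mat <- matrix(NA_real_, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      mat[i, j] <- FUN(mtcars[, i], mtcars[, j])
    }
  }
  mat
}

## Check that both approaches agree before timing them
same_result <- isTRUE(all.equal(OUTER(), FOR()))

## Time them only if the microbenchmark package is installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(OUTER(), FOR(), times = 100))
}
```

Absolute numbers will differ by machine and R version, so only the relative ordering is worth comparing against the table.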
Solution
Still sticking to a base R solution, I got a 1.6-1.7x speedup in the for-based approach by:


replacing [,i] with [[i]] (significant time impact; perhaps FUN just receives C pointers here instead of freshly allocated vectors);
byte-code compiling FUN (small time impact);
wrapping the for loop in a function and byte-code compiling it (small time impact).


BTW, swapping the indices (i,j) -> (j,i) in the 2 loops didn't result in significant differences (theoretically, column-wise matrix access should be faster, since R stores matrices in column-major order).
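As a small illustration of the index-order remark: R stores a matrix column by column, so the loop order that walks memory contiguously is the one with j in the outer loop and i in the inner loop. The sketch below first shows the storage order on a tiny matrix and then a column-wise variant of the fill loop. The name `fill_colwise` is hypothetical and is used only for this example.

```r
## Column-major storage: the linear index runs down column 1, then column 2, ...
m <- matrix(1:6, nrow = 2)
m[3] == m[1, 2]   # the third stored element is row 1 of column 2

## Arbitrary function from the question
FUN <- function(x, y) round(sqrt(sum(x)) - sum(y), digits = 1)

## Hypothetical column-wise fill: the inner loop walks down one column of mat
fill_colwise <- function(df, FUN) {
  n <- ncol(df)
  mat <- matrix(NA_real_, n, n)
  for (j in seq_len(n)) {
    for (i in seq_len(n)) {
      mat[i, j] <- FUN(df[[i]], df[[j]])
    }
  }
  mat
}

res <- fill_colwise(mtcars, FUN)
```

For an 11 x 11 matrix the whole result fits comfortably in cache, which may be why the swap made no measurable difference here.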

Code:
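library(compiler)
## byte-compile FUN
FUN2 <- cmpfun(FUN)
## wrap the for loop in a function and byte-compile it
for2 <- cmpfun(function(mtcars, FUN) {
    mat <- matrix(rep(NA, ncol(mtcars)^2), ncol(mtcars))
    for (i in 1:ncol(mtcars)) {
        for (j in 1:ncol(mtcars)) {
            ## [[i]] instead of [, i] to extract each column
            mat[i, j] <- FUN(mtcars[[i]], mtcars[[j]])
        }
    }
    mat
})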
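The answer does not show how the compiled function is called. A minimal usage sketch follows, which also checks that the modified loop returns the same matrix as the original `[, i]` version. The argument is named `df` here rather than `mtcars`, and the names `ref` and `res` are introduced only for this example. Note that from R 3.4.0 onward the JIT compiler byte-compiles closures by default, so the explicit `cmpfun` calls probably matter less on current versions.

```r
library(compiler)

## Arbitrary function from the question, plus its byte-compiled version
FUN  <- function(x, y) round(sqrt(sum(x)) - sum(y), digits = 1)
FUN2 <- cmpfun(FUN)

## Byte-compiled loop using [[i]] column extraction
for2 <- cmpfun(function(df, FUN) {
  n <- ncol(df)
  mat <- matrix(NA_real_, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      mat[i, j] <- FUN(df[[i]], df[[j]])
    }
  }
  mat
})

## Reference result from the original loop with [, i] extraction
n <- ncol(mtcars)
ref <- matrix(NA_real_, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    ref[i, j] <- FUN(mtcars[, i], mtcars[, j])
  }
}

## Call the modified version and compare
res <- for2(mtcars, FUN2)
results_match <- isTRUE(all.equal(res, ref))
```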
Benchmarks:
 Unit: milliseconds
                min       lq   median       uq      max neval
 outer     7.791739 7.991474 8.245869 8.538163 16.24460   100
 for       8.143679 8.463249 8.588230 9.912008 16.30842   100
 for-mods  4.713837 4.875972 5.006202 5.246584 15.66491   100
In my opinion, it will be difficult to find a much faster approach (but I may be wrong). The overhead of the for loop itself is quite small (ca. 0.25 ms) compared to the time needed to evaluate FUN repeatedly.
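One rough way to separate loop overhead from the cost of FUN is to run the same loop with a function that does no real work and compare the timings. The sketch below is an assumption about how such a measurement could be made, not the method the answer used. `noop`, `fill` and `reps` are hypothetical names, and the "loop only" figure still includes the column extraction and the function call itself.

```r
## Arbitrary function from the question
FUN  <- function(x, y) round(sqrt(sum(x)) - sum(y), digits = 1)

## Hypothetical stand-in that ignores its inputs and does no computation
noop <- function(x, y) 0

## The fill loop, parameterised by the function to apply
fill <- function(df, FUN) {
  n <- ncol(df)
  mat <- matrix(NA_real_, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      mat[i, j] <- FUN(df[[i]], df[[j]])
    }
  }
  mat
}

## Repeat each variant to get a measurable elapsed time
reps <- 200
t_loop <- system.time(for (k in seq_len(reps)) fill(mtcars, noop))[["elapsed"]]
t_full <- system.time(for (k in seq_len(reps)) fill(mtcars, FUN))[["elapsed"]]

## Approximate cost per call in milliseconds
c(loop_only = t_loop, with_FUN = t_full) / reps * 1000
```

If the two figures are close, the loop machinery dominates; if `with_FUN` is much larger, most of the time goes into FUN, which is the situation the answer describes.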
