Faster i, j matrix cell fill

Question

I want to take the columns of a data.frame/matrix and apply a function to each pair of columns, storing the result in cell [i, j] of a matrix, where i and j index the columns of the data.frame. Basically, I want to fill a matrix cell by cell, in the same way that the cor function works on a data.frame.

This is a related question: "Create a matrix from a function and two numeric data frames". However, I use this in randomization tests and repeat the operation many times (making many matrices), so I'm looking for the fastest way to do it. I have sped things up a bit using parallel processing, but I'm still not happy with the speed. Nor can it be assumed that the output matrix is symmetric, in the way that the matrix cor produces is (my example reflects this).

I saw the following on the data.table web page today (http://datatable.r-forge.r-project.org/):


DF[i, j] <- value

This got me thinking that perhaps data.table, dplyr, or some other approach might speed things up a bit. My mind has been fixed on filling cells, but maybe there is a better way that involves reshaping, applying the function, and reshaping back into a matrix, or something along those lines. I can achieve this in base R using outer or a for loop, as follows.

## Arbitrary function
FUN <- function(x, y) round(sqrt(sum(x)) - sum(y), digits=1)

## outer approach
outer(
  names(mtcars), 
  names(mtcars), 
  Vectorize(function(i,j) FUN(mtcars[,i],mtcars[,j]))
)

## for approach
mat <- matrix(rep(NA, ncol(mtcars)^2), ncol(mtcars))
for (i in 1:ncol(mtcars)) {
    for (j in 1:ncol(mtcars)) {
        mat[i, j] <- FUN(mtcars[, i], mtcars[, j])
    }
}
mat
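
The reshaping idea mentioned above could be sketched roughly as follows: enumerate every (i, j) column pair with expand.grid, apply FUN over the pairs with mapply, and fold the results back into a matrix. This is only an illustration of that idea using the same FUN and mtcars data; the names pairs, vals, and mat_grid are placeholders, and nothing here suggests it is faster than the loop.

```r
## Arbitrary function (same as above)
FUN <- function(x, y) round(sqrt(sum(x)) - sum(y), digits = 1)

n <- ncol(mtcars)

## All (i, j) index pairs; expand.grid varies i fastest, which matches
## the column-major order in which matrix() fills its cells
pairs <- expand.grid(i = seq_len(n), j = seq_len(n))

## Apply FUN to each pair of columns
vals <- mapply(function(i, j) FUN(mtcars[[i]], mtcars[[j]]), pairs$i, pairs$j)

## Reshape the flat result back into an n x n matrix
mat_grid <- matrix(vals, nrow = n, ncol = n)
mat_grid
```

Because i varies fastest, cell [i, j] of mat_grid holds FUN applied to columns i and j, the same layout as the for loop.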

Here are the microbenchmark timings, with the for loop getting a slight edge.

Unit: milliseconds
    expr      min       lq   median       uq      max neval
 OUTER() 4.450410 4.691124 4.774394 4.877724 55.77333  1000
   FOR() 4.309527 4.521785 4.588728 4.694156  7.04275  1000
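
The post does not show how OUTER() and FOR() were defined, so the wrappers below are an assumption about how the two approaches might have been packaged for timing. The sketch relies on the microbenchmark package being installed and uses fewer repetitions than the 1000 reported above to keep the run short.

```r
## Arbitrary function (same as above)
FUN <- function(x, y) round(sqrt(sum(x)) - sum(y), digits = 1)

## Assumed wrapper around the outer approach
OUTER <- function() {
  outer(
    names(mtcars),
    names(mtcars),
    Vectorize(function(i, j) FUN(mtcars[, i], mtcars[, j]))
  )
}

## Assumed wrapper around the for approach
FOR <- function() {
  mat <- matrix(rep(NA, ncol(mtcars)^2), ncol(mtcars))
  for (i in 1:ncol(mtcars)) {
    for (j in 1:ncol(mtcars)) {
      mat[i, j] <- FUN(mtcars[, i], mtcars[, j])
    }
  }
  mat
}

## Time both only if microbenchmark is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(OUTER(), FOR(), times = 100L))
}
```

Absolute timings depend on the machine and R version, so they will not match the figures above exactly.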

What is the fastest approach to this in R (add-on packages welcome)?

Recommended answer

Still sticking to a base R solution, I got a 1.6-1.7x speedup in the for-based approach by:


  • replacing [, i] with [[i]] (significant time impact; perhaps FUN just receives C pointers here instead of freshly allocated vectors);
  • byte-code compiling FUN (small time impact);
  • wrapping the for loop in a function and byte-code compiling it (small time impact).
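
As a rough way to see the first point in isolation, the sketch below checks that both extraction forms return the same vector and then times repeated extraction of one column. The repetition count n_rep and the variable names are arbitrary choices, and the measured gap will vary with machine and R version.

```r
## Both extraction forms return the same column vector
stopifnot(identical(mtcars[, 1], mtcars[[1]]))

## Rough timing of repeated extraction with each form
n_rep <- 10000L
t_bracket <- system.time(for (k in seq_len(n_rep)) mtcars[, 1])[["elapsed"]]
t_double  <- system.time(for (k in seq_len(n_rep)) mtcars[[1]])[["elapsed"]]

c(single_bracket = t_bracket, double_bracket = t_double)
```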

BTW, swapping the indices (i, j) -> (j, i) in the two loops made no significant difference (in theory, column-wise access should be faster, since R stores matrices in column-major order).

Code:

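library(compiler)

## Byte-compile the arbitrary function
FUN2 <- cmpfun(FUN)

## Byte-compiled loop; the first argument is any data.frame (mtcars in the benchmark)
for2 <- cmpfun(function(mtcars, FUN) {
    mat <- matrix(rep(NA, ncol(mtcars)^2), ncol(mtcars))
    for (i in 1:ncol(mtcars)) {
        for (j in 1:ncol(mtcars)) {
            ## [[i]] instead of [, i] for column extraction
            mat[i, j] <- FUN(mtcars[[i]], mtcars[[j]])
        }
    }
    mat
})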

Benchmark:

Unit: milliseconds
     expr      min       lq   median       uq      max neval
    outer 7.791739 7.991474 8.245869 8.538163 16.24460   100
      for 8.143679 8.463249 8.588230 9.912008 16.30842   100
 for-mods 4.713837 4.875972 5.006202 5.246584 15.66491   100

In my opinion, it will be difficult to find a much faster approach (but I may be wrong). The overhead of the for loop itself is quite small (ca. 0.25 ms) compared to the time needed to evaluate FUN many times.
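
One rough way to separate loop overhead from the cost of FUN is to run the same compiled loop once with the real function and once with a trivial stand-in. The sketch below does that. The stand-in noop and the repetition count n_rep are hypothetical choices, the loop-only figure still includes column extraction and function-call overhead, and the numbers it prints are not expected to reproduce the 0.25 ms estimate.

```r
library(compiler)

## Arbitrary function from the question, plus its byte-compiled version
FUN <- function(x, y) round(sqrt(sum(x)) - sum(y), digits = 1)
FUN2 <- cmpfun(FUN)

## Byte-compiled loop using [[i]] extraction, as in the answer
for2 <- cmpfun(function(df, FUN) {
  n <- ncol(df)
  mat <- matrix(NA_real_, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      mat[i, j] <- FUN(df[[i]], df[[j]])
    }
  }
  mat
})

## Hypothetical stand-in that does no real work
noop <- function(x, y) 0

## Average time per call with the real function and with the stand-in
n_rep <- 200L
t_full <- system.time(for (k in seq_len(n_rep)) for2(mtcars, FUN2))[["elapsed"]]
t_loop <- system.time(for (k in seq_len(n_rep)) for2(mtcars, noop))[["elapsed"]]

c(full_ms = 1000 * t_full / n_rep, loop_only_ms = 1000 * t_loop / n_rep)
```

The difference between the two figures gives a crude estimate of how much of the total is spent inside FUN itself.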
