Faster i, j matrix cell fill


Problem description

I want to take the columns of a data.frame/matrix and apply a function to each pair of columns, filling cell [i, j] of a result matrix, where i and j index the columns of the data.frame. Basically, I want to fill a matrix cell by cell in the same way that the cor function works on a data.frame.

This is a related question: "Create a matrix from a function and two numeric data frames". However, I use this in randomization tests and repeat the operation many times (making many matrices). I'm looking for the fastest way to do this operation. I have sped things up a bit using parallel processing, but I'm still not happy with the speed. Nor can the output matrix be assumed to be symmetric, the way the matrix produced by cor is (my example reflects this).

I saw on the data.table web page today (http://datatable.r-forge.r-project.org/) the following:

DF[i,j]<-value

This got me thinking that perhaps data.table or dplyr or other means may speed things up a bit. My brain has been fixed on filling cells but maybe there's a better way involving reshaping, applying the function and reshaping to a matrix or something along those lines. I can achieve this in base R using outer or a for loop as follows.

## Arbitrary function
FUN <- function(x, y) round(sqrt(sum(x)) - sum(y), digits=1)

## outer approach
outer(
  names(mtcars), 
  names(mtcars), 
  Vectorize(function(i,j) FUN(mtcars[,i],mtcars[,j]))
)

## for approach
mat <- matrix(rep(NA, ncol(mtcars)^2), ncol(mtcars))
for (i in 1:ncol(mtcars)) {
    for (j in 1:ncol(mtcars)) {
        mat[i, j] <- FUN(mtcars[, i], mtcars[, j])
    }
}
mat

Here are the microbenchmark timings, with the for loop getting a slight edge:

Unit: milliseconds
    expr      min       lq   median       uq      max neval
 OUTER() 4.450410 4.691124 4.774394 4.877724 55.77333  1000
   FOR() 4.309527 4.521785 4.588728 4.694156  7.04275  1000
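
The question does not show how OUTER() and FOR() were defined. A minimal sketch of how such timings might be produced, assuming the two wrappers simply enclose the outer and for code above (the wrapper bodies and the times value are assumptions, and the microbenchmark package must be installed for the timing step to run):

```r
## Arbitrary function from the question
FUN <- function(x, y) round(sqrt(sum(x)) - sum(y), digits = 1)

## Hypothetical wrappers; the question does not show their definitions
OUTER <- function() {
    outer(names(mtcars), names(mtcars),
          Vectorize(function(i, j) FUN(mtcars[, i], mtcars[, j])))
}
FOR <- function() {
    mat <- matrix(NA_real_, ncol(mtcars), ncol(mtcars))
    for (i in seq_len(ncol(mtcars))) {
        for (j in seq_len(ncol(mtcars))) {
            mat[i, j] <- FUN(mtcars[, i], mtcars[, j])
        }
    }
    mat
}

## Time both only if the microbenchmark package is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
    print(microbenchmark::microbenchmark(OUTER(), FOR(), times = 100L))
}
```

Absolute timings depend on the machine and R version, so they will not necessarily match the figures above.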

What is the fastest approach to this in R (add-on packages welcome)?

Recommended answer

Sticking to a base R solution, I got a 1.6-1.7x speedup in the for-based approach by:

  • replacing [, i] with [[i]] (significant time impact; perhaps FUN just receives C pointers here instead of freshly allocated vectors);
  • byte-code compiling FUN (small time impact);
  • wrapping the for code in a function and byte-code compiling it (small time impact).
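
A minimal sketch of the first two points, using the question's FUN and the built-in mtcars data. It only checks that both extraction forms return the same vector and that a byte-compiled copy of FUN returns the same value; it does not measure speed.

```r
library(compiler)

## Arbitrary function from the question
FUN <- function(x, y) round(sqrt(sum(x)) - sum(y), digits = 1)

## Both forms extract column 1 as a plain numeric vector
same_column <- identical(mtcars[, 1], mtcars[[1]])

## Byte-compiled copy of FUN; the returned value is unchanged
FUN2 <- cmpfun(FUN)
same_value <- identical(FUN(mtcars[[1]], mtcars[[2]]),
                        FUN2(mtcars[[1]], mtcars[[2]]))
```

Note that from R 3.4.0 onward the JIT byte-code compiler is enabled by default, so explicit cmpfun calls may make less difference there than in the timings reported here.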

BTW, swapping the indices (i,j) -> (j,i) in the two loops didn't result in significant differences (in theory, column-wise access should be faster, since R stores matrices in column-major order).
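
For reference, R stores a matrix column by column, so a fill with j in the outer loop and i in the inner loop writes to consecutive memory locations. A small sketch on a toy 2 x 3 matrix (the variable names are illustrative only):

```r
## R stores a matrix column by column (column-major order)
m <- matrix(1:6, nrow = 2)
storage_order <- as.vector(m)   # elements in the order they sit in memory

## Filling with j in the outer loop visits cells in storage order
filled <- matrix(NA_integer_, nrow = 2, ncol = 3)
k <- 0L
for (j in seq_len(ncol(filled))) {
    for (i in seq_len(nrow(filled))) {
        k <- k + 1L
        filled[i, j] <- k
    }
}
```

For an 11 x 11 result such as the one here, the whole matrix is small enough that the access order is unlikely to matter much, which is consistent with the observation above.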

Code:

library(compiler)
FUN2 <- cmpfun(FUN)
for2 <- cmpfun(function(mtcars, FUN) {
   mat <- matrix(rep(NA, ncol(mtcars)^2), ncol(mtcars))
   for (i in 1:ncol(mtcars)) {
       for (j in 1:ncol(mtcars)) {
           mat[i, j] <- FUN(mtcars[[i]], mtcars[[j]])
       }
   }
   mat
})
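
A short usage sketch, checking that the modified loop gives the same matrix as the question's original [, i] loop. The function is restated so the block runs on its own; the argument name df and the use of seq_len are cosmetic changes, not part of the answer's code.

```r
library(compiler)

FUN <- function(x, y) round(sqrt(sum(x)) - sum(y), digits = 1)
FUN2 <- cmpfun(FUN)

## Modified loop from the answer (argument renamed to df for clarity)
for2 <- cmpfun(function(df, FUN) {
    n <- ncol(df)
    mat <- matrix(NA_real_, n, n)
    for (i in seq_len(n)) {
        for (j in seq_len(n)) {
            mat[i, j] <- FUN(df[[i]], df[[j]])
        }
    }
    mat
})

## Reference result using the question's original [, i] loop
ref <- matrix(NA_real_, ncol(mtcars), ncol(mtcars))
for (i in seq_len(ncol(mtcars))) {
    for (j in seq_len(ncol(mtcars))) {
        ref[i, j] <- FUN(mtcars[, i], mtcars[, j])
    }
}

res <- for2(mtcars, FUN2)
```

The result is not symmetric, as required by the question: cell [1, 2] uses the square root of the first column's sum, while cell [2, 1] uses that of the second.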

Benchmarks:

 Unit: milliseconds
                min       lq   median       uq      max neval
 outer     7.791739 7.991474 8.245869 8.538163 16.24460   100
 for       8.143679 8.463249 8.588230 9.912008 16.30842   100
 for-mods  4.713837 4.875972 5.006202 5.246584 15.66491   100

In my opinion, it will be difficult to find a much faster approach (but I may be wrong). The overhead of the for loop itself is quite small (ca. 0.25 ms) compared to the time needed to evaluate FUN for every cell.
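
One rough way to check this kind of estimate is to run the same loop with a stand-in function that does no real work. This is only a sketch: fill, noop, and n_rep are hypothetical names, and the "loop only" figure still includes the [[ extraction and function-call cost.

```r
FUN <- function(x, y) round(sqrt(sum(x)) - sum(y), digits = 1)

## Same double loop as in the answer, written as a plain function
fill <- function(df, FUN) {
    n <- ncol(df)
    mat <- matrix(NA_real_, n, n)
    for (i in seq_len(n)) {
        for (j in seq_len(n)) {
            mat[i, j] <- FUN(df[[i]], df[[j]])
        }
    }
    mat
}

## Hypothetical stand-in that ignores its inputs
noop <- function(x, y) 0

## Average elapsed time per matrix, in milliseconds
n_rep <- 200L
t_full <- system.time(for (r in seq_len(n_rep)) fill(mtcars, FUN))[["elapsed"]]
t_loop <- system.time(for (r in seq_len(n_rep)) fill(mtcars, noop))[["elapsed"]]
per_call_ms <- 1000 * c(full = t_full, loop_only = t_loop) / n_rep
```

The difference between the two entries of per_call_ms approximates the time spent inside FUN itself; the actual numbers vary by machine and R version.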
