Fastest way to apply function to all pairwise combinations of columns

Question

Given a data frame or matrix with arbitrary number of rows and columns, what is the fastest way to apply a function to all pairwise combinations of columns?

For example, if I have a data table:

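library(data.table)

N <- 3  # number of rows
K <- 3  # number of value columns
data <- data.table(id = seq(N))
for (k in seq(K)) {
    # Add a random column V1..VK by reference instead of overwriting `id`
    set(data, j = paste0("V", k), value = runif(N))
}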

And I want to compute the simple difference between all pairs of columns, I could loop (or lapply) over columns:

differences = data.table(foo=seq(N))
for(var1 in names(data)) {
    for(var2 in names(data)) {
        if (var1==var2) next
        if (which(names(data)==var1)>which(names(data)==var2)) next
        combo <- paste0(var1, var2)
        differences[[combo]] <- data[[var1]]-data[[var2]]
    }
}

But as K gets larger, this becomes absurdly slow.

One solution I've considered is to make two new data tables using combn and subtract them:

a <- data[,combn(colnames(data),2)[1,],with=F]
b <- data[,combn(colnames(data),2)[2,],with=F]
differences <- a-b

But as N and K get larger, this becomes very memory intensive (though faster than looping).

It seems to me that the outer product of the matrix with itself is probably the best way to go, but I can't piece it together. This is especially hard if I want to apply an arbitrary function (RMSE for example), instead of just the difference.

What is the fastest way to do this?

Recommended answer

If it is necessary to have the data in a matrix first, you can do the following:

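library(data.table)

# Example data: 300 rows, 500 columns
data <- matrix(runif(300 * 500), nrow = 300, ncol = 500)

# Reshape to long format: one row per matrix cell, keyed (sorted) by column index
data.DT <- setkey(
  data.table(V1 = c(data), colId = rep(1:500, each = 300), rowId = rep(1:300, times = 500)),
  colId
)

# For each column (col1), compute the difference against every later column (col2)
diff.DT <- data.DT[
  , {
    ccl <- unique(colId)  # index of the current column
    vv <- V1              # values of the current column, in row order
    # vv is recycled over each later column's block of 300 rows (col2 minus col1)
    data.DT[colId > ccl, .(col2 = colId, difference = V1 - vv)]
  }
  , keyby = .(col1 = colId)
]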
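
The question also asks about applying an arbitrary function, such as RMSE, to each pair rather than only taking the difference. One possible way to do that, sketched below, is to iterate over column index pairs from `combn` and reduce each pair to a single number, so only one value per pair is stored. The names `pairwise_apply` and `rmse` are hypothetical helpers introduced here for illustration; they are not part of the answer above or of any package, and this sketch has not been benchmarked against the data.table approach.

```r
# Hypothetical helper: apply `fun` to every unordered pair of columns of a matrix.
# `fun` is assumed to take two numeric vectors and return a single number.
pairwise_apply <- function(m, fun) {
  pairs <- combn(ncol(m), 2)  # 2 x choose(K, 2) matrix of column indices
  values <- vapply(
    seq_len(ncol(pairs)),
    function(p) fun(m[, pairs[1, p]], m[, pairs[2, p]]),
    numeric(1)
  )
  data.frame(col1 = pairs[1, ], col2 = pairs[2, ], value = values)
}

# Root-mean-square error between two equal-length numeric vectors
rmse <- function(x, y) sqrt(mean((x - y)^2))

set.seed(1)
m <- matrix(runif(300 * 50), nrow = 300, ncol = 50)
res <- pairwise_apply(m, rmse)
head(res)
```

Because each pair collapses to a scalar, the result has `choose(K, 2)` rows regardless of N, which avoids materialising the two wide intermediate tables that made the `combn` subtraction in the question memory intensive.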
