Parallel distance matrix in R

Question

Currently I'm using the built-in function dist() to calculate my distance matrix in R:

dist(featureVector, method = "manhattan")
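For reference, the Manhattan (taxicab) distance that dist() computes between rows i and j is the sum of the absolute differences of their coordinates. The sketch below checks that on a tiny hypothetical matrix `x` (made-up values, not the question's featureVector) by comparing dist() against an explicit double loop:

```r
# Toy data: 4 observations (rows) with 3 features (columns); values are arbitrary
x <- matrix(c(1, 2, 3,
              4, 0, 1,
              2, 2, 2,
              0, 5, 1), nrow = 4, byrow = TRUE)

# Built-in: returns a "dist" object that stores only the lower triangle
d.builtin <- dist(x, method = "manhattan")

# Explicit formula: d(i, j) = sum over k of |x[i, k] - x[j, k]|
manhattan <- function(a, b) sum(abs(a - b))
n <- nrow(x)
d.manual <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d.manual[i, j] <- manhattan(x[i, ], x[j, ])
  }
}

# d(1, 2) = |1 - 4| + |2 - 0| + |3 - 1| = 7
d.manual[1, 2]

# as.matrix() expands the "dist" object into the full symmetric n x n matrix
all.equal(as.matrix(d.builtin), d.manual, check.attributes = FALSE)  # TRUE
```

Because dist() keeps only the n(n - 1)/2 lower-triangle values, the work grows with the square of the number of rows, which is why it becomes a bottleneck for large inputs.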

This is currently the bottleneck of the application, so the idea was to parallelize this task (conceptually this should be possible).
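Why it should be possible: row i of the full distance matrix depends only on observation i and the data itself, so the rows can be computed in independent blocks and stacked afterwards. A minimal sketch of that decomposition, run sequentially with lapply(); the random data standing in for featureVector, the helper `row.block.dist`, and the choice of four chunks are all hypothetical:

```r
set.seed(1)
# Hypothetical stand-in for featureVector: 200 observations, 10 features
featureVector <- matrix(rnorm(200 * 10), nrow = 200)

# Manhattan distances from each row listed in `rows` to every row of `m`.
# t(m) is features x observations, so subtracting m[i, ] recycles along the
# features; colSums() then gives one distance per observation.
# Returns a length(rows) x nrow(m) matrix.
row.block.dist <- function(rows, m) {
  tm <- t(m)
  t(vapply(rows, function(i) colSums(abs(tm - m[i, ])), numeric(nrow(m))))
}

# Split the row indices into chunks; no chunk needs another chunk's result
n.chunks <- 4
idx <- seq_len(nrow(featureVector))
chunks <- split(idx, cut(idx, n.chunks, labels = FALSE))

# The calls are independent, so this lapply() could be swapped for
# parallel::parLapply(cl, chunks, row.block.dist, m = featureVector) or
# parallel::mclapply(chunks, row.block.dist, m = featureVector)
blocks <- lapply(chunks, row.block.dist, m = featureVector)

# Stack the row blocks back into the full n x n matrix
full <- do.call(rbind, blocks)

all.equal(full, as.matrix(dist(featureVector, method = "manhattan")),
          check.attributes = FALSE)
```

One design caveat: computing full rows evaluates every pair twice, whereas dist() only fills the lower triangle, so a row-wise parallel version starts out doing roughly twice the arithmetic.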

Searching Google and this forum turned up nothing.

Does anyone have an idea?

Answer

Here's the structure of one route you could take. It is not faster than just using the dist() function; instead it takes many times longer. It does process in parallel, but even if the computation time were reduced to zero, the time needed to start the cluster and export the variables to it would probably be longer than just running dist().

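library(parallel)

# Example data: 2000 observations (rows) with 100 features (columns)
vec.array <- matrix(rnorm(2000 * 100), nrow = 2000, ncol = 100)

# Manhattan distances from one observation to every row of whole.matrix
TaxiDistFun <- function(one.vec, whole.matrix) {
    # Subtract one.vec from every row (transposing makes recycling run along the features)
    diff.matrix <- t(t(whole.matrix) - one.vec)
    # Sum of absolute differences per row; rowSums(abs(diff.matrix)) gives the same result faster
    this.row <- apply(diff.matrix, 1, function(x) sum(abs(x)))
    return(this.row)
}

# Start one worker per core and copy the data and the helper to every worker
cl <- makeCluster(detectCores())
clusterExport(cl, c("vec.array", "TaxiDistFun"))

# parRapply() splits the rows across the workers; each call returns one row of the
# distance matrix, and the results come back concatenated into one long vector
system.time(dist.array <- parRapply(cl, vec.array,
                                    function(x) TaxiDistFun(x, vec.array)))

stopCluster(cl)

# Reshape the vector into the full (symmetric) n x n distance matrix
dim(dist.array) <- c(nrow(vec.array), nrow(vec.array))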
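To see where the time goes and to check the result, the sketch below is a variant of the parRapply() code above that ships only one large block of rows per worker with parLapply(), passes the matrix through parLapply()'s `...` instead of clusterExport(), converts the result to a "dist" object with as.dist(), and times cluster start-up separately from the computation. The helper `block.manhattan` and the choice of two workers are hypothetical; timings depend entirely on the machine, and dist() may well still be the fastest option.

```r
library(parallel)

set.seed(42)
vec.array <- matrix(rnorm(2000 * 100), nrow = 2000, ncol = 100)

# Hypothetical helper: Manhattan distances from the rows listed in `rows`
# to all rows of `m`; returns a length(rows) x nrow(m) matrix
block.manhattan <- function(rows, m) {
  tm <- t(m)
  t(vapply(rows, function(i) colSums(abs(tm - m[i, ])), numeric(nrow(m))))
}

# Cluster start-up is fixed overhead that dist() never pays
startup <- system.time(cl <- makeCluster(2))  # or detectCores()

# One contiguous chunk of rows per worker, so only a few large tasks are sent
chunks <- splitIndices(nrow(vec.array), length(cl))

par.time <- system.time({
  blocks <- parLapply(cl, chunks, block.manhattan, m = vec.array)
  # as.dist() keeps only the lower triangle of the stacked n x n matrix
  par.dist <- as.dist(do.call(rbind, blocks))
})
stopCluster(cl)

serial.time <- system.time(
  ref.dist <- dist(vec.array, method = "manhattan")
)

# Both approaches should produce the same distances
all.equal(as.vector(par.dist), as.vector(ref.dist))

# Elapsed seconds for each phase (values vary from machine to machine)
c(startup  = startup[["elapsed"]],
  parallel = par.time[["elapsed"]],
  dist     = serial.time[["elapsed"]])
```

Keeping the start-up time separate makes the answer's point measurable: if `startup` alone is comparable to `dist`, no amount of parallel speed-up in the computation can win.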
