Distributed cross-correlation matrix computation

Views: 237 · Posted: 2020/6/3 20:12:32 · Tags: algorithm, apache-spark, distributed-computing, distributed, cross-correlation

Question

How can I calculate the Pearson cross-correlation matrix of a large (>10 TB) data set, preferably in a distributed manner? Any suggestions for an efficient distributed algorithm would be appreciated.
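One common way to make this computation distributable (a sketch of a general technique, not Spark's code): Pearson correlation only needs the row count, the per-column sums, and the per-pair sums of products. Each partition can compute these independently, and partial results are combined by plain addition, so only a d×d summary, never the raw rows, has to travel between nodes. The function names below are hypothetical, and the "partitions" are ordinary Python lists standing in for cluster partitions.

```python
import math


def partition_stats(rows, d):
    """Summarize one partition: row count, column sums, and cross-product sums."""
    n = 0
    sums = [0.0] * d
    cross = [[0.0] * d for _ in range(d)]
    for row in rows:
        n += 1
        for i in range(d):
            sums[i] += row[i]
            for j in range(d):
                cross[i][j] += row[i] * row[j]
    return n, sums, cross


def merge_stats(a, b):
    """Add two summaries; addition is associative, so merge order is irrelevant."""
    n_a, s_a, c_a = a
    n_b, s_b, c_b = b
    d = len(s_a)
    return (
        n_a + n_b,
        [s_a[i] + s_b[i] for i in range(d)],
        [[c_a[i][j] + c_b[i][j] for j in range(d)] for i in range(d)],
    )


def correlation_from_stats(stats):
    """Turn the merged summary into a d x d Pearson correlation matrix."""
    n, sums, cross = stats
    d = len(sums)
    means = [s / n for s in sums]
    # Sample covariance: (sum_k x_ki * x_kj - n * mean_i * mean_j) / (n - 1).
    # This one-pass form can lose precision when the means dwarf the spread.
    cov = [[(cross[i][j] - n * means[i] * means[j]) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Assumes no constant column (zero variance would divide by zero).
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(d)]
            for i in range(d)]


def pearson_matrix(partitions, d):
    """Map: summarize each partition. Reduce: merge summaries. Then normalize."""
    partials = [partition_stats(p, d) for p in partitions]
    total = partials[0]
    for p in partials[1:]:
        total = merge_stats(total, p)
    return correlation_from_stats(total)


# Two hypothetical partitions of 3-column rows.
partitions = [
    [[1.0, 2.0, 3.0], [2.0, 4.0, 1.0]],
    [[3.0, 6.0, 2.0], [4.0, 8.0, 0.0]],
]
corr = pearson_matrix(partitions, 3)
# Column 1 is 2 * column 0, so corr[0][1] is 1 (up to rounding);
# corr[0][2] works out to -0.8.
print(corr)
```

Two caveats on this design: the summary is d×d, so with very many columns the result itself becomes large even though the row count no longer matters; and a numerically more robust variant would merge centered statistics (means and co-moments) pairwise instead of raw sums.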

Update: I read the implementation of correlation in Apache Spark MLlib:

Pearson computation:
/home/d066537/codespark/spark/mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/Correlation.scala
Covariance computation:
/home/d066537/codespark/spark/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
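For orientation, the standard identities linking the two quantities are below (this is the textbook math, not a claim about how those files are written line by line). With n rows x_k ∈ ℝ^d, the covariance can be assembled from column sums and a sum of outer products (a Gramian), and the Pearson correlation is the covariance rescaled by the standard deviations:

$$
\bar{x}_i = \frac{1}{n}\sum_{k=1}^{n} x_{ki},\qquad
\operatorname{cov}_{ij} = \frac{1}{n-1}\left(\sum_{k=1}^{n} x_{ki}\,x_{kj} - n\,\bar{x}_i\,\bar{x}_j\right),\qquad
\rho_{ij} = \frac{\operatorname{cov}_{ij}}{\sqrt{\operatorname{cov}_{ii}\,\operatorname{cov}_{jj}}}
$$

Every sum over k splits across partitions, which is what makes the heavy part parallelizable; only the final d×d normalization needs the combined totals.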

But to me it looks like all the computation happens on one node and is not distributed in the real sense.

Please shed some light on this. I also tried running it on a 3-node Spark cluster; below are the screenshots:

As you can see from the second image, the data is pulled to one node and then the computation is done there. Am I right?
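The distinction worth checking in the screenshots is between pulling raw rows to one node and pulling only small per-partition summaries (such as row counts and sums) that are then combined. A sketch of the latter as a tree-style merge, similar in spirit to Spark's `RDD.treeAggregate`; the helper names and the (row_count, column_sum) summaries are hypothetical, and partitions are simulated with a plain list.

```python
def tree_merge(partials, merge, fan_in=2):
    """Merge partial results level by level, `fan_in` at a time.

    Returns the final result and the number of merge levels used.
    Only the small partial results move between levels; the raw rows
    they summarize never leave the place where they were summarized.
    """
    if not partials:
        raise ValueError("need at least one partial result")
    level = list(partials)
    depth = 0
    while len(level) > 1:
        next_level = []
        for start in range(0, len(level), fan_in):
            group = level[start:start + fan_in]
            acc = group[0]
            for item in group[1:]:
                acc = merge(acc, item)
            next_level.append(acc)
        level = next_level
        depth += 1
    return level[0], depth


def add_pairs(a, b):
    """Combine two hypothetical (row_count, column_sum) summaries."""
    return (a[0] + b[0], a[1] + b[1])


# Eight hypothetical partitions, each already reduced to (row_count, column_sum).
partials = [(1000, float(k)) for k in range(8)]
total, depth = tree_merge(partials, add_pairs)
print(total, depth)  # (8000, 28.0) 3
```

With a fan-in of 2, eight partitions are combined in three levels (8 → 4 → 2 → 1), so the final node only ever receives a couple of summaries rather than the whole data set.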