如何使用 MapReduce/Hadoop 实现特征值计算? [英] how to implement eigenvalue calculation with MapReduce/Hadoop?
问题描述
这是可能的,因为 PageRank 是特征值的一种形式,这就是 MapReduce 引入的原因.但在实际实现中似乎存在问题,比如每台从机都要维护一份矩阵的副本?
It is possible because PageRank was a form of eigenvalue and that is why MapReduce introduced. But there seems problems in actual implementation, such as every slave computer have to maintain a copy of the matrix?
推荐答案
PageRank通过迭代寻找网络的稳态离散流条件来解决主导特征向量问题.
PageRank solves the dominant eigenvector problem by iteratively finding the steady-state discrete flow condition of the network.
如果NxM矩阵A描述了从节点n到节点m的链接权重(流量),那么
If NxM matrix A describes the link weight (amount of flow) from node n to node m, then
p_{n+1} = A . p_{n}
在p已经收敛到稳态的极限(p_n+1 = p_n),这是一个特征值为1的特征向量问题.
In the limit where p has converged to a steady state (p_n+1 = p_n), this is an eigenvector problem with eigenvalue 1.
PageRank 算法不需要将矩阵保存在内存中,但在密集(非稀疏)矩阵上效率低下.对于稠密矩阵,MapReduce 是错误的解决方案——您需要节点之间的局部性和广泛的交换——您应该转而查看 LaPACK 和 MPI 及其朋友.
The PageRank algorithm doesn't require the matrix to be held in memory, but is inefficient on dense (non-sparse) matrices. For dense matrices, MapReduce is the wrong solution -- you need locality and broad exchange among nodes -- and you should instead look at LaPACK and MPI and friends.
您可以在 wukong 库中看到有效的 pagerank 实现(ruby 的 hadoop 流)或 Heretrix pagerank 子模块.(heretrix 代码独立于 Heretrix 运行)
You can see a working pagerank implementation in the wukong library (hadoop streaming for ruby) or in the Heretrix pagerank submodule. (The heretrix code runs independently of Heretrix)
(免责声明:我是悟空的作者.)
(disclaimer: I am an author of wukong.)
这篇关于如何使用 MapReduce/Hadoop 实现特征值计算?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!