How to implement eigenvalue calculation with MapReduce/Hadoop?
Question
It should be possible, because PageRank is a form of eigenvalue calculation, and that is why MapReduce was introduced. But there seem to be problems in an actual implementation: for example, does every slave computer have to maintain a copy of the matrix?
Recommended answer
PageRank solves the dominant eigenvector problem by iteratively finding the steady-state discrete flow condition of the network.
If the N×N matrix A describes the link weights (amount of flow) between the N nodes, with entry A_{ij} giving the flow from node j to node i, then
p_{n+1} = A p_n
In the limit where p has converged to a steady state (p_{n+1} = p_n, i.e. A p = p), this is an eigenvector problem with eigenvalue 1.
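The iteration above is ordinary power iteration. Here is a minimal single-machine sketch, assuming A is column-stochastic (each column sums to 1, so node j's outgoing flow is split among the nodes it links to). The 3-node example graph and the helper names `mat_vec` and `power_iteration` are hypothetical and only illustrate the convergence to an eigenvector with eigenvalue 1.

```python
# Power iteration sketch: apply p <- A p until p stops changing.
# Assumption: A is column-stochastic, so A[i][j] is the fraction of node j's
# flow that goes to node i. The example graph below is hypothetical.

def mat_vec(A, p):
    """Dense matrix-vector product A p using plain Python lists."""
    return [sum(a_ij * p_j for a_ij, p_j in zip(row, p)) for row in A]

def power_iteration(A, tol=1e-12, max_iter=10_000):
    n = len(A)
    p = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(max_iter):
        p_next = mat_vec(A, p)
        # Converged when p_{n+1} equals p_n to within tol (max-norm of the change).
        if max(abs(x - y) for x, y in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    raise RuntimeError("power iteration did not converge")

if __name__ == "__main__":
    # Hypothetical 3-node graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
    A = [
        [0.0, 0.0, 1.0],
        [0.5, 0.0, 0.0],
        [0.5, 1.0, 0.0],
    ]
    p = power_iteration(A)
    print([round(x, 4) for x in p])  # prints [0.4, 0.2, 0.4]
```

At the fixed point, `mat_vec(A, p)` returns `p` again (up to the tolerance), which is exactly the statement A p = p. Convergence is only guaranteed for suitable graphs (here the graph is strongly connected and aperiodic); real PageRank implementations add further handling that this sketch omits.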
The PageRank algorithm doesn't require the whole matrix to be held in memory on any one machine, but it is inefficient on dense (non-sparse) matrices. For dense matrices, MapReduce is the wrong solution: you need locality and broad exchange among nodes, so you should instead look at LAPACK, MPI, and friends.
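To see why no worker needs a full copy of the matrix, here is one PageRank iteration written as a map step and a reduce step, simulated locally in plain Python. This is a sketch under stated assumptions, not the wukong or Hadoop API: the record layout `(node, (rank, out_links))` and the functions `mapper`, `reducer`, and `run_iteration` are hypothetical, and every node is assumed to have at least one out-link (dangling nodes would leak rank here).

```python
# One PageRank iteration as map + reduce, simulated locally (no Hadoop involved).
# Assumptions (hypothetical): records are (node, (rank, out_links)) and every
# node has at least one out-link; these names are not wukong/Hadoop APIs.
from collections import defaultdict

def mapper(node, value):
    rank, out_links = value
    # Re-emit the graph structure so the reducer can rebuild the record.
    yield node, ("links", out_links)
    # Split this node's rank evenly over its out-links (one column of A).
    for dest in out_links:
        yield dest, ("rank", rank / len(out_links))

def reducer(node, values):
    new_rank, out_links = 0.0, []
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            new_rank += payload  # sum incoming flow: one entry of A p
    return node, (new_rank, out_links)

def run_iteration(records):
    # Shuffle phase: group mapper output by key, as the framework would.
    grouped = defaultdict(list)
    for node, value in records.items():
        for key, emitted in mapper(node, value):
            grouped[key].append(emitted)
    return dict(reducer(key, vals) for key, vals in grouped.items())

if __name__ == "__main__":
    # Hypothetical 3-node graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
    records = {0: (1 / 3, [1, 2]), 1: (1 / 3, [2]), 2: (1 / 3, [0])}
    for _ in range(100):
        records = run_iteration(records)
    print({n: round(r, 4) for n, (r, _) in sorted(records.items())})
```

Each mapper call only sees one node's own out-links and each reducer call only sums the contributions arriving at one node, so the matrix is never materialized anywhere. The cost is that every iteration is a full pass (one job) over the link data, which is acceptable for sparse link graphs but wasteful for dense matrices.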
You can see a working PageRank implementation in the wukong library (Hadoop Streaming for Ruby) or in the Heretrix PageRank submodule. (The heretrix code runs independently of Heretrix.)
(disclaimer: I am an author of wukong.)