How to implement eigenvalue calculation with MapReduce/Hadoop?


Question

It should be possible, because PageRank is a form of eigenvalue problem and that is why MapReduce was introduced. But there seem to be problems in an actual implementation: for example, does every slave machine have to maintain a copy of the matrix?

Recommended answer

PageRank solves the dominant eigenvector problem by iteratively finding the steady-state discrete flow condition of the network.

If the N×N matrix A describes the link weight (amount of flow) from node i to node j, then each iteration updates the rank vector p as

p_{n+1} = A . p_{n}

In the limit where p has converged to a steady state (p_{n+1} = p_{n}), this becomes A . p = p, which is an eigenvector problem with eigenvalue 1.
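To make the iteration concrete, here is a minimal single-machine sketch in plain Python. It is an illustration, not code from wukong: the function name `power_iteration`, the 3-node network, and the storage convention (`A[j][i]` holds the share of node i's flow that goes to node j, so every column sums to 1) are all assumptions made for the example.

```python
def power_iteration(A, steps=1000, tol=1e-12):
    """Repeatedly apply p <- A . p until p stops changing.

    Assumption: A is a list of rows with A[j][i] = share of node i's
    flow sent to node j, so each column of A sums to 1.
    """
    size = len(A)
    p = [1.0 / size] * size                      # start from a uniform vector
    for _ in range(steps):
        nxt = [sum(A[j][i] * p[i] for i in range(size)) for j in range(size)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]           # renormalise against rounding drift
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt                           # steady state: A . p == p
        p = nxt
    return p


# Hypothetical network: 0 -> 1, 1 -> 0 and 2 (half each), 2 -> 0
A = [
    [0.0, 0.5, 1.0],   # flow arriving at node 0
    [1.0, 0.0, 0.0],   # flow arriving at node 1
    [0.0, 0.5, 0.0],   # flow arriving at node 2
]
p = power_iteration(A)
print(p)
```

The vector returned satisfies A . p = p up to the tolerance, which is the eigenvalue-1 condition described above.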

The PageRank algorithm doesn't require the matrix to be held in memory, but it is inefficient on dense (non-sparse) matrices. For dense matrices, MapReduce is the wrong solution: you need locality and broad exchange among nodes, and you should instead look at LAPACK and MPI and friends.
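As a rough illustration of why no worker needs the whole matrix, the sketch below simulates one MapReduce pass in plain Python. It is not the wukong code and does not use any Hadoop API; the names `mapper`, `reducer`, `run_iteration` and the record layout `(node, rank, links)` are assumptions made for the example. Each input record carries only one node's current rank and its outgoing links, that is, one sparse slice of A.

```python
from collections import defaultdict


def mapper(record):
    """Input: one node with its rank and outgoing links (one sparse slice of A)."""
    node, rank, links = record
    for dst, weight in links:
        yield dst, weight * rank          # flow sent from node to dst


def reducer(node, contributions):
    """Sum all flow arriving at a node to get its new rank."""
    return node, sum(contributions)


def run_iteration(records):
    """Simulate map, shuffle and reduce for one step of p <- A . p."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[0]]                # keep nodes that have no inbound links
        for dst, value in mapper(record):
            grouped[dst].append(value)    # shuffle: group by destination node
    links_by_node = {node: links for node, _, links in records}
    return [
        (node, reducer(node, values)[1], links_by_node.get(node, []))
        for node, values in sorted(grouped.items())
    ]


# Hypothetical network: 0 -> 1, 1 -> 0 and 2 (half each), 2 -> 0
records = [
    (0, 1 / 3, [(1, 1.0)]),
    (1, 1 / 3, [(0, 0.5), (2, 0.5)]),
    (2, 1 / 3, [(0, 1.0)]),
]
for _ in range(200):
    records = run_iteration(records)
ranks = {node: rank for node, rank, _ in records}
print(ranks)
```

In a real Hadoop job the mapper would also re-emit each node's link list so that the reducer can write out a complete record for the next pass; here the driver function does that bookkeeping to keep the sketch short. The work per pass is proportional to the number of links, which is why the approach suits sparse matrices and becomes wasteful on dense ones.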

You can see a working PageRank implementation in the wukong library (Hadoop streaming for Ruby) or in the Heretrix pagerank submodule. (The heretrix code runs independently of Heretrix.)

(Disclaimer: I am an author of wukong.)
