如何使用 MapReduce/Hadoop 实现特征值计算? [英] how to implement eigenvalue calculation with MapReduce/Hadoop?

查看:25
本文介绍了如何使用 MapReduce/Hadoop 实现特征值计算?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是可能的,因为 PageRank 是特征值的一种形式,这就是 MapReduce 引入的原因.但在实际实现中似乎存在问题,比如每台从机都要维护一份矩阵的副本?

It is possible because PageRank was a form of eigenvalue and that is why MapReduce introduced. But there seems problems in actual implementation, such as every slave computer have to maintain a copy of the matrix?

推荐答案

PageRank通过迭代寻找网络的稳态离散流条件来解决主导特征向量问题.

PageRank solves the dominant eigenvector problem by iteratively finding the steady-state discrete flow condition of the network.

如果NxM矩阵A描述了从节点n到节点m的链接权重(流量),那么

If NxM matrix A describes the link weight (amount of flow) from node n to node m, then

p_{n+1} = A . p_{n} 

在p已经收敛到稳态的极限(p_n+1 = p_n),这是一个特征值为1的特征向量问题.

In the limit where p has converged to a steady state (p_n+1 = p_n), this is an eigenvector problem with eigenvalue 1.

PageRank 算法不需要将矩阵保存在内存中,但在密集(非稀疏)矩阵上效率低下.对于稠密矩阵,MapReduce 是错误的解决方案——您需要节点之间的局部性和广泛的交换——您应该转而查看 LaPACK 和 MPI 及其朋友.

The PageRank algorithm doesn't require the matrix to be held in memory, but is inefficient on dense (non-sparse) matrices. For dense matrices, MapReduce is the wrong solution -- you need locality and broad exchange among nodes -- and you should instead look at LaPACK and MPI and friends.

您可以在 wukong 库中看到有效的 pagerank 实现(ruby 的 hadoop 流)或 Heretrix pagerank 子模块.(heretrix 代码独立于 Heretrix 运行)

You can see a working pagerank implementation in the wukong library (hadoop streaming for ruby) or in the Heretrix pagerank submodule. (The heretrix code runs independently of Heretrix)

(免责声明:我是悟空的作者.)

(disclaimer: I am an author of wukong.)

这篇关于如何使用 MapReduce/Hadoop 实现特征值计算?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆