Finding Connected Components using Hadoop/MapReduce


Question

I need to find the connected components of a huge dataset. (The graph is undirected.)

One obvious choice is MapReduce, but I'm a newbie to MapReduce and am quite short of time to pick it up and code it myself.

I was wondering whether there is an existing API for this, since it is a very common problem in social network analysis.

Or, at least, is anyone aware of a reliable (tried and tested) source that I could use to get started with the implementation myself?

Thanks

Recommended answer

I blogged about this myself:

http://codingwiththomas.blogspot.de/2011/04/graph-exploration-with-hadoop-mapreduce.html

But MapReduce isn't a good fit for this kind of graph analysis. It is better to use BSP (bulk synchronous parallel) for that; Apache Hama provides a good graph API on top of Hadoop HDFS.

I've written a connected components algorithm with MapReduce here (Mindist search):

https://github.com/thomasjungblut/tjungblut-graph/tree/master/src/de/jungblut/graph/mapreduce
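
To give a rough idea of what such a job does, here is a minimal sketch in plain Python that simulates the map and reduce phases in memory. It is not the code from the repository above. It assumes that "Mindist search" means repeatedly propagating the smallest vertex id to neighbours until nothing changes, and all function names here are made up for illustration.

```python
from collections import defaultdict


def map_phase(state):
    """Each vertex emits its current label to itself and to every neighbour."""
    for vertex, (label, neighbours) in state.items():
        yield vertex, label
        for neighbour in neighbours:
            yield neighbour, label


def reduce_phase(messages):
    """Group messages by vertex and keep only the smallest label received."""
    best = {}
    for vertex, label in messages:
        if vertex not in best or label < best[vertex]:
            best[vertex] = label
    return best


def connected_components(edges):
    """Run map/reduce rounds until no label changes (hypothetical driver)."""
    adjacency = defaultdict(set)
    for a, b in edges:  # undirected graph: store both directions
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Every vertex starts out labelled with its own id.
    state = {v: (v, ns) for v, ns in adjacency.items()}

    changed = True
    while changed:  # one loop iteration stands in for one MapReduce job
        best = reduce_phase(map_phase(state))
        changed = False
        for vertex, (label, neighbours) in list(state.items()):
            if best[vertex] < label:
                state[vertex] = (best[vertex], neighbours)
                changed = True

    return {v: label for v, (label, _) in state.items()}


components = connected_components([(1, 2), (2, 3), (4, 5)])
```

After the loop ends, vertices that share a label belong to the same component. On a real cluster each round would be a separate job that rereads the whole graph from HDFS, which is one reason the iterative approach is costly in MapReduce.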

A BSP version for Apache Hama can also be found here:

https://github.com/thomasjungblut/tjungblut-graph/blob/master/src/de/jungblut/graph/bsp/MindistSearch.java
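
For comparison, the BSP idea can be sketched as supersteps with message passing, again in plain Python and not using the Hama API. The function name and the inbox/outbox dictionaries are assumptions for illustration only. The sketch assumes the adjacency map is symmetric and that every neighbour also appears as a key.

```python
def bsp_min_label(adjacency):
    """Simulate vertex-centric supersteps; returns a label per vertex."""
    labels = {v: v for v in adjacency}

    # Superstep 0: every vertex sends its own id to all neighbours.
    inbox = {v: [] for v in adjacency}
    for vertex, neighbours in adjacency.items():
        for neighbour in neighbours:
            inbox[neighbour].append(labels[vertex])

    # Keep going while at least one message is in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in adjacency}
        for vertex, messages in inbox.items():
            if not messages:
                continue  # no incoming messages: the vertex stays idle
            smallest = min(messages)
            if smallest < labels[vertex]:
                labels[vertex] = smallest
                # Only vertices whose label changed send new messages.
                for neighbour in adjacency[vertex]:
                    outbox[neighbour].append(smallest)
        inbox = outbox  # stands in for the barrier between supersteps

    return labels


graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
labels = bsp_min_label(graph)
```

The graph state stays in memory between supersteps and only changed vertices send messages, so the amount of work shrinks as the labels converge.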

The implementation isn't as difficult as in MapReduce, and it is at least 10 times faster. If you're interested, check out the latest version in TRUNK and visit our mailing list.

http://hama.apache.org/

http://apache.org/hama/mail-lists.html
