Finding Connected Components using Hadoop/MapReduce


Problem description

I need to find the connected components of a huge dataset. (The graph is undirected.)

One obvious choice is MapReduce, but I'm a newbie to MapReduce and quite short of time to pick it up and code it myself.

I was wondering whether there is an existing API for this, since it is a very common problem in social network analysis.

Or at least, is anyone aware of a reliable (tried and tested) source that I could use to get started on the implementation myself?

Thanks

Solution

I blogged about this myself:

http://codingwiththomas.blogspot.de/2011/04/graph-exploration-with-hadoop-mapreduce.html

But MapReduce isn't a good fit for this kind of graph analysis. It is better to use BSP (bulk synchronous parallel) for that; Apache Hama provides a good graph API on top of Hadoop HDFS.

I've written a connected components algorithm (mindist search) with MapReduce here:

https://github.com/thomasjungblut/tjungblut-graph/tree/master/src/de/jungblut/graph/mapreduce
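To give a feel for the idea without reading the whole repository, here is a minimal in-memory sketch of how a minimum-label propagation can be phrased as repeated map and reduce rounds. It is not the code from the linked repository and uses no Hadoop API; the function names `map_phase`, `reduce_phase` and `connected_components` are made up for illustration, and the assumption is that "mindist search" means every vertex ends up labelled with the smallest vertex id in its component.

```python
from collections import defaultdict


def map_phase(labels, adjacency):
    """Emit (vertex, label) pairs: each vertex keeps its own label
    and also sends it to every neighbour."""
    for vertex, label in labels.items():
        yield vertex, label
        for neighbour in adjacency[vertex]:
            yield neighbour, label


def reduce_phase(pairs):
    """Group the emitted pairs by vertex and keep the smallest label."""
    grouped = defaultdict(list)
    for vertex, label in pairs:
        grouped[vertex].append(label)
    return {vertex: min(candidates) for vertex, candidates in grouped.items()}


def connected_components(edges):
    """Repeat map/reduce rounds until no label changes.

    Vertices that appear in no edge are not handled in this sketch.
    """
    adjacency = defaultdict(set)
    for u, v in edges:  # undirected: store both directions
        adjacency[u].add(v)
        adjacency[v].add(u)

    labels = {vertex: vertex for vertex in adjacency}
    while True:
        new_labels = reduce_phase(map_phase(labels, adjacency))
        if new_labels == labels:  # converged, one component per label
            return labels
        labels = new_labels
```

Calling `connected_components([(1, 2), (2, 3), (4, 5)])` labels vertices 1, 2 and 3 with `1` and vertices 4 and 5 with `4`. Note that every round re-emits the whole graph, which on a real cluster means one full job per round.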

A BSP version for Apache Hama can also be found here:

https://github.com/thomasjungblut/tjungblut-graph/blob/master/src/de/jungblut/graph/bsp/MindistSearch.java
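As a rough illustration of the BSP style, the sketch below simulates vertex-centric supersteps in plain Python. It does not use the Hama API and is not the linked `MindistSearch.java`; the name `bsp_mindist` and the message-passing layout are assumptions made for the example. The point it tries to show is that vertex state stays in memory between supersteps and only vertices whose label changed send messages.

```python
from collections import defaultdict


def bsp_mindist(edges):
    """Simulate min-label propagation as a sequence of supersteps."""
    adjacency = defaultdict(set)
    for u, v in edges:  # undirected: store both directions
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Every vertex starts with its own id as its label.
    labels = {vertex: vertex for vertex in adjacency}

    # Superstep 0: every vertex sends its id to all neighbours.
    inbox = defaultdict(list)
    for vertex in labels:
        for neighbour in adjacency[vertex]:
            inbox[neighbour].append(labels[vertex])

    # Later supersteps: a vertex only acts if it received messages,
    # and only sends again if its label actually improved.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            smallest = min(messages)
            if smallest < labels[vertex]:
                labels[vertex] = smallest
                for neighbour in adjacency[vertex]:
                    outbox[neighbour].append(smallest)
        inbox = outbox  # barrier: messages are delivered in the next superstep

    return labels
```

The loop ends when no messages are in flight, which plays the role of all vertices voting to halt. Compared with a MapReduce formulation, nothing is re-read or re-emitted for vertices that are already settled.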

The implementation isn't as difficult as in MapReduce, and it is at least 10 times faster. If you're interested, check out the latest version in trunk and visit our mailing list.

http://hama.apache.org/

http://apache.org/hama/mail-lists.html
