How to partition a graph for Pregel to maximize processing speed?

Question

I have a crowdsourcing application. Data is collected from users, processed, and then published for everyone to see. Data collection is close to real time. Processing time grows as the number of users (data nodes) grows, and I need to scale this.

Looking at how to scale graph-based models, MapReduce seems to be the well-known option. Is there a benchmarking paper that compares it with other techniques? Pregel looks impressive. Please point me to any leads on partitioning in Pregel, i.e., how a graph can be partitioned intelligently so that worker processes do not lag behind one another.

Recommended answer

Partitioning a graph 'intelligently' to minimize execution time is an interesting problem, but it is not simple, and the right choice depends on your data and your algorithm. You may also find that, in practice, it is not necessary and that random partitioning is good enough.
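
One way to check whether a smarter partitioning would pay off on your data is to measure two things for each candidate scheme: how many edges cross partitions (each one becomes network traffic) and how many edges the busiest partition owns (the likely straggler). The following is a minimal sketch in plain Python, not tied to any framework. The helper names `hash_partition`, `range_partition`, and `evaluate` are made up for illustration, and the 8-vertex ring is a toy graph chosen because neighbouring IDs are connected.

```python
from collections import Counter


def hash_partition(vertex_id, num_parts):
    """Spread vertices by ID modulo the partition count (a stand-in for random placement)."""
    return vertex_id % num_parts


def range_partition(vertex_id, num_parts, num_vertices):
    """Give each partition one contiguous block of vertex IDs."""
    block = -(-num_vertices // num_parts)  # ceiling division
    return vertex_id // block


def evaluate(edges, assignment):
    """Return (cut_edges, max_out_edges_owned_by_one_partition)."""
    # An edge is "cut" when its endpoints live on different partitions.
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    # Out-edges are charged to the partition that owns the source vertex.
    load = Counter(assignment[u] for u, _ in edges)
    return cut, max(load.values())


num_vertices, num_parts = 8, 2
# Toy directed ring: 0 -> 1 -> ... -> 7 -> 0
ring = [(i, (i + 1) % num_vertices) for i in range(num_vertices)]

by_hash = {v: hash_partition(v, num_parts) for v in range(num_vertices)}
by_range = {v: range_partition(v, num_parts, num_vertices) for v in range(num_vertices)}

hash_result = evaluate(ring, by_hash)
range_result = evaluate(ring, by_range)
print("hash :", hash_result)
print("range:", range_result)
```

On this ring the two schemes balance load equally, but the modulo scheme cuts every edge while the contiguous blocks cut only two. On a graph whose vertex IDs carry no locality, the contiguous scheme would have no such advantage, which is the sense in which the answer depends on your data.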

For example, if you want to explore Pregel-like approaches, have a look at Apache Giraph and experiment with different partitioning techniques.
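
Before setting up a cluster, the effect of a partitioner can be explored with a toy model of the synchronous superstep loop. The sketch below is plain Python and does not use the Giraph API. `run_max_value` and the `partition_of` callback are hypothetical names, and the algorithm (propagating the maximum value through the graph) is only a small illustrative workload. It counts how many messages cross worker boundaries under a given vertex-to-worker mapping.

```python
from collections import defaultdict


def run_max_value(graph, initial_values, partition_of):
    """Propagate the maximum value through the graph in synchronous supersteps.

    graph: {vertex: [out-neighbours]}
    initial_values: {vertex: number}
    partition_of: function mapping a vertex to a worker id (hypothetical partitioner)
    Returns (final_values, supersteps, remote_messages, total_messages).
    """
    values = dict(initial_values)
    active = set(graph)  # every vertex sends in the first superstep
    supersteps = remote = total = 0

    while active:
        outbox = defaultdict(list)
        for v in active:
            for n in graph[v]:
                outbox[n].append(values[v])
                total += 1
                if partition_of(v) != partition_of(n):
                    remote += 1  # this message would cross the network
        supersteps += 1

        # Barrier: deliver all messages, then reactivate only vertices that changed.
        active = set()
        for v, messages in outbox.items():
            best = max(messages)
            if best > values[v]:
                values[v] = best
                active.add(v)

    return values, supersteps, remote, total


# Toy bidirectional chain 0 - 1 - 2 - 3
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
start = {0: 3, 1: 6, 2: 2, 3: 1}

modulo_run = run_max_value(chain, start, lambda v: v % 2)   # alternate workers
block_run = run_max_value(chain, start, lambda v: v // 2)   # contiguous blocks

print("modulo:", modulo_run[1:])
print("block :", block_run[1:])
```

Both runs perform the same supersteps and send the same total number of messages; only the share that crosses workers differs. Swapping in a different `partition_of` function is a cheap way to compare schemes on a sample of your own graph before testing them in a real framework.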
