Hadoop Map/Reduce vs built-in Map/Reduce

Problem description

What are the key differences between doing map/reduce work on MongoDB with Hadoop map/reduce and using Mongo's built-in map/reduce?

When do I pick which map/reduce engine? What are the pros and cons of each engine for working on data stored in MongoDB?

Solution

My answer is based on knowledge and hands-on experience of Hadoop MR, and on what I have learned about MongoDB MR. Let's look at the major differences first, and then try to define criteria for choosing between them. The differences are:

  1. Hadoop's MR can be written in Java, while MongoDB's is written in JavaScript (see the sketch after this list).
  2. Hadoop's MR is capable of utilizing all cores, while MongoDB's is single-threaded.
  3. Hadoop MR will not be collocated with the data, while MongoDB's will be collocated.
  4. Hadoop MR has millions of engine-hours behind it and can cope with many corner cases: massive output sizes, data skew, etc.
  5. There are higher-level frameworks like Pig, Hive, and Cascading built on top of the Hadoop MR engine.
  6. Hadoop MR is mainstream, and a lot of community support is available.
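
To make the first three differences concrete, here is a minimal sketch of calling the built-in engine from Java, assuming the MongoDB Java sync driver on the classpath and a hypothetical shop.orders collection with customerId and total fields (all names here are illustrative, not from the question). Note that the map and reduce bodies are still JavaScript source strings that mongod compiles and runs server-side, single-threaded and collocated with the data:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class BuiltInMapReduceSketch {
    public static void main(String[] args) {
        // Hypothetical connection string, database, collection and field names.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders =
                    client.getDatabase("shop").getCollection("orders");

            // The map and reduce functions are JavaScript source strings;
            // MongoDB runs them inside the server, next to the data.
            String map = "function() { emit(this.customerId, this.total); }";
            String reduce = "function(key, values) { return Array.sum(values); }";

            // Simple "filter, then group by customerId and sum totals".
            for (Document result : orders.mapReduce(map, reduce)
                                         .filter(Filters.gt("total", 0))) {
                System.out.println(result.toJson());
            }
        }
    }
}
```

This kind of "filter, group by a key, sum" job is exactly what the selection criteria below recommend the built-in engine for.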

From the above I can suggest the following criteria for selection:
Select MongoDB MR if you need simple group-by and filtering and do not expect heavy shuffling between map and reduce. In other words, something simple.

Select Hadoop MR if you're going to do complicated, computationally intense MR jobs (for example, some regression calculations). Having a large or unpredictable amount of data between map and reduce also suggests Hadoop MR.
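
For contrast, here is a minimal sketch of the Hadoop side under stated assumptions: the plain Hadoop MapReduce API with Apache Commons Math on the classpath, reading hypothetical "seriesId,x,y" CSV lines from HDFS and fitting a least-squares line per key in the reducer. Reading directly from MongoDB would normally go through the mongo-hadoop connector instead of file input; that wiring is omitted here. The Commons Math call in the reducer also illustrates the point below about Java's statistical libraries:

```java
import java.io.IOException;

import org.apache.commons.math3.stat.regression.SimpleRegression;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PerKeyRegression {

    // Mapper: parses "seriesId,x,y" lines and groups the (x, y) pairs by series.
    public static class ParseMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] f = value.toString().split(",");
            if (f.length == 3) {
                ctx.write(new Text(f[0]), new Text(f[1] + "," + f[2]));
            }
        }
    }

    // Reducer: fits a least-squares line per series using Apache Commons Math,
    // the kind of statistical-library work that is easier on the Java side.
    public static class RegressionReducer extends Reducer<Text, Text, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            SimpleRegression reg = new SimpleRegression();
            for (Text v : values) {
                String[] xy = v.toString().split(",");
                reg.addData(Double.parseDouble(xy[0]), Double.parseDouble(xy[1]));
            }
            ctx.write(key, new DoubleWritable(reg.getSlope()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "per-key regression");
        job.setJarByClass(PerKeyRegression.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(RegressionReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because the heavy lifting happens in ordinary Java code spread across many parallel tasks, this is the kind of computationally intense job the criteria above steer toward Hadoop MR.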

Java is a stronger language with more libraries, especially statistical ones. That should be taken into account.
