Map-Reduce performance in MongoDB 2.2, 2.4, and 2.6

Question

I found this discussion: "MongoDB: Terrible MapReduce Performance". Basically, it says to avoid MongoDB's Map/Reduce queries because they are single-threaded and not meant for real-time use at all. Two years have passed, and I wonder what has changed since then. We now have MongoDB 2.2, and I have heard that Map/Reduce is now multi-threaded. Please share your thoughts on using Map/Reduce for real-time requests, such as fetching data for frequent HTTP requests in a web application. Can it use indexes effectively?

Recommended answer

Here is the current state of Map/Reduce functionality in MongoDB:

1) Most of the performance limitations of Map/Reduce remain in MongoDB version 2.2. The Map/Reduce engine still requires every record to be converted from BSON to JSON, the actual calculations are still performed by the embedded JavaScript engine (which is slow), and there is still a single global JavaScript lock, which allows only one JavaScript thread to run at a time.

There have been some incremental improvements to Map/Reduce for sharded clusters. Most notably, the final Reduce operation is now distributed across multiple shards, and the output is also sharded in parallel.
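As a rough illustration of why reduce work can be spread across shards at all, the sketch below simulates the data flow in plain Python. It does not talk to MongoDB: the shard contents, the `cust_id` and `amount` field names, and every helper function are hypothetical. It only shows that a reduce function which accepts its own output as input lets each shard reduce its own data before the partial results are combined. It does not model how MongoDB actually assigns the final reduce or the output to shards.

```python
from collections import defaultdict

# Hypothetical documents, split across two pretend shards.
SHARDS = [
    [{"cust_id": "a", "amount": 5}, {"cust_id": "b", "amount": 7}],
    [{"cust_id": "a", "amount": 3}, {"cust_id": "b", "amount": 1},
     {"cust_id": "a", "amount": 2}],
]


def map_doc(doc):
    """Map step: emit one (key, value) pair per document."""
    yield doc["cust_id"], doc["amount"]


def reduce_values(key, values):
    """Reduce step: its output must be usable as input to a later reduce."""
    return sum(values)


def map_reduce(docs):
    """Map every document, group the emitted values by key, reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(key, values) for key, values in grouped.items()}


def sharded_map_reduce(shards):
    """Reduce on each shard first, then reduce the per-shard partial results."""
    partials = defaultdict(list)
    for shard_docs in shards:
        for key, value in map_reduce(shard_docs).items():
            partials[key].append(value)
    return {key: reduce_values(key, values) for key, values in partials.items()}
```

Calling `sharded_map_reduce(SHARDS)` gives the same totals as running `map_reduce` over all documents at once, which is the property that makes splitting the work possible.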

I would not recommend Map/Reduce for real-time aggregation in MongoDB version 2.2.

2) Starting with MongoDB 2.2, there is now a new Aggregation Framework. This is a new implementation of aggregation operations, written in C++, and tightly integrated into the MongoDB framework.

Most Map/Reduce jobs can be rewritten to use the Aggregation Framework. The rewritten jobs usually run faster (a 20x speedup over Map/Reduce is common in version 2.2), they make full use of the existing query engine, and you can run multiple aggregation commands in parallel.
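As a hedged sketch of what such a rewrite can look like, the Python below builds an aggregation pipeline that computes a per-customer total, the kind of job that is often written as Map/Reduce. The `status`, `cust_id`, and `amount` fields and both helper names are assumptions made for illustration. `$match`, `$group`, and `$sum` are aggregation pipeline operators. The second function only expects an object with an `aggregate(pipeline)` method that returns something iterable, as a collection in a recent PyMongo release does; older drivers returned a result document instead.

```python
def build_totals_pipeline(status):
    """Return a pipeline that sums `amount` per `cust_id` (hypothetical fields)."""
    return [
        # Filter first so the query engine can narrow the input,
        # and can use an index on `status` if one exists.
        {"$match": {"status": status}},
        # Group the remaining documents by customer and add up their amounts.
        {"$group": {"_id": "$cust_id", "total": {"$sum": "$amount"}}},
    ]


def totals_per_customer(collection, status="A"):
    """Run the pipeline on any object exposing aggregate(pipeline)."""
    return list(collection.aggregate(build_totals_pipeline(status)))


# Roughly equivalent Map/Reduce functions, shown only for comparison:
#   map:    function () { emit(this.cust_id, this.amount); }
#   reduce: function (key, values) { return Array.sum(values); }
```

The pipeline is plain data, so the server evaluates it without invoking the JavaScript engine, whereas the commented map and reduce functions run as JavaScript for every document and key.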

If you have real-time aggregation requirements, the first place to start is with the Aggregation Framework. For more information about the aggregation framework, take a look at these links:

  • http://www.10gen.com/presentations/mongonyc-2012/new-aggregation-framework
  • http://docs.mongodb.org/manual/reference/aggregation/

3) There have been significant improvements in Map/Reduce in MongoDB version 2.4. The SpiderMonkey JavaScript engine has been replaced by the V8 JavaScript engine, and there is no longer a global JavaScript lock, which means that multiple Map/Reduce threads can run concurrently.

The Map/Reduce engine is still considerably slower than the aggregation framework, for two main reasons:

  • The JavaScript engine is interpreted, while the Aggregation Framework runs compiled C++ code

  • The JavaScript engine still requires that every document being examined get converted from BSON to JSON; if you're saving the output in a collection, the result set must then be converted from JSON back to BSON

There are no significant changes in Map/Reduce between 2.4 and 2.6.

I still do not recommend using Map/Reduce for real-time aggregation in MongoDB version 2.4 or 2.6.

4) If you really need Map/Reduce, you can also look at the Hadoop Adaptor. There's more information here:

  • http://www.10gen.com/presentations/webinar/mongodb-hadoop-taming-elephant-room
  • http://api.mongodb.org/hadoop/MongoDB%2BHadoop+Connector.html
  • http://www.mongodb.org/display/DOCS/Hadoop+Quick+Start
