MongoDB 聚合比较:group()、$group 和 MapReduce [英] MongoDB aggregation comparison: group(), $group and MapReduce

查看:30
本文介绍了MongoDB 聚合比较:group()、$group 和 MapReduce的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对何时使用 group()、与 $group 或 mapreduce 聚合有些困惑.我阅读了 http://www.mongodb.org/display/DOCS/Aggregation 上的文档a> 对于 group(),http://docs.mongodb.org/手动/参考/聚合/组/#_S_group 用于 $group.. 分片是 group() 不起作用的唯一情况吗?另外,我觉得 $group 比 group() 更强大,因为它可以与来自聚合框架的其他管道运算符结合使用.. $group 与 mapreduce 相比如何?我在某处读到它不会生成任何临时集合,而 mapreduce 会.是这样吗?
有人可以提供一个插图或引导我到一个链接,在其中一起解释这三个概念,采用相同的样本数据,以便我可以轻松地比较它们?


此外,如果您能在新的 2.2 版本发布后特别指出这些命令中的任何新内容,那就太好了.

I am somewhat confused about when to use group(), aggregate with $group or mapreduce. I read the documentation at http://www.mongodb.org/display/DOCS/Aggregation for group(), http://docs.mongodb.org/manual/reference/aggregation/group/#_S_group for $group.. Is sharding the only situation where group() won't work? Also, I get this feeling that $group is more powerful than group() because it can be used in conjunction with other pipeline operators from aggregation framework.. How does $group compare with mapreduce? I read somewhere that it doesn't generate any temporary collection whereas mapreduce does. Is that so?
Can someone present an illustration or guide me to a link where these three concepts are explained together, taking the same sample data, so I can compare them easily?


Also, it would be great if you can point out anything new specifically in these commands since the new 2.2 release came out..

推荐答案

由于名称相似,这有点令人困惑,但是 group() 命令 是与 $group 管道操作符 在聚合框架中.

It is somewhat confusing since the names are similar, but the group() command is a different feature and implementation from the $group pipeline operator in the Aggregation Framework.

group() 命令、聚合框架和 MapReduce 是 MongoDB 的聚合特性的统称.功能上有一些重叠,但我将尝试解释 MongoDB 2.2.0 中每个功能的差异和局限性.

The group() command, Aggregation Framework, and MapReduce are collectively aggregation features of MongoDB. There is some overlap in features, but I'll attempt to explain the differences and limitations of each as at MongoDB 2.2.0.

注意:下面提到的内联结果集是指在内存中处理并在函数调用结束时返回结果的查询.替代输出选项(目前仅适用于 MapReduce)可能包括将结果保存到新的或现有的集合.

Note: inline result sets mentioned below refer to queries that are processed in memory with results returned at the end of the function call. Alternative output options (currently only available with MapReduce) could include saving results to a new or existing collection.

  • 用于分组的简单语法和功能..类似于 SQL 中的 GROUP BY.

内联返回结果集(作为分组项目的数组).

Returns result set inline (as an array of grouped items).

使用 JavaScript 引擎实现;自定义 reduce() 函数可以用 JavaScript 编写.

Implemented using the JavaScript engine; custom reduce() functions can be written in JavaScript.

当前限制

  • 不会组合成超过 20,000 个键的结果集.

  • Will not group into a result set with more than 20,000 keys.

结果必须符合 BSON 文档的限制(目前为 16MB).

Results must fit within the limitations of a BSON document (currently 16MB).

获取读锁并且不允许任何其他线程在运行时执行 JavaScript.

Takes a read lock and does not allow any other threads to execute JavaScript while it is running.

不适用于分片集合.

另见:group() 命令示例.

See also: group() command examples.

可以从多个输出选项之一中进行选择(内联、新建集合、合并、替换、减少)

Can choose from one of several output options (inline, new collection, merge, replace, reduce)

MapReduce 函数是用 JavaScript 编写的.

MapReduce functions are written in JavaScript.

支持非分片和分片输入集合.

Supports non-sharded and sharded input collections.

可用于对大型集合进行增量聚合.

Can be used for incremental aggregation over large collections.

MongoDB 2.2 实现了对 sharded map reduce 的更好支持输出.

MongoDB 2.2 implements much better support for sharded map reduce output.

当前限制

  • 一次发射只能容纳 MongoDB 最大 BSON 文档大小 (16MB) 的一半.

  • A single emit can only hold half of MongoDB's maximum BSON document size (16MB).

有一个 JavaScript 锁,所以一个 mongod 服务器在一个时间点只能执行一个 JavaScript 函数.然而,MapReduce 的大多数步骤都很短,所以锁可以频繁地产生.

There is a JavaScript lock so a mongod server can only execute one JavaScript function at a point in time .. however, most steps of the MapReduce are very short so locks can be yielded frequently.

MapReduce 函数可能难以调试.您可以使用 print()printjson()mongod 日志中包含诊断输出.

MapReduce functions can be difficult to debug. You can use print() and printjson() to include diagnostic output in the mongod log.

MapReduce 对于试图转换关系查询聚合体验的程序员来说通常不直观.

MapReduce is generally not intuitive for programmers trying to translate relational query aggregation experience.

另见:映射/减少示例.

See also: Map/Reduce examples.

  • MongoDB 2.2.0 生产版(2012 年 8 月)中的新功能.

  • New feature in the MongoDB 2.2.0 production release (August, 2012).

专为提高性能和可用性而设计.

Designed with specific goals of improving performance and usability.

内联返回结果集.

支持非分片和分片输入集合.

Supports non-sharded and sharded input collections.

使用管道"方法,其中对象在通过一系列管道运算符(例如匹配、投影、排序和分组)时进行转换.

Uses a "pipeline" approach where objects are transformed as they pass through a series of pipeline operators such as matching, projecting, sorting, and grouping.

管道操作员不需要为每个输入文档生成一个输出文档:操作员还可以生成新文档或过滤掉文档.

Pipeline operators need not produce one output document for every input document: operators may also generate new documents or filter out documents.

使用投影,您可以添加计算域、创建新的虚拟子对象并将子域提取到顶级结果中.

Using projections you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.

管道操作符可以根据需要重复(例如,多个 $project$group 步骤.

Pipeline operators can be repeated as needed (for example, multiple $project or $group steps.

当前限制

  • 结果以内联方式返回,因此受限于服务器支持的最大文档大小 (16MB)

  • Results are returned inline, so are limited to the maximum document size supported by the server (16MB)

不支持与 MapReduce 一样多的输出选项

Doesn't support as many output options as MapReduce

仅限于聚合框架支持的运算符和表达式(即不能编写自定义函数)

Limited to operators and expressions supported by the Aggregation Framework (i.e. can't write custom functions)

用于聚合的最新服务器功能,因此在文档、功能集和使用方面有更多的成熟空间.

Newest server feature for aggregation, so has more room to mature in terms of documentation, feature set, and usage.

另见:聚合框架示例.

See also: Aggregation Framework examples.

有人可以提供一个插图或指导我到一个链接,其中将这三个概念一起解释,采用相同的样本数据,以便我可以轻松比较它们?

Can someone present an illustration or guide me to a link where these three concepts are explained together, taking the same sample data, so I can compare them easily?

您通常找不到比较所有三种方法都有用的示例,但这里是以前的 StackOverflow 问题,它们显示了变化:

You generally won't find examples where it would be useful to compare all three approaches, but here are previous StackOverflow questions which show variations:

这篇关于MongoDB 聚合比较:group()、$group 和 MapReduce的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆