Increasing memory limit for mongodb aggregate jobs

Problem description

Based on the aggregation pipeline docs, "if any single aggregation operation consumes more than 10 percent of system RAM, the operation will produce an error." - http://docs.mongodb.org/manual/core/aggregation-pipeline-limits/

Is there any way of increasing this limit? I have also set allowDiskUse: true (so the error is no longer an issue), but would like to use more RAM to improve performance.

Background: I am running a large aggregate job on mongodb on about 100 million entries. It is basically a massive call to $group to merge the entries based on a key.
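
For reference, a minimal sketch of that kind of job in the mongo shell, with allowDiskUse enabled as described above. The collection name entries and the fields key and value are hypothetical placeholders:

db.entries.aggregate(
    [
        // Merge documents sharing the same key: count them and
        // collect their values (field names are assumptions)
        { "$group": {
            "_id": "$key",
            "count": { "$sum": 1 },
            "values": { "$push": "$value" }
        }}
    ],
    // Allow stages to spill to disk rather than fail at the memory limit
    { "allowDiskUse": true }
)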

I am using the dev release of mongo v 2.6.0-rc2 (3/21/2014)

Recommended answer

Well, no, there is no setting, and if you really think about it there is good reason for this. If you first consider what aggregate is doing and what MongoDB does in general, it should become clear.

这是应该" 在任何明智的聚合管道的头"上的情况:

This is what "should" be at the "head" of any sensible aggregation pipeline:

db.collection.aggregate([
    { "$match": { /* Something here */ } },
    // ... the rest of the pipeline stages
])

These are the reasons why:

  1. It makes good sense to try to reduce the working set that you are operating on in any operation.

  2. This is also the only time you get the opportunity to use an index to aid in searching the selection, which is always better than a collection scan (see the sketch after this list).

即使有内置的优化器" 查找诸如限制选定"字段的投影"之类的东西,对工作集大小的最佳检查也是处理有效记录.后期比赛不是通过优化" 这样的.(请参见第 1 点)

Even though there is a built in "optimizer" that looks for such things as "projections" limiting the "selected" fields, the best scrutineer of working set size is to only work on the valid records. Later stage matches are not "optimized" in this way.(See point 1)
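
To make point 2 concrete, a hypothetical sketch: with an index on the field the leading $match filters on, the initial selection can be served from the index instead of a collection scan. The status and key fields are invented for illustration:

// Hypothetical: index the field used by the leading $match
db.entries.ensureIndex({ "status": 1 })

db.entries.aggregate([
    // Only this initial $match can use the index above
    { "$match": { "status": "active" } },
    // Later stages then operate on the reduced working set
    { "$group": { "_id": "$key", "count": { "$sum": 1 } } }
])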

The next thing to consider is the general behavior of MongoDB. What the server process wants to do is "consume" as much of the available machine memory as it can in order to hold the "working set" data (collections and/or indexes), so that it can "work" on that data in the most efficient way.

So it really is in the "best interests" of the database engine to "spend" most of its memory allocation this way, since both your "aggregate" job and all of the other concurrent processes then have access to the "working data" in the memory space.

It is therefore "not optimal" for MongoDB to "steal" this memory allocation away from the other concurrent operations just to service your running aggregation operation.

根据硬件要求进行编程" 术语中,您知道,将来的发行版将允许聚合管道实现磁盘使用",以便进行更大的处理.您始终可以实施SSD或其他 fast 存储技术.当然,"10%" 的RAM取决于系统中安装的RAM的数量.因此,您随时可以增加.

In the "programming to hardware requirements" terms, well you are aware that future releases allow the aggregation pipeline to implement "disk use" in order to allow larger processing. You can always implement SSD's or other fast storage technologies. And of course "10%" of RAM is subjective to the amount of RAM that is installed in a system. So you can always increase that.

The roundup of this is that MongoDB's actual job is being a "concurrent datastore", and it does that well. What it is not is a specific "aggregation job-runner", and it should not be treated as such.

因此,分手" 您的工作量,或增加您的硬件规格,或者只是将大型的任务运行"活动切换为可以执行的操作专注于运行中的工作,例如 Hadoop风格"mapReduce",而将MongoDB保留为服务数据的工作.

So either "break-up" your workloads, or increase your hardware spec, or simply switch the large "task running" activity to something that does focus on the running job such as a Hadoop-style "mapReduce", and leave MongoDB to it's job of serving the data.

Or, of course, change your design to simply "pre-aggregate" the required data somewhere "on write".
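
A minimal sketch of that idea, assuming each write should fold into a running per-key total; the entryTotals collection and the field names are hypothetical:

// Hypothetical pre-aggregation "on write": maintain per-key totals
// with an upsert instead of grouping 100 million documents later
var doc = { "key": "abc", "value": 5 };   // the document being written
db.entries.insert(doc);
db.entryTotals.update(
    { "_id": doc.key },
    { "$inc": { "count": 1, "total": doc.value } },
    { "upsert": true }
);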

俗话说,课程的马匹" ,或者使用您的工具设计的.

As the saying goes, "Horses for courses", or use your tools for what they were designed for.
