How can I improve MongoDB bulk performance?

Question

I have an object with some metadata and a big array of items. I used to store this in mongo and query it by $unwinding the array. In extreme cases, however, the array becomes so big that I run into the 16MB BSON limit.
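For context, a minimal sketch of the old single-document layout and the $unwind query it implies; the field names are borrowed from the code below, and the items field name is my own placeholder:

// One big document: metadata plus the embedded array (capped by the 16MB BSON limit)
var doc = {
    hash: hash,
    date: timestamp,
    name: name,
    items: array // thousands of entries live inside this one document
};

// Querying individual items meant unwinding the embedded array
col.aggregate([
    { $match: { hash: hash } },
    { $unwind: "$items" }
]).toArray(function(err, items) {
    // each element pairs one array entry with the shared metadata
});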

So I need to store each element of the array as a separate document. For that I need to add the metadata to all of them, so I can find them back. It is suggested that I use bulk operations for this.

However, performance seems to be really slow. Inserting the one big document was near-instant, while this takes up to ten seconds.

var bulk        = col.initializeOrderedBulkOp();
var metaData    = {
    hash            : hash,
    date            : timestamp,
    name            : name
};

// measure time here

for (var i = 0, l = array.length; i < l; i++) { // 6000 items
    var item = array[i];

    bulk.insert({ // Apparently, this 6000 times takes 2.9 seconds
        data        : item,
        metaData    : metaData
    });

}

bulk.execute(bulkOpts, function(err, result) { // and this takes 6.5 seconds
    // measure time here
});

When bulk inserting 6000 documents totalling 38 MB of data (which translates to 49 MB as BSON in MongoDB), performance seems unacceptably bad. The overhead of appending metadata to every document can't be that bad, right? The overhead of updating two indexes can't be that bad, right?

Am I missing something? Is there a better way of inserting groups of documents that need to be fetched as a group?
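(For illustration, fetching such a group back might look like the sketch below; the index on metaData.hash is my assumption, not something stated in the question.)

// Assumed: a lookup key on the shared metadata so the group is cheap to fetch back
col.createIndex({ "metaData.hash": 1 }, function(err) {
    // Retrieve the whole group in one query
    col.find({ "metaData.hash": hash }).toArray(function(err, docs) {
        // docs holds every item that belonged to the original array
    });
});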

It's not just my laptop; the same happens on the server, which makes me think this is not a configuration error but a programming error.

Using MongoDB 2.6.11 with the node adapter node-mongodb-native 2.0.49.

- Update -

Just the act of adding the metadata to every element in the bulk accounts for 2.9 seconds. There needs to be a better way of doing this.
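(Not in the original post, but one alternative worth benchmarking: since metaData can be shared by reference rather than rebuilt per item, the documents can be assembled with a single map and handed to the driver's insertMany, which the CRUD API in node-mongodb-native 2.0.x provides; the driver splits oversized batches internally.)

// Sketch: build all documents up front, sharing one metaData object by reference
var docs = array.map(function(item) {
    return { data: item, metaData: metaData };
});

col.insertMany(docs, { ordered: false }, function(err, result) {
    // on success, result.insertedCount === array.length
});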

Answer

Send the bulk insert operations in batches: this results in less traffic to the server and makes for efficient wire transactions, not by sending everything in individual statements but by breaking it up into manageable chunks for the server to commit. There is also less time spent waiting for responses in the callback with this approach.

A much better approach is to use the async module, so that even looping over the input list is a non-blocking operation. The batch size can vary, but 1000 entries per bulk execute keeps you safely under the 16MB BSON hard limit, since the whole "request" amounts to one BSON document; with the numbers above (49 MB of BSON across 6000 documents, roughly 8 KB each), a 1000-document batch comes to about 8 MB.

The following demonstrates using the async module's whilst to iterate through the array, repeatedly calling the iterator function while the test returns true, and invoking the final callback when iteration stops or an error occurs.

var bulk = col.initializeOrderedBulkOp(),
    counter = 0,
    len = array.length,
    buildModel = function(index) {
        return {
            "data": array[index],
            "metaData": {
                "hash": hash,
                "date": timestamp,
                "name": name
            }
        };
    };

async.whilst(
    // Test: keep iterating while there are items left
    function() { return counter < len; },

    // Iterator: queue one insert, flush every 1000 documents
    function(callback) {
        var model = buildModel(counter);
        counter++;
        bulk.insert(model);

        if (counter % 1000 === 0) {
            bulk.execute(function(err, result) {
                // Re-initialize the bulk builder for the next batch
                bulk = col.initializeOrderedBulkOp();
                callback(err);
            });
        } else {
            callback();
        }
    },

    // Final callback: flush whatever is left in the last partial batch
    function(err) {
        if (err) throw err;
        if (counter % 1000 !== 0) {
            bulk.execute(function(err, result) {
                console.log("All done now!");
            });
        } else {
            console.log("All done now!");
        }
    }
);
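As a further tweak not covered in the answer: if the order in which the items land doesn't matter, an unordered bulk op lets the server process each batch in parallel and continue past individual failures:

// Drop-in replacement for the ordered variant above
var bulk = col.initializeUnorderedBulkOp();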
