How to update a large number of documents in MongoDB most efficiently?


Question

I want to update a large number (> 100,000) of documents as efficiently as possible.

My first naive approach was to do it at the JS level: write scripts that fetch the _ids first, then loop through the _ids and invoke an update by _id for each one (full documents or $set patches).

I ran into memory issues. Splitting the data into chunks of at most 500 documents (opening and closing the connection for each chunk) doesn't seem to work well either.

So how can I solve this at the MongoDB level?
What is the best practice?

I have 3 common use cases, typically maintenance workflows:

1. Change the type of a property's value without changing the value.

// before
{
  timestamp : '1446987395'
}

// after
{
  timestamp : 1446987395
}

2. Add a new property based on the value of an existing property.

// before
{
  firstname : 'John',
  lastname  : 'Doe'
}

// after
{
  firstname : 'John',
  lastname  : 'Doe',
  name      : 'John Doe'
}

3. Simply add or remove properties from documents.

// before
{
  street    : 'Whatever Ave',
  street_no : '1025'
}

// after
{
  street    : 'Whatever Ave',
  no        : '1025'
}

Thanks for your help.

Recommended answer

If your MongoDB server is 2.6 or newer, it is better to use the Bulk API. It is an abstraction on top of the server's write commands that makes bulk update operations easy to build and execute. Bulk operations come in two main flavours:

  • Ordered bulk operations. These execute all the operations in order and error out on the first write error.
  • Unordered bulk operations. These execute all the operations in parallel and aggregate all the errors. Unordered bulk operations do not guarantee the order of execution.
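
The difference in error handling between the two flavours can be illustrated with a toy model. This is plain JavaScript and does not use the MongoDB driver; `runBulk`, `ok` and `fail` are hypothetical names invented for the illustration, and the loop runs sequentially even in the unordered case, whereas a real server is free to run unordered operations in parallel.

```javascript
// Toy model only: plain JavaScript, no MongoDB driver involved.
// Each "operation" is a function that either returns normally or throws.
function runBulk(operations, ordered) {
    var result = { executed: 0, errors: [] };
    for (var i = 0; i < operations.length; i++) {
        try {
            operations[i]();
            result.executed++;
        } catch (e) {
            result.errors.push({ index: i, message: e.message });
            // Ordered: stop at the first write error.
            if (ordered) break;
            // Unordered: record the error and carry on with the rest.
        }
    }
    return result;
}

function ok() {}
function fail() { throw new Error("write error"); }

var ops = [ok, fail, ok, ok];
var orderedResult = runBulk(ops, true);
var unorderedResult = runBulk(ops, false);

console.log("ordered:", orderedResult.executed, "executed,", orderedResult.errors.length, "error(s)");
console.log("unordered:", unorderedResult.executed, "executed,", unorderedResult.errors.length, "error(s)");
```

In the ordered run the two operations after the failure are never attempted. In the unordered run they still execute and the failure is reported at the end.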

Note that for servers older than 2.6 the API downconverts the operations. A 100% faithful downconversion is not possible, however, so there may be edge cases where it cannot report the right numbers.

For your three common use cases, you could implement the Bulk API like this:

Case 1. Change the type of a property's value without changing the value:

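var MongoClient = require('mongodb').MongoClient;

// Assumes the callback-style API of the legacy (2.x) Node.js driver.
MongoClient.connect("mongodb://localhost:27017/test", function(err, db) {
    // Handle error
    if (err) throw err;

    // Get the collection and bulk api artefacts
    var col = db.collection('users'),
        bulk = col.initializeOrderedBulkOp(), // Initialize the Ordered Batch
        counter = 0,
        pending = 0,    // batches sent but not yet acknowledged
        done = false;   // set once the cursor is exhausted

    // Send the current batch and start a fresh one straight away, so that
    // documents read while the batch is in flight go into the new batch.
    function flush() {
        var batch = bulk;
        bulk = col.initializeOrderedBulkOp();
        pending++;
        batch.execute(function(err, result) {
            if (err) throw err;
            // Close the connection after the last batch has been written
            if (--pending === 0 && done) db.close();
        });
    }

    // Case 1. Change type of value of property, without changing the value.
    col.find({"timestamp": {"$exists": true, "$type": 2} }).each(function (err, doc) {
        if (err) throw err;

        // each() passes a null document once the cursor is exhausted
        if (doc === null) {
            done = true;
            if (counter % 1000 != 0) flush();   // send the remaining operations
            else if (pending === 0) db.close();
            return;
        }

        var newTimestamp = parseInt(doc.timestamp, 10);
        bulk.find({ "_id": doc._id }).updateOne({
            "$set": { "timestamp": newTimestamp }
        });

        counter++;

        // Send a batch every 1000 operations
        if (counter % 1000 == 0) flush();
    });
});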
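
The same batching idea can also be expressed as plain data. The sketch below assumes a driver version that provides `collection.bulkWrite()`, which accepts an array of operation documents. `buildTimestampOps`, `chunk` and `sampleDocs` are hypothetical names introduced here; the code only builds plain objects and never touches a database.

```javascript
// Build bulkWrite-style operations that convert string timestamps to numbers.
// Pure function: it only creates plain objects.
function buildTimestampOps(docs) {
    return docs
        .filter(function (doc) { return typeof doc.timestamp === 'string'; })
        .map(function (doc) {
            return {
                updateOne: {
                    filter: { _id: doc._id },
                    update: { $set: { timestamp: parseInt(doc.timestamp, 10) } }
                }
            };
        });
}

// Split an array into batches of at most `size` items.
function chunk(items, size) {
    var batches = [];
    for (var i = 0; i < items.length; i += size) {
        batches.push(items.slice(i, i + size));
    }
    return batches;
}

// Hypothetical sample data for illustration.
var sampleDocs = [
    { _id: 1, timestamp: '1446987395' },
    { _id: 2, timestamp: 1446987396 },   // already a number, so it is skipped
    { _id: 3, timestamp: '1446987397' }
];

var timestampOps = buildTimestampOps(sampleDocs);
var batches = chunk(timestampOps, 1000);

console.log(JSON.stringify(timestampOps[0]));
console.log(batches.length + " batch(es)");
```

Each batch could then be passed to `col.bulkWrite(batch, { ordered: false })`. Keeping the operation builder separate from the database call makes the conversion logic easy to test without a running server.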

Case 2. Add a new property based on the value of an existing property:

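// Assumes the callback-style API of the legacy (2.x) Node.js driver.
MongoClient.connect("mongodb://localhost:27017/test", function(err, db) {
    // Handle error
    if (err) throw err;

    // Get the collection and bulk api artefacts
    var col = db.collection('users'),
        bulk = col.initializeOrderedBulkOp(), // Initialize the Ordered Batch
        counter = 0,
        pending = 0,    // batches sent but not yet acknowledged
        done = false;   // set once the cursor is exhausted

    // Send the current batch and start a fresh one straight away, so that
    // documents read while the batch is in flight go into the new batch.
    function flush() {
        var batch = bulk;
        bulk = col.initializeOrderedBulkOp();
        pending++;
        batch.execute(function(err, result) {
            if (err) throw err;
            // Close the connection after the last batch has been written
            if (--pending === 0 && done) db.close();
        });
    }

    // Case 2. Add new property based on value of existing property.
    col.find({"name": {"$exists": false } }).each(function (err, doc) {
        if (err) throw err;

        // each() passes a null document once the cursor is exhausted
        if (doc === null) {
            done = true;
            if (counter % 1000 != 0) flush();   // send the remaining operations
            else if (pending === 0) db.close();
            return;
        }

        var fullName = doc.firstname + " " + doc.lastname;
        bulk.find({ "_id": doc._id }).updateOne({
            "$set": { "name": fullName }
        });

        counter++;

        // Send a batch every 1000 operations
        if (counter % 1000 == 0) flush();
    });
});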

Case 3. Simply adding or removing properties from documents:

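// Assumes the callback-style API of the legacy (2.x) Node.js driver.
MongoClient.connect("mongodb://localhost:27017/test", function(err, db) {
    // Handle error
    if (err) throw err;

    // Get the collection and bulk api artefacts
    var col = db.collection('users'),
        bulk = col.initializeOrderedBulkOp(), // Initialize the Ordered Batch
        counter = 0,
        pending = 0,    // batches sent but not yet acknowledged
        done = false;   // set once the cursor is exhausted

    // Send the current batch and start a fresh one straight away, so that
    // documents read while the batch is in flight go into the new batch.
    function flush() {
        var batch = bulk;
        bulk = col.initializeOrderedBulkOp();
        pending++;
        batch.execute(function(err, result) {
            if (err) throw err;
            // Close the connection after the last batch has been written
            if (--pending === 0 && done) db.close();
        });
    }

    // Case 3. Simply adding or removing properties from documents.
    col.find({"street_no": {"$exists": true } }).each(function (err, doc) {
        if (err) throw err;

        // each() passes a null document once the cursor is exhausted
        if (doc === null) {
            done = true;
            if (counter % 1000 != 0) flush();   // send the remaining operations
            else if (pending === 0) db.close();
            return;
        }

        // Copy the value to the new property and drop the old one
        bulk.find({ "_id": doc._id }).updateOne({
            "$set": { "no": doc.street_no },
            "$unset": { "street_no": "" }
        });

        counter++;

        // Send a batch every 1000 operations
        if (counter % 1000 == 0) flush();
    });
});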
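
On newer servers, all three cases might also be handled without reading the documents into the client at all. The sketch below is an assumption layered on top of the answer, not part of it: cases 1 and 2 rely on aggregation-pipeline updates (MongoDB 4.2 or newer) and on the `$toLong` and `$concat` expression operators, while case 3 uses the long-standing `$rename` update operator. `serverSideUpdates` is a hypothetical name; the object only describes the updates and does not connect to a database.

```javascript
// Specs for server-side updates. Each spec would be passed to
// collection.updateMany(spec.filter, spec.update).
// Assumption: cases 1 and 2 need MongoDB 4.2+ (pipeline-style updates).
var serverSideUpdates = {
    // Case 1: convert string timestamps to 64-bit integers.
    convertTimestamp: {
        filter: { timestamp: { $type: 'string' } },
        update: [ { $set: { timestamp: { $toLong: '$timestamp' } } } ]
    },
    // Case 2: derive "name" from "firstname" and "lastname".
    // Note: $concat yields null if either field is missing or null.
    addFullName: {
        filter: { name: { $exists: false } },
        update: [ { $set: { name: { $concat: ['$firstname', ' ', '$lastname'] } } } ]
    },
    // Case 3: rename "street_no" to "no" with the classic $rename operator.
    renameStreetNo: {
        filter: { street_no: { $exists: true } },
        update: { $rename: { street_no: 'no' } }
    }
};

console.log(JSON.stringify(serverSideUpdates, null, 2));
```

With a connected collection `col`, a spec could be applied as `col.updateMany(spec.filter, spec.update)`. Because the server evaluates the expressions itself, no documents cross the network, which avoids the client-side memory pressure described in the question.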
