What is the right approach to update many records in MongoDB using Mongoose


Problem Description

I am pulling some records from MongoDB using Mongoose, importing them into another system, and then I would like to set the status (a document attribute) of all these documents to processed.

I could find this solution: Update multiple documents by id set. Mongoose

I was wondering if that is the right approach, to build up a criterion consisting of all document ids and then perform the update. Please also take into account the fact that there are going to be many documents.

(What is the limit of the update query? Couldn't find it anywhere. Official documentation: http://mongoosejs.com/docs/2.7.x/docs/updating-documents.html)

Solution

The approach of building up a criterion consisting of all document ids and then performing the update is bound to cause potential issues. When you iterate a list of documents, sending an update operation with each doc, in Mongoose you run the risk of blowing up your server, especially when dealing with a large dataset, because you are not waiting for an asynchronous call to complete before moving on to the next iteration. You will essentially be building a "stack" of unresolved operations until this causes a problem, i.e. a stack overflow.

Take, for example, an array of document ids whose matching documents you want to update on the status field:

const processedIds = [
  "57a0a96bd1c6ef24376477cd",
  "57a052242acf5a06d4996537",
  "57a052242acf5a06d4996538"
];

where you can use the updateMany() method

Model.updateMany(
  { _id: { $in: processedIds } }, 
  { $set: { status: "processed" } }, 
  callback
);
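
If your Mongoose version returns promise-like queries (4.x and later), the same call can also be written without a callback. A minimal sketch using async/await, reusing the Model and processedIds names from above; the exact shape of the result object varies by Mongoose/driver version:

async function markProcessed(processedIds) {
  // Same filter and update as above, awaited instead of passing a callback
  const result = await Model.updateMany(
    { _id: { $in: processedIds } },
    { $set: { status: "processed" } }
  );
  // Reports how many documents were matched/modified (field names vary by version)
  console.log(result);
}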

or alternatively for really small datasets you could use the forEach() method on the array to iterate it and update your collection:

processedIds.forEach(function(id) {
  Model.update({ _id: id }, { $set: { status: "processed" } }, callback);
});

The above is okay for small datasets. However, this becomes an issue when you are faced with thousands or millions of documents to update, as you will be making repeated asynchronous server calls inside the loop.

To overcome this, use something like async's eachLimit to iterate over the array, performing a MongoDB update operation for each item while never running more than x parallel updates at the same time.
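
A minimal sketch of that idea, assuming the async npm package (npm install async) is installed; the concurrency limit of 10 is an arbitrary value chosen for illustration:

const async = require("async");

// Run at most 10 update calls in parallel at any given time
async.eachLimit(processedIds, 10, function(id, done) {
  Model.update(
    { _id: id },
    { $set: { status: "processed" } },
    done // signals completion (or an error) for this item
  );
}, function(err) {
  if (err) {
    console.error("One of the updates failed:", err);
  } else {
    console.log("All documents marked as processed");
  }
});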


The best approach, however, would be to use the bulk API for this, which is extremely efficient at processing updates in bulk. The difference in performance versus calling the update operation on each and every one of the many documents is that, instead of sending an update request to the server with each iteration, the bulk API sends the requests in batches, once in every 1000 requests.

For Mongoose versions >=4.3.0 which support MongoDB Server 3.2.x, you can use bulkWrite() for updates. The following example shows how you can go about this:

const bulkUpdateCallback = function(err, r){
  console.log(r.matchedCount);
  console.log(r.modifiedCount);
}

// Initialize the bulk operations array and a counter for batching
let bulkUpdateOps = [], counter = 0;

processedIds.forEach(function (id) {
  bulkUpdateOps.push({
    updateOne: {
      filter: { _id: id },
      update: { $set: { status: "processed" } }
    }
  });
  counter++;

  if (counter % 500 == 0) {
    // Get the underlying collection via the Node.js driver collection object
    Model.collection.bulkWrite(bulkUpdateOps, { ordered: true, w: 1 }, bulkUpdateCallback);
    bulkUpdateOps = []; // re-initialize
  }
});

// Flush any remaining bulk ops
if (counter % 500 != 0) {
  Model.collection.bulkWrite(bulkUpdateOps, { ordered: true, w: 1 }, bulkUpdateCallback);
}
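
One caveat worth noting: Model.collection hands you the underlying Node.js driver collection, so Mongoose's usual schema casting does not apply here. If _id is stored as an ObjectId (the Mongoose default), the plain string ids may need to be converted explicitly before building the filters; a small sketch, assuming the ids are valid 24-character hex strings:

const { Types } = require("mongoose");

// Convert plain hex strings to ObjectId values so the filters match ObjectId _id fields
const objectIds = processedIds.map(function (id) {
  return new Types.ObjectId(id);
});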


For Mongoose versions ~3.8.8, ~3.8.22 and 4.x, which support MongoDB Server >=2.6.x, you could use the Bulk API as follows:

var bulk = Model.collection.initializeOrderedBulkOp(),
    counter = 0;

processedIds.forEach(function(id) {
    bulk.find({ "_id": id }).updateOne({
        "$set": { "status": "processed" }
    });

    counter++;
    if (counter % 500 == 0) {
        // Execute the current batch and start a fresh one synchronously,
        // so the remaining iterations don't push into a batch that is
        // already executing.
        var batch = bulk;
        bulk = Model.collection.initializeOrderedBulkOp();
        batch.execute(function(err, r) {
           // do something with the result
        });
    }
});

// Flush any remaining operations (fewer than 500) left in the queue
if (counter % 500 != 0) {
    bulk.execute(function(err, result) {
       // do something with the result here
    });
}
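
A note on the design choice: initializeOrderedBulkOp() has the server apply the batched operations serially and stop at the first error. If the per-document updates are independent of each other, the driver's initializeUnorderedBulkOp() exposes the same find()/updateOne()/execute() surface but lets the server process the batch in parallel and continue past individual failures:

// Drop-in alternative when ordering and fail-fast behaviour are not required
var bulk = Model.collection.initializeUnorderedBulkOp();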
