insertMany Handle Duplicate Errors

Problem description

I want to bulk insert an array of objects into my collection, but I want to prevent duplicate records and couldn't find a way to do it with insertMany.

const Song = require('../models/song');
Song.insertMany([{id:1, name:"something"},{id:2, name:"something else"}])
    .then((result) => {
      res.json({
        result
      })
    })

The above code works, but if a record is the same it still gets inserted.

Recommended answer

Well, in actual fact MongoDB by "default" will not create duplicate data where a "unique key" is involved, and _id is one such key (mongoose aliases it as id, but insertMany() ignores that alias, so you need to be careful). There is a much larger story to this, though, that you really need to be aware of.

The basic problem here is that both the "mongoose" implementation of insertMany() and the underlying driver are currently a bit "borked", to put it mildly. The driver is inconsistent in how it passes the error response in "Bulk" operations, and this is compounded by "mongoose" not really "looking in the right place" for the actual error information.

The "quick" part you are missing is the addition of { ordered: false } to the "Bulk" operation that .insertMany() simply wraps. Setting this ensures that the "batch" of requests is actually submitted "completely" and that execution does not stop when an error occurs.
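To make the difference concrete, here is a toy in-memory sketch of ordered versus unordered semantics. It does not touch the driver or mongoose at all: simulateInsertMany and its return shape are hypothetical names made up for illustration, and the only assumption is a unique constraint on _id.

```javascript
// Toy stand-in for a collection with a unique _id index.
// NOT the driver: it only mimics ordered vs unordered behaviour.
function simulateInsertMany(docs, { ordered = true } = {}) {
  const seen = new Set();
  const inserted = [];
  const writeErrors = [];

  for (let index = 0; index < docs.length; index++) {
    const doc = docs[index];
    if (seen.has(doc._id)) {
      // Duplicate key: record the failure with the same code MongoDB uses
      writeErrors.push({ code: 11000, index, op: doc });
      if (ordered) break;   // ordered: stop at the first failure
      continue;             // unordered: carry on with the rest of the batch
    }
    seen.add(doc._id);
    inserted.push(doc);
  }

  return { nInserted: inserted.length, inserted, writeErrors };
}

const docs = [
  { _id: 1, name: "something" },
  { _id: 2, name: "something else" },
  { _id: 2, name: "something else entirely" },
  { _id: 3, name: "another thing" }
];

const orderedRun = simulateInsertMany(docs, { ordered: true });
const unorderedRun = simulateInsertMany(docs, { ordered: false });

console.log(orderedRun.nInserted);    // 2
console.log(unorderedRun.nInserted);  // 3
```

With the ordered run the document with _id: 3 is never attempted, which is exactly what { ordered: false } avoids.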

But since "mongoose" does not handle this very well (nor does the driver, "consistently"), we actually need to look for possible "errors" in the "response" rather than in the "error" result of the underlying callback.

As a demonstration:

const mongoose = require('mongoose'),
      Schema = mongoose.Schema;

mongoose.Promise = global.Promise;
mongoose.set('debug',true);

const uri = 'mongodb://localhost/test',
      options = { useMongoClient: true };

const songSchema = new Schema({
  _id: Number,
  name: String
});

const Song = mongoose.model('Song', songSchema);

function log(data) {
  console.log(JSON.stringify(data, undefined, 2))
}

let docs = [
  { _id: 1, name: "something" },
  { _id: 2, name: "something else" },
  { _id: 2, name: "something else entirely" },
  { _id: 3, name: "another thing" }
];

mongoose.connect(uri,options)
  .then( () => Song.remove() )
  .then( () =>
    new Promise((resolve,reject) =>
      Song.collection.insertMany(docs,{ ordered: false },function(err,result) {
        // No result at all (e.g. a connection failure): nothing to inspect
        if (!result) return reject(err);

        if (result.hasWriteErrors()) {
          // Log something just for the sake of it
          console.log('Has Write Errors:');
          log(result.getWriteErrors());

          // Check to see if something else other than a duplicate key, and reject
          if (result.getWriteErrors().some( error => error.code !== 11000 ))
            return reject(err);
        }
        resolve(result);    // Otherwise resolve
      })
    )
  )
  .then( results => { log(results); return true; } )
  .then( () => Song.find() )
  .then( songs => { log(songs); mongoose.disconnect() })
  .catch( err => { console.error(err); mongoose.disconnect(); } );

Or perhaps a bit nicer, since current LTS node.js has async/await:

const mongoose = require('mongoose'),
      Schema = mongoose.Schema;

mongoose.Promise = global.Promise;
mongoose.set('debug',true);

const uri = 'mongodb://localhost/test',
      options = { useMongoClient: true };

const songSchema = new Schema({
  _id: Number,
  name: String
});

const Song = mongoose.model('Song', songSchema);

function log(data) {
  console.log(JSON.stringify(data, undefined, 2))
}

let docs = [
  { _id: 1, name: "something" },
  { _id: 2, name: "something else" },
  { _id: 2, name: "something else entirely" },
  { _id: 3, name: "another thing" }
];

(async function() {

  try {
    const conn = await mongoose.connect(uri,options);

    await Song.remove();

    let results = await new Promise((resolve,reject) => {
      Song.collection.insertMany(docs,{ ordered: false },function(err,result) {
        // No result at all (e.g. a connection failure): nothing to inspect
        if (!result) return reject(err);

        if (result.hasWriteErrors()) {
          // Log something just for the sake of it
          console.log('Has Write Errors:');
          log(result.getWriteErrors());

          // Check to see if something else other than a duplicate key, then reject
          if (result.getWriteErrors().some( error => error.code !== 11000 ))
            return reject(err);
        }
        resolve(result);    // Otherwise resolve
      });
    });

    log(results);

    let songs = await Song.find();
    log(songs);

  } catch(e) {
    console.error(e);
  } finally {
    mongoose.disconnect();
  }


})()

At any rate, you get the same result, showing that the writes continue and that we respectfully "ignore" errors related to a "duplicate key", otherwise known as error code 11000. The "safe handling" is that we expect such errors and discard them whilst looking for the presence of "other errors" that we might want to pay attention to. We also see that the rest of the code continues and lists all documents actually inserted by executing a subsequent .find() call:

Mongoose: songs.remove({}, {})
Mongoose: songs.insertMany([ { _id: 1, name: 'something' }, { _id: 2, name: 'something else' }, { _id: 2, name: 'something else entirely' }, { _id: 3, name: 'another thing' } ], { ordered: false })
Has Write Errors:
[
  {
    "code": 11000,
    "index": 2,
    "errmsg": "E11000 duplicate key error collection: test.songs index: _id_ dup key: { : 2 }",
    "op": {
      "_id": 2,
      "name": "something else entirely"
    }
  }
]
{
  "ok": 1,
  "writeErrors": [
    {
      "code": 11000,
      "index": 2,
      "errmsg": "E11000 duplicate key error collection: test.songs index: _id_ dup key: { : 2 }",
      "op": {
        "_id": 2,
        "name": "something else entirely"
      }
    }
  ],
  "writeConcernErrors": [],
  "insertedIds": [
    {
      "index": 0,
      "_id": 1
    },
    {
      "index": 1,
      "_id": 2
    },
    {
      "index": 2,
      "_id": 2
    },
    {
      "index": 3,
      "_id": 3
    }
  ],
  "nInserted": 3,
  "nUpserted": 0,
  "nMatched": 0,
  "nModified": 0,
  "nRemoved": 0,
  "upserted": [],
  "lastOp": {
    "ts": "6485492726828630028",
    "t": 23
  }
}
Mongoose: songs.find({}, { fields: {} })
[
  {
    "_id": 1,
    "name": "something"
  },
  {
    "_id": 2,
    "name": "something else"
  },
  {
    "_id": 3,
    "name": "another thing"
  }
]

So why this process? The reason is that the underlying call actually returns both the err and result, as shown in the callback implementation, but there is an inconsistency in what is returned. The main reason to do this is so you actually see the "result", which holds not only the outcome of the successful operations but also the error information.

Along with the error information is nInserted: 3, indicating how many of the "batch" were actually written. You can pretty much ignore insertedIds here, since this particular test supplied the _id values itself. Where a different property has the "unique" constraint that caused the error, the only values here would be those from actual successful writes. A bit misleading, but easy to test and see for yourself.
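If you do want the _id values that really made it in, one option is to cross-reference insertedIds against the indexes listed in writeErrors. The sketch below is only an illustration: insertedIdsFrom is a hypothetical helper, and it assumes the plain-object shape printed in the log output above rather than any particular driver class.

```javascript
// Hypothetical helper working on the plain JSON shape logged above:
// drop every insertedIds entry whose index also appears in writeErrors.
function insertedIdsFrom(result) {
  const failed = new Set((result.writeErrors || []).map(e => e.index));
  return (result.insertedIds || [])
    .filter(entry => !failed.has(entry.index))
    .map(entry => entry._id);
}

// Trimmed copy of the result logged in the listing above
const sample = {
  writeErrors: [{ code: 11000, index: 2 }],
  insertedIds: [
    { index: 0, _id: 1 },
    { index: 1, _id: 2 },
    { index: 2, _id: 2 },
    { index: 3, _id: 3 }
  ],
  nInserted: 3
};

const ids = insertedIdsFrom(sample);
console.log(ids);                              // [ 1, 2, 3 ]
console.log(ids.length === sample.nInserted);  // true
```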

As stated, the catch is the "inconsistency", which can be demonstrated with another example (async/await only for brevity of listing):

const mongoose = require('mongoose'),
      Schema = mongoose.Schema;

mongoose.Promise = global.Promise;
mongoose.set('debug',true);

const uri = 'mongodb://localhost/test',
      options = { useMongoClient: true };

const songSchema = new Schema({
  _id: Number,
  name: String
});

const Song = mongoose.model('Song', songSchema);

function log(data) {
  console.log(JSON.stringify(data, undefined, 2))
}

let docs = [
  { _id: 1, name: "something" },
  { _id: 2, name: "something else" },
  { _id: 2, name: "something else entirely" },
  { _id: 3, name: "another thing" },
  { _id: 4, name: "different thing" },
  //{ _id: 4, name: "different thing again" }
];

(async function() {

  try {
    const conn = await mongoose.connect(uri,options);

    await Song.remove();

    try {
      let results = await Song.insertMany(docs,{ ordered: false });
      console.log('what? no result!');
      log(results);   // not going to get here
    } catch(e) {
      // Log something for the sake of it
      console.log('Has write Errors:');

      // Check to see if something else other than a duplicate key, then throw
      // Branching because MongoError is not consistent
      if (e.hasOwnProperty('writeErrors')) {
        log(e.writeErrors);
        if(e.writeErrors.some( error => error.code !== 11000 ))
          throw e;
      } else if (e.code !== 11000) {
        throw e;
      } else {
        log(e);
      }

    }

    let songs = await Song.find();
    log(songs);

  } catch(e) {
    console.error(e);
  } finally {
    mongoose.disconnect();
  }


})()

It is all much the same thing, but pay attention to how the error is logged here:

Has write Errors:
{
  "code": 11000,
  "index": 2,
  "errmsg": "E11000 duplicate key error collection: test.songs index: _id_ dup key: { : 2 }",
  "op": {
    "__v": 0,
    "_id": 2,
    "name": "something else entirely"
  }
}

Note that there is no "success" information, even though we get the same continuation of the listing by doing the subsequent .find() and getting the output. This is because the implementation only acts on the "thrown error" in the rejection and never passes through the actual result part. So even though we asked for ordered: false, we don't get the information about what was completed unless we wrap the callback and implement the logic ourselves, as shown in the initial listings.

The other important "inconsistency" happens when there is "more than one error". So uncommenting the additional value for _id: 4 gives us:

Has write Errors:
[
  {
    "code": 11000,
    "index": 2,
    "errmsg": "E11000 duplicate key error collection: test.songs index: _id_ dup key: { : 2 }",
    "op": {
      "__v": 0,
      "_id": 2,
      "name": "something else entirely"
    }
  },
  {
    "code": 11000,
    "index": 5,
    "errmsg": "E11000 duplicate key error collection: test.songs index: _id_ dup key: { : 4 }",
    "op": {
      "__v": 0,
      "_id": 4,
      "name": "different thing again"
    }
  }
]

Here you can see that the code "branched" on the presence of e.writeErrors, which does not exist when there is only one error. By contrast, the earlier response object has both the hasWriteErrors() and getWriteErrors() methods, regardless of whether any error is present at all. So that is the more consistent interface, and the reason why you should use it instead of inspecting the err response alone.
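If you are stuck with the rejection alone, the branching can at least be hidden behind a small normalising function. This is a rough sketch, assuming only the two rejection shapes logged above; writeErrorsOf and onlyDuplicateKeys are hypothetical names, and the objects below are hand-written stand-ins rather than real MongoError instances.

```javascript
// Hypothetical helper: always hand back an array of write errors,
// whichever of the two rejection shapes shown above arrives.
function writeErrorsOf(e) {
  return Array.isArray(e.writeErrors) ? e.writeErrors : [e];
}

// True only when every reported error is a duplicate key (code 11000)
function onlyDuplicateKeys(e) {
  return writeErrorsOf(e).every(w => w.code === 11000);
}

// Hand-written stand-ins for the shapes seen in the logs
const single = { code: 11000, index: 2 };
const multiple = {
  writeErrors: [{ code: 11000, index: 2 }, { code: 11000, index: 5 }]
};
// Any code other than 11000 counts as "something else"
const mixed = {
  writeErrors: [{ code: 11000, index: 2 }, { code: 2, index: 3 }]
};

console.log(onlyDuplicateKeys(single));    // true
console.log(onlyDuplicateKeys(multiple));  // true
console.log(onlyDuplicateKeys(mixed));     // false
```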

This behavior is actually fixed in the upcoming 3.x release of the driver, which is meant to coincide with the MongoDB 3.6 server release. The behavior changes in that the err response is more akin to the standard result, but of course classed as a BulkWriteError response instead of the MongoError it presently is.

Until that is released (and of course until that dependency and its changes are propagated to the "mongoose" implementation), the recommended course of action is to be aware that the useful information is in the result and not the err. In fact your code should probably look for hasWriteErrors() in the result and then fall back to checking err as well, in order to cater for the change to be implemented in the driver.
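A "result first, then err" check could look something like the following. Treat it as a sketch under assumptions: inspectBulkOutcome is a hypothetical helper, the only methods it relies on are the hasWriteErrors() and getWriteErrors() already used in the listings, and the err shapes are the ones logged earlier. The stub objects stand in for whatever the driver actually hands back.

```javascript
// Hypothetical helper: prefer the result, fall back to err.
function inspectBulkOutcome(err, result) {
  let writeErrors = [];

  if (result && typeof result.hasWriteErrors === 'function' && result.hasWriteErrors()) {
    writeErrors = result.getWriteErrors();   // the consistent interface
  } else if (err && Array.isArray(err.writeErrors)) {
    writeErrors = err.writeErrors;           // err carrying several errors
  } else if (err) {
    writeErrors = [err];                     // err that is itself the one error
  }

  // Duplicate keys (11000) are expected; anything else deserves attention
  const unexpected = writeErrors.filter(w => w.code !== 11000);
  return { duplicates: writeErrors.length - unexpected.length, unexpected };
}

// Stub standing in for a result object that reports one duplicate
const stubResult = {
  hasWriteErrors: () => true,
  getWriteErrors: () => [{ code: 11000, index: 2 }]
};

const fromResult = inspectBulkOutcome(new Error('bulk failed'), stubResult);
const fromErr = inspectBulkOutcome({ code: 11000, index: 2 }, undefined);
const fatal = inspectBulkOutcome(new Error('connection lost'), undefined);

console.log(fromResult.duplicates, fromResult.unexpected.length);  // 1 0
console.log(fromErr.duplicates, fromErr.unexpected.length);        // 1 0
console.log(fatal.duplicates, fatal.unexpected.length);            // 0 1
```

The caller can then reject only when unexpected is non-empty and otherwise carry on, much as the listings above do.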

Author's note: Much of this content and related reading is actually already answered here on Function insertMany() unordered: proper way to get both the errors and the result? and MongoDB Node.js native driver silently swallows bulkWrite exception. But it is repeated and elaborated here until it finally sinks in that this is the way you handle exceptions in the current driver implementation. And it does actually work, when you look in the correct place and write your code to handle it accordingly.
