insertMany Handle Duplicate Errors


Problem description

I want to bulk insert an array of objects into my collection, but I want to prevent duplicate records, and I couldn't find a way to do it with insertMany.

const Song = require('../models/song');
Song.insertMany([{id:1, name:"something"},{id:2, name:"something else"}])
    .then((result) => {
      res.json({
        result
      })
    })

The code above works, but if a record is the same it still gets inserted.

Recommended answer

Well, in actual fact, MongoDB by "default" will not create duplicate data where there is a "unique key" involved, of which _id is one (aliased by mongoose as id, but ignored by insertMany(), so you need to be careful). But there is a much larger story to this that you really need to be aware of.

The basic problem here is that both the "mongoose" implementation of insertMany() and the underlying driver are currently a bit "borked", to put it mildly. There is an inconsistency in how the driver passes the error response in "Bulk" operations, and this is compounded by "mongoose" not really "looking in the right place" for the actual error information.

The "quick" part you are missing is the addition of { ordered: false } to the "Bulk" operation, which .insertMany() simply wraps a call to. Setting this ensures that the "batch" of requests is actually submitted "completely" and does not stop execution when an error occurs.
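To make the difference between ordered and unordered concrete, here is a minimal sketch. It is a toy in-memory model, not the MongoDB driver: simulateInsertMany and its return shape are hypothetical names made up for illustration, and the only thing it mimics is a unique _id index plus the "stop at first error" versus "keep going" behavior.

```javascript
// Toy in-memory model of a collection with a unique _id index.
// NOT the MongoDB driver: it only mimics ordered vs unordered semantics.
function simulateInsertMany(docs, { ordered = true } = {}) {
  const seen = new Set();
  const inserted = [];
  const writeErrors = [];

  for (let index = 0; index < docs.length; index++) {
    const doc = docs[index];
    if (seen.has(doc._id)) {
      // Same code the server reports for a duplicate key
      writeErrors.push({ code: 11000, index, op: doc });
      if (ordered) break;   // ordered: stop at the first error
      continue;             // unordered: record the error and keep going
    }
    seen.add(doc._id);
    inserted.push(doc);
  }

  return { nInserted: inserted.length, inserted, writeErrors };
}

const sampleDocs = [
  { _id: 1, name: 'something' },
  { _id: 2, name: 'something else' },
  { _id: 2, name: 'something else entirely' },
  { _id: 3, name: 'another thing' }
];

const orderedRun = simulateInsertMany(sampleDocs, { ordered: true });
const unorderedRun = simulateInsertMany(sampleDocs, { ordered: false });

// The ordered run never reaches _id: 3; the unordered run does.
console.log(orderedRun.nInserted, unorderedRun.nInserted);   // prints: 2 3
```

In the ordered run the document with _id: 3 is never attempted, which is why { ordered: false } matters when you only want to skip the duplicates.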

But since "mongoose" does not handle this very well ( nor does the driver "consistently" ) we actually need to look for possible "errors" in the "response" rather than the "error" result of the underlying callback.

As a demonstration:

const mongoose = require('mongoose'),
      Schema = mongoose.Schema;

mongoose.Promise = global.Promise;
mongoose.set('debug',true);

const uri = 'mongodb://localhost/test',
      options = { useMongoClient: true };

const songSchema = new Schema({
  _id: Number,
  name: String
});

const Song = mongoose.model('Song', songSchema);

function log(data) {
  console.log(JSON.stringify(data, undefined, 2))
}

let docs = [
  { _id: 1, name: "something" },
  { _id: 2, name: "something else" },
  { _id: 2, name: "something else entirely" },
  { _id: 3, name: "another thing" }
];

mongoose.connect(uri,options)
  .then( () => Song.remove() )
  .then( () =>
    new Promise((resolve,reject) =>
      Song.collection.insertMany(docs,{ ordered: false },function(err,result) {
        if (result.hasWriteErrors()) {
          // Log something just for the sake of it
          console.log('Has Write Errors:');
          log(result.getWriteErrors());

          // Check to see if something else other than a duplicate key, and throw
          if (result.getWriteErrors().some( error => error.code != 11000 ))
            return reject(err);   // return so we do not fall through to resolve()
        }
        resolve(result);    // Otherwise resolve
      })
    )
  )
  .then( results => { log(results); return true; } )
  .then( () => Song.find() )
  .then( songs => { log(songs); mongoose.disconnect() })
  .catch( err => { console.error(err); mongoose.disconnect(); } );

Or perhaps a bit nicer since current LTS node.js has async/await:

const mongoose = require('mongoose'),
      Schema = mongoose.Schema;

mongoose.Promise = global.Promise;
mongoose.set('debug',true);

const uri = 'mongodb://localhost/test',
      options = { useMongoClient: true };

const songSchema = new Schema({
  _id: Number,
  name: String
});

const Song = mongoose.model('Song', songSchema);

function log(data) {
  console.log(JSON.stringify(data, undefined, 2))
}

let docs = [
  { _id: 1, name: "something" },
  { _id: 2, name: "something else" },
  { _id: 2, name: "something else entirely" },
  { _id: 3, name: "another thing" }
];

(async function() {

  try {
    const conn = await mongoose.connect(uri,options);

    await Song.remove();

    let results = await new Promise((resolve,reject) => {
      Song.collection.insertMany(docs,{ ordered: false },function(err,result) {
        if (result.hasWriteErrors()) {
          // Log something just for the sake of it
          console.log('Has Write Errors:');
          log(result.getWriteErrors());

          // Check to see if something else other than a duplicate key, then throw
          if (result.getWriteErrors().some( error => error.code != 11000 ))
            return reject(err);   // return so we do not fall through to resolve()
        }
        resolve(result);    // Otherwise resolve

      });
    });

    log(results);

    let songs = await Song.find();
    log(songs);

  } catch(e) {
    console.error(e);
  } finally {
    mongoose.disconnect();
  }


})()

At any rate, you get the same result showing that writes are both continued and that we respectfully "ignore" errors that are related to a "duplicate key" or otherwise known as error code 11000. The "safe handling" is that we expect such errors and discard them whilst looking for the presence of "other errors" that we might just want to pay attention to. We also see the rest of the code continues and lists all documents actually inserted by executing a subsequent .find() call:
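The "safe handling" described here boils down to splitting the write errors into the ones we expect and the ones we do not. A minimal sketch, assuming the errors are plain objects with a numeric code like the ones logged above; partitionWriteErrors is a hypothetical helper name, and code 121 (document validation failure) is used only as an example of "something else":

```javascript
const DUPLICATE_KEY = 11000;

// Split write errors into expected duplicates and everything else.
function partitionWriteErrors(writeErrors) {
  const duplicates = [];
  const others = [];
  for (const error of writeErrors) {
    (error.code === DUPLICATE_KEY ? duplicates : others).push(error);
  }
  return { duplicates, others };
}

// Mock errors shaped like the logged output; the second one is made up.
const sampleErrors = [
  { code: 11000, index: 2, op: { _id: 2, name: 'something else entirely' } },
  { code: 121, index: 4, op: { _id: 5, name: 42 } }
];

const split = partitionWriteErrors(sampleErrors);
console.log(split.duplicates.length, split.others.length);   // prints: 1 1

// Discard the duplicates, but surface anything else.
if (split.others.length > 0) {
  console.log('Unexpected write errors at indexes:', split.others.map(e => e.index));
}
```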

Mongoose: songs.remove({}, {})
Mongoose: songs.insertMany([ { _id: 1, name: 'something' }, { _id: 2, name: 'something else' }, { _id: 2, name: 'something else entirely' }, { _id: 3, name: 'another thing' } ], { ordered: false })
Has Write Errors:
[
  {
    "code": 11000,
    "index": 2,
    "errmsg": "E11000 duplicate key error collection: test.songs index: _id_ dup key: { : 2 }",
    "op": {
      "_id": 2,
      "name": "something else entirely"
    }
  }
]
{
  "ok": 1,
  "writeErrors": [
    {
      "code": 11000,
      "index": 2,
      "errmsg": "E11000 duplicate key error collection: test.songs index: _id_ dup key: { : 2 }",
      "op": {
        "_id": 2,
        "name": "something else entirely"
      }
    }
  ],
  "writeConcernErrors": [],
  "insertedIds": [
    {
      "index": 0,
      "_id": 1
    },
    {
      "index": 1,
      "_id": 2
    },
    {
      "index": 2,
      "_id": 2
    },
    {
      "index": 3,
      "_id": 3
    }
  ],
  "nInserted": 3,
  "nUpserted": 0,
  "nMatched": 0,
  "nModified": 0,
  "nRemoved": 0,
  "upserted": [],
  "lastOp": {
    "ts": "6485492726828630028",
    "t": 23
  }
}
Mongoose: songs.find({}, { fields: {} })
[
  {
    "_id": 1,
    "name": "something"
  },
  {
    "_id": 2,
    "name": "something else"
  },
  {
    "_id": 3,
    "name": "another thing"
  }
]

So why this process? The reason being that the underlying call actually returns both the err and result as shown in the callback implementation but there is an inconsistency in what is returned. The main reason to do this is so you actually see the "result", which not only has the result of the successful operation, but also the error message.

Along with the error information is the nInserted: 3 indicating how many out of the "batch" actually were written. You can pretty much ignore the insertedIds here since this particular test involved actually supplying _id values. In the event where a different property had the "unique" constraint that caused the error, then the only values here would be those from actual successful writes. A bit misleading, but easy to test and see for yourself.
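Since insertedIds cannot be trusted to list only successful writes in this case, one way to work out what was really written is to drop the documents whose index appears in the write errors. This is a sketch only, assuming each write error carries the index of the offending document in the submitted array (as in the output above); docsActuallyWritten is a hypothetical helper name.

```javascript
// Return the submitted documents that did not produce a write error.
function docsActuallyWritten(docs, writeErrors) {
  const failedIndexes = new Set(writeErrors.map(error => error.index));
  return docs.filter((doc, index) => !failedIndexes.has(index));
}

const submitted = [
  { _id: 1, name: 'something' },
  { _id: 2, name: 'something else' },
  { _id: 2, name: 'something else entirely' },
  { _id: 3, name: 'another thing' }
];

// Same shape as the single duplicate key error logged earlier
const reportedErrors = [{ code: 11000, index: 2 }];

const written = docsActuallyWritten(submitted, reportedErrors);
console.log(written.map(doc => doc.name));
// prints: [ 'something', 'something else', 'another thing' ]
```

The length of the returned array should agree with nInserted, which makes for a cheap sanity check.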

As stated, the catch is the "inconsistency", which can be demonstrated with another example (async/await only for brevity of listing):

const mongoose = require('mongoose'),
      Schema = mongoose.Schema;

mongoose.Promise = global.Promise;
mongoose.set('debug',true);

const uri = 'mongodb://localhost/test',
      options = { useMongoClient: true };

const songSchema = new Schema({
  _id: Number,
  name: String
});

const Song = mongoose.model('Song', songSchema);

function log(data) {
  console.log(JSON.stringify(data, undefined, 2))
}

let docs = [
  { _id: 1, name: "something" },
  { _id: 2, name: "something else" },
  { _id: 2, name: "something else entirely" },
  { _id: 3, name: "another thing" },
  { _id: 4, name: "different thing" },
  //{ _id: 4, name: "different thing again" }
];

(async function() {

  try {
    const conn = await mongoose.connect(uri,options);

    await Song.remove();

    try {
      let results = await Song.insertMany(docs,{ ordered: false });
      console.log('what? no result!');
      log(results);   // not going to get here
    } catch(e) {
      // Log something for the sake of it
      console.log('Has write Errors:');

      // Check to see if something else other than a duplicate key, then throw
      // Branching because MongoError is not consistent
      if (e.hasOwnProperty('writeErrors')) {
        log(e.writeErrors);
        if(e.writeErrors.some( error => error.code !== 11000 ))
          throw e;
      } else if (e.code !== 11000) {
        throw e;
      } else {
        log(e);
      }

    }

    let songs = await Song.find();
    log(songs);

  } catch(e) {
    console.error(e);
  } finally {
    mongoose.disconnect();
  }


})()

All much the same thing, but pay attention to how the error logs here:

Has write Errors:
{
  "code": 11000,
  "index": 2,
  "errmsg": "E11000 duplicate key error collection: test.songs index: _id_ dup key: { : 2 }",
  "op": {
    "__v": 0,
    "_id": 2,
    "name": "something else entirely"
  }
}

Note that there is no "success" information, even though we get the same continuation of the listing by doing the subsequent .find() and getting the output. This is because the implementation only acts on the "thrown error" in rejection and never passes through the actual result part. So even though we asked for ordered: false, we don't get the information about what was completed unless we wrap the callback and implement the logic ourselves, as is shown in the initial listings.

The other important "inconsistency" happens when there is "more than one error". So uncommenting the additional value for _id: 4 gives us:

Has write Errors:
[
  {
    "code": 11000,
    "index": 2,
    "errmsg": "E11000 duplicate key error collection: test.songs index: _id_ dup key: { : 2 }",
    "op": {
      "__v": 0,
      "_id": 2,
      "name": "something else entirely"
    }
  },
  {
    "code": 11000,
    "index": 5,
    "errmsg": "E11000 duplicate key error collection: test.songs index: _id_ dup key: { : 4 }",
    "op": {
      "__v": 0,
      "_id": 4,
      "name": "different thing again"
    }
  }
]

Here you can see the code "branched" on the presence of e.writeErrors, which does not exist when there is one error. By contrast the earlier response object has both the hasWriteErrors() and getWriteErrors() methods, regardless of any error being present at all. So that is the more consistent interface and the reason why you should use it instead of inspecting the err response alone.
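If you do have to work from the rejected error alone, the branching can at least be hidden behind one function that always hands back an array. A rough sketch, assuming only the two shapes shown in the logs above (a single error carrying code, or an error carrying a writeErrors array); normalizeWriteErrors is a hypothetical name, and the mock objects stand in for what the driver rejects with.

```javascript
// Normalize both error shapes into a plain array of write errors.
function normalizeWriteErrors(e) {
  if (Array.isArray(e.writeErrors)) return e.writeErrors;   // several errors
  if (typeof e.code !== 'undefined') return [e];            // exactly one error
  return [];                                                // not a write error we recognize
}

// Mock of the "one error" shape
const singleError = { code: 11000, index: 2 };

// Mock of the "more than one error" shape
const multipleErrors = {
  code: 11000,
  writeErrors: [
    { code: 11000, index: 2 },
    { code: 11000, index: 5 }
  ]
};

console.log(normalizeWriteErrors(singleError).length);      // prints: 1
console.log(normalizeWriteErrors(multipleErrors).length);   // prints: 2

// The duplicate key check is then the same for both shapes
const onlyDuplicates = normalizeWriteErrors(multipleErrors)
  .every(error => error.code === 11000);
console.log(onlyDuplicates);                                // prints: true
```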

This behavior is actually fixed in the upcoming 3.x release of the driver which is meant to coincide with the MongoDB 3.6 server release. The behavior changes in that the err response is more akin to the standard result, but of course classed as a BulkWriteError response instead of MongoError which it presently is.

Until that is released (and of course until that dependency and its changes are propagated to the "mongoose" implementation), the recommended course of action is to be aware that the useful information is in the result and not the err. In fact, your code probably should look for hasWriteErrors() in the result and then fall back to checking err as well, in order to cater for the change to be implemented in the driver.
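That "result first, then err" order could look something like the following. It is a sketch under assumptions, not the driver's API: collectWriteErrors is a hypothetical helper, it relies only on the hasWriteErrors() and getWriteErrors() methods used in the listings above, and the err fallback guesses at a writeErrors array on the error object, as seen in the logged output.

```javascript
// Prefer the result object; fall back to err when no usable result is passed.
function collectWriteErrors(err, result) {
  if (result && typeof result.hasWriteErrors === 'function') {
    return result.hasWriteErrors() ? result.getWriteErrors() : [];
  }
  if (err && Array.isArray(err.writeErrors)) return err.writeErrors;
  if (err) return [err];
  return [];
}

// Mock result exposing the two methods used in the listings above
const fakeResult = {
  hasWriteErrors: () => true,
  getWriteErrors: () => [{ code: 11000, index: 2 }]
};

// Mock error carrying the details itself, with no result passed alongside
const fakeErr = { code: 11000, writeErrors: [{ code: 11000, index: 5 }] };

console.log(collectWriteErrors(fakeErr, fakeResult)[0].index);   // prints: 2
console.log(collectWriteErrors(fakeErr, undefined)[0].index);    // prints: 5
console.log(collectWriteErrors(null, undefined).length);         // prints: 0
```

Inside the callback you would call this with the (err, result) pair you were given and then apply the usual check for codes other than 11000.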

Author's Note: Much of this content and related reading is actually already answered here on "Function insertMany() unordered: proper way to get both the errors and the result?" and "MongoDB Node.js native driver silently swallows bulkWrite exception". But I am repeating and elaborating here until it finally sinks in that this is the way you handle exceptions in the current driver implementation. And it does actually work, when you look in the correct place and write your code to handle it accordingly.
