How to Ignore Duplicate Key Errors Safely Using insert_many

Question

I need to ignore duplicate inserts when using insert_many with pymongo, where the duplicates are based on the index. I've seen this question asked on Stack Overflow, but I haven't seen a useful answer.

Here's a snippet of my code:

import pymongo

try:
    # ordered=False: keep inserting the remaining documents after a failure
    results = mongo_connection[db][collection].insert_many(
        documents, ordered=False, bypass_document_validation=True)
except pymongo.errors.BulkWriteError as e:
    logger.error(e)

I would like the insert_many to ignore duplicates and not throw an exception (which fills up my error logs). Alternatively, is there a separate exception handler I could use, so that I can just ignore the errors? I miss "w=0"...
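
(For reference, the old "w=0" behaviour corresponds to an unacknowledged write concern in pymongo. A minimal sketch, assuming you are willing to lose all error reporting rather than just the duplicate key noise:)

from pymongo import MongoClient, WriteConcern

collection = MongoClient().test.duptest

# With w=0 the server sends no reply, so duplicate key errors (and every
# other write error) are silently dropped rather than raised.
unacked = collection.with_options(write_concern=WriteConcern(w=0))
unacked.insert_many([{'_id': 1}, {'_id': 1}], ordered=False)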

Thanks

Answer

You can deal with this by inspecting the errors produced in the BulkWriteError. This is actually an object with several properties; the interesting parts are under details:

import pymongo
from bson.json_util import dumps
from pymongo import MongoClient

client = MongoClient()
db = client.test
collection = db.duptest

# Two documents share _id 1, so one insert must fail with a duplicate key error
docs = [{'_id': 1}, {'_id': 1}, {'_id': 2}]

try:
    result = collection.insert_many(docs, ordered=False)
except pymongo.errors.BulkWriteError as e:
    print(dumps(e.details['writeErrors'], indent=2))

On a first run, this gives the list of errors under e.details['writeErrors']:

[
  {
    "index": 1,
    "code": 11000,
    "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
    "op": {"_id": 1}
  }
]

On a second run, you see three errors, because all the items already existed:

[
  {
    "index": 0,
    "code": 11000,
    "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
    "op": {"_id": 1}
  },
  {
    "index": 1,
    "code": 11000,
    "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 1 }",
    "op": {"_id": 1}
  },
  {
    "index": 2,
    "code": 11000,
    "errmsg": "E11000 duplicate key error collection: test.duptest index: _id_ dup key: { : 2 }",
    "op": {"_id": 2}
  }
]
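
The same details dict also carries the standard bulk write counters (not shown in the output above), so you can check how many documents actually made it in:

# 'nInserted' counts the documents that were written despite the duplicates:
# 2 on the first run above, 0 on the second
print(e.details['nInserted'])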

So all you need to do is filter the array for entries with "code": 11000, and then only "panic" when something else is in there:

# Keep only the errors that are not duplicate key errors (code 11000)
panic = [error for error in e.details['writeErrors'] if error['code'] != 11000]

if panic:
    print("really panic")

That gives you a mechanism for ignoring the duplicate key errors, while of course still paying attention to anything that is actually a problem.
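
Putting it together, the whole pattern can be wrapped in a small helper. This is just a sketch; the function name insert_many_ignoring_duplicates is made up for illustration:

import pymongo

DUPLICATE_KEY_ERROR = 11000  # MongoDB's duplicate key error code

def insert_many_ignoring_duplicates(collection, documents):
    """Insert documents, skipping duplicate key errors but re-raising
    the BulkWriteError if any other write error occurred."""
    try:
        collection.insert_many(documents, ordered=False)
    except pymongo.errors.BulkWriteError as e:
        others = [error for error in e.details['writeErrors']
                  if error['code'] != DUPLICATE_KEY_ERROR]
        if others:
            raise  # something other than a duplicate key went wrong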
