Fast or Bulk Upsert in pymongo


Question

How can I do a bulk upsert in pymongo? I want to update a bunch of entries, and doing them one at a time is very slow.

The answer to an almost identical question is here: Bulk update/upsert in MongoDB?

The accepted answer doesn't actually answer the question. It simply gives a link to the mongo CLI for doing import/exports.

I would also be open to someone explaining why a bulk upsert is not possible / not a best practice, but please explain what the preferred solution to this sort of problem is.

Answer

Modern releases of pymongo (greater than 3.x) wrap bulk operations in a consistent interface that downgrades gracefully where the server release does not support bulk operations. This is now consistent across the officially supported MongoDB drivers.

So the preferred method for coding is to use bulk_write() instead, passing an UpdateOne or other appropriate operation for each action. And of course it is now preferred to use plain Python lists of operations rather than a specific builder class.

A direct translation of the old documentation:

from pymongo import UpdateOne

operations = [
    UpdateOne({"field1": 1}, {"$push": {"vals": 1}}, upsert=True),
    UpdateOne({"field1": 1}, {"$push": {"vals": 2}}, upsert=True),
    UpdateOne({"field1": 1}, {"$push": {"vals": 3}}, upsert=True),
]

result = collection.bulk_write(operations)
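Since all three filters match the same document, the first upsert inserts it and the next two push onto it. A toy in-memory sketch of that semantics (apply_push_upsert is a hypothetical illustration of how the server applies each operation, not part of pymongo):

```python
def apply_push_upsert(docs, flt, field, value):
    """Toy model of UpdateOne(flt, {"$push": {field: value}}, upsert=True)."""
    for doc in docs:
        # Matched: push the value onto the array field.
        if all(doc.get(k) == v for k, v in flt.items()):
            doc.setdefault(field, []).append(value)
            return
    # No match: upsert a new document from the filter fields.
    new_doc = dict(flt)
    new_doc[field] = [value]
    docs.append(new_doc)

docs = []
for v in (1, 2, 3):
    apply_push_upsert(docs, {"field1": 1}, "vals", v)

# docs is now [{"field1": 1, "vals": [1, 2, 3]}]
```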

Or the classic document transformation loop:

import random
from pymongo import UpdateOne

random.seed()

operations = []

for doc in collection.find():
    # Set a random number on every document update
    operations.append(
        UpdateOne({"_id": doc["_id"]}, {"$set": {"random": random.randint(0, 10)}})
    )

    # Send once every 1000 in batch
    if len(operations) == 1000:
        collection.bulk_write(operations, ordered=False)
        operations = []

if len(operations) > 0:
    collection.bulk_write(operations, ordered=False)
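The batch-and-flush pattern above can also be factored into a small generator (a sketch; `chunked` is a hypothetical helper, not part of pymongo):

```python
def chunked(operations, size=1000):
    """Yield successive slices of at most `size` operations."""
    for start in range(0, len(operations), size):
        yield operations[start:start + size]

batches = list(chunked(list(range(2500)), 1000))
# Three batches: 1000, 1000, and the 500 remaining items.
```

With a real operations list you would then call `collection.bulk_write(batch, ordered=False)` once per yielded batch.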

The returned result is a BulkWriteResult, which contains counters for matched and modified documents, as well as the returned _id values for any "upserts" that occurred.

There is a bit of a misconception about the size of the bulk operations array. The actual request as sent to the server cannot exceed the 16MB BSON limit, since that limit also applies to the "request" sent to the server, which uses BSON format as well.

However, that does not govern the size of the request array that you can build, as the actual operations will only be sent and processed in batches of 1000 anyway. The only real restriction is that those 1000 operation instructions do not themselves create a BSON document greater than 16MB, which is indeed a pretty tall order.

The general concept of bulk methods is "less traffic": sending many things at once and dealing with only one server response. The reduction of the overhead attached to every single update request saves lots of time.
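To make the savings concrete, a back-of-the-envelope sketch (`round_trips` is just illustrative arithmetic, assuming the 1000-operation batch split described above; real timings depend on network latency and server load):

```python
import math

def round_trips(n_ops, batch_size=1000):
    """One round trip per batch, versus n_ops round trips one-at-a-time."""
    return math.ceil(n_ops / batch_size)

# 50,000 single updates cost 50,000 request/response round trips;
# batched through bulk_write they cost only round_trips(50000) = 50.
```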
