Fast or Bulk Upsert in pymongo


Problem Description

How can I do a bulk upsert in pymongo? I want to update a bunch of entries, and doing them one at a time is very slow.

The answer to an almost identical question is here: Bulk update/upsert in MongoDB?

The accepted answer doesn't actually answer the question. It simply gives a link to the mongo CLI for doing imports/exports.

I would also be open to someone explaining why doing a bulk upsert is not possible / not a best practice, but please explain what the preferred solution to this sort of problem is.

Recommended Answer

Modern releases of pymongo (greater than 3.x) wrap bulk operations in a consistent interface that downgrades where the server release does not support bulk operations. This is now consistent across MongoDB's officially supported drivers.

So the preferred method for coding is to use bulk_write() instead, where you use an UpdateOne or another appropriate operation action. And now of course it is preferred to use natural-language lists rather than a specific builder.

A direct translation from the old documentation:

from pymongo import UpdateOne

# Each UpdateOne takes a filter, an update document, and an optional
# upsert flag; with upsert=True the document is created when no match exists.
operations = [
    UpdateOne({"field1": 1}, {"$push": {"vals": 1}}, upsert=True),
    UpdateOne({"field1": 1}, {"$push": {"vals": 2}}, upsert=True),
    UpdateOne({"field1": 1}, {"$push": {"vals": 3}}, upsert=True)
]

result = collection.bulk_write(operations)

Or the classic document transformation loop:

import random
from pymongo import UpdateOne

random.seed()

operations = []

for doc in collection.find():
    # Set a random number on every document update
    operations.append(
        UpdateOne({"_id": doc["_id"]}, {"$set": {"random": random.randint(0, 10)}})
    )

    # Send once every 1000 in a batch
    if len(operations) == 1000:
        collection.bulk_write(operations, ordered=False)
        operations = []

# Flush whatever is left over from the last partial batch
if operations:
    collection.bulk_write(operations, ordered=False)

The return in result will be a BulkWriteResult, which contains counters for matched and updated documents as well as the returned _id values for any "upserts" that occur.
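
As a minimal sketch of reading that result (the connection string and namespace below are assumptions for illustration; the counter properties themselves are part of pymongo's BulkWriteResult):

from pymongo import MongoClient, UpdateOne

# Assumed connection and collection, for illustration only
client = MongoClient("mongodb://localhost:27017")
collection = client["test"]["example"]

result = collection.bulk_write([
    UpdateOne({"field1": 1}, {"$push": {"vals": 1}}, upsert=True)
])

print(result.matched_count)   # documents matched by the filters
print(result.modified_count)  # documents actually modified
print(result.upserted_ids)    # {operation index: _id} for any upserts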

There is a bit of a misconception about the size of the bulk operations array. The actual request as sent to the server cannot exceed the 16MB BSON limit, since that limit also applies to the "request" sent to the server, which uses BSON format as well.

However, that does not govern the size of the request array that you can build, as the actual operations will only be sent and processed in batches of 1000 anyway. The only real restriction is that those 1000 operation instructions themselves do not actually create a BSON document greater than 16MB, which is indeed a pretty tall order.
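
To make that batching explicit, here is a sketch of splitting an arbitrarily large operations list into groups of 1000 before sending. The chunked helper is hypothetical, not a pymongo API, and collection is an assumed handle; the driver performs equivalent splitting internally, so this is only to illustrate the grouping:

from pymongo import UpdateOne

def chunked(ops, size=1000):
    # Yield successive slices of at most `size` operations
    for i in range(0, len(ops), size):
        yield ops[i:i + size]

operations = [
    UpdateOne({"field1": n}, {"$set": {"seen": True}}, upsert=True)
    for n in range(10000)
]

for batch in chunked(operations):
    collection.bulk_write(batch, ordered=False)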

The general concept of bulk methods is "less traffic", as a result of sending many things at once and only dealing with one server response. The reduction of the overhead attached to every single update request saves lots of time.
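
As a rough way to observe that saving (a sketch only; the actual numbers depend on network latency and server load, and collection is again an assumed handle):

import time
from pymongo import UpdateOne

ids = list(range(1000))

# One round trip per document
start = time.perf_counter()
for n in ids:
    collection.update_one({"_id": n}, {"$set": {"flag": 1}}, upsert=True)
print("one at a time:", time.perf_counter() - start)

# A single bulk_write call covering the same updates
start = time.perf_counter()
collection.bulk_write(
    [UpdateOne({"_id": n}, {"$set": {"flag": 2}}, upsert=True) for n in ids],
    ordered=False,
)
print("bulk_write:", time.perf_counter() - start)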
