MongoDB: optimize multiple find_one + insert calls inside a loop


Question

I'm using MongoDB 4.0.1 and PyMongo with Python 3.5. Every 30 to 60 seconds I have to loop over 12,000 items and add new data to MongoDB. For this example we will talk about User, Pet and Car. A User can have 1 Car and 1 Pet.

I need the pet ObjectID and the car ObjectID to create my User, so I have to add them one by one in the loop, which is very slow. It takes ~25 seconds to find the existing data and add whatever does not exist yet.

while dictionary != False:
    # Create pet if not exist
    existing_pet = pet.find_one({"code": dictionary['pet_code']})

    if bool(existing_pet):
        pet_id = existing_pet['_id']
    else:
        pet_id = pet.insert({
            "code" : dictionary['pet_code'],
            "name" : dictionary['name']
        })
        # Call web service to create pet remote

    # Create car if not exist
    existing_car = car.find_one({"platenumber": dictionary['platenumber']})

    if bool(existing_car):
        car_id = existing_car['_id']
    else:
        car_id = car.insert({
            "platenumber" : dictionary['platenumber'],
            "model" : dictionary['model'],
            "energy" : 'electric'
        })
        # Call web service to create car remote

    # Create user if not exist
    existing_user = user.find_one(
        {"$and": [
            {"user_code": dictionary['user_code']},
            {"car": car_id},
            {"pet": pet_id}
        ]}
    )

    if not bool(existing_user):
        user_data.append({
            "pet" : pet_id,
            "car" : car_id,
            "firstname" : dictionary['firstname'],
            "lastname" : dictionary['lastname']
        })
        # Call web service to create user remote

# Bulk insert user
if user_data:
    user.insert_many(user_data)

I created an index on each field used by find_one:

db.user.createIndex( { user_code: 1 } )
db.user.createIndex( { pet: 1 } )
db.user.createIndex( { car: 1 } )
db.pet.createIndex( { pet_code: 1 }, { unique: true }  )
db.car.createIndex( { platenumber: 1 }, { unique: true }  )

Is there a way to speed up this loop? Could aggregation or something else help? Or is there another way to do what I want?

I'm open to any suggestions.

Recommended answer

Don't run 12,000 find_one queries. Run one query with the $in operator to fetch everything that already exists. The code would look something like this:

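# Collect the codes and names first (same loop as in the question)
pet_codes = []
pet_names = []
while dictionary != False:
    pet_codes.append(dictionary['pet_code'])
    pet_names.append(dictionary['name'])

# One query fetches every pet that already exists
pets = dict()
for existing_pet in pet.find({"code": {"$in": pet_codes}}):
    pets[existing_pet['code']] = existing_pet

# Keep only the pets that are not in the database yet
# (field names follow the insert in the question)
new_pets = []
for code, name in zip(pet_codes, pet_names):
    if code not in pets:
        new_pets.append({'code': code, 'name': name})

# insert_many raises InvalidOperation on an empty list
if new_pets:
    pet.insert_many(new_pets)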
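The question also needs the ObjectID of every pet and car, existing or new, to build the users. A minimal sketch of how the $in approach could return those IDs is shown here. The helper name `ensure_ids` and its parameters are hypothetical, and it assumes `collection` behaves like a PyMongo collection (`find` with a projection, `insert_many` returning a result with `inserted_ids`). The key field passed in should be the one that is actually stored and indexed.

```python
def ensure_ids(collection, key_field, docs):
    """Return {key: _id} for every doc, inserting the ones that are missing.

    Hypothetical helper: `collection` is assumed to expose PyMongo-style
    find() and insert_many().
    """
    # Deduplicate within the batch: the first document seen for a key wins.
    by_key = {}
    for doc in docs:
        by_key.setdefault(doc[key_field], doc)

    # One round trip: fetch the _id of every document that already exists.
    query = {key_field: {"$in": list(by_key)}}
    ids = {d[key_field]: d["_id"] for d in collection.find(query, {key_field: 1})}

    # One more round trip: insert everything that was not found.
    missing = [doc for key, doc in by_key.items() if key not in ids]
    if missing:  # insert_many rejects an empty list
        result = collection.insert_many(missing)
        # inserted_ids is in the same order as the documents passed in.
        for doc, new_id in zip(missing, result.inserted_ids):
            ids[doc[key_field]] = new_id
    return ids
```

Called once for pets and once for cars, this would give two lookup tables (for example `pet_ids[dictionary['pet_code']]`) that the user documents can be built from without any per-item query.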

Since you already have a unique index on the pet code field, we can do better: just try to insert them all. With ordered=False (see the insert_many docs), an attempt to insert a pet that already exists fails for that record only, and the rest still succeed:

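from pymongo.errors import BulkWriteError

new_pets = []
while dictionary != False:
    new_pets.append({
        "code" : dictionary['pet_code'],
        "name" : dictionary['name']
    })

try:
    pet.insert_many(new_pets, ordered=False)
except BulkWriteError:
    # Raised after every insert was attempted; duplicates are expected here
    pass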
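When duplicates are skipped this way, PyMongo still raises a BulkWriteError once all inserts have been attempted, and it is worth checking that the failures really are duplicates. The sketch below is one possible way to do that. It assumes the error details have the shape PyMongo reports in `BulkWriteError.details` (a `writeErrors` list whose entries carry a `code`), and the helper name `unexpected_write_errors` is hypothetical. 11000 is the server's duplicate-key error code.

```python
DUPLICATE_KEY = 11000  # MongoDB error code for a unique-index violation


def unexpected_write_errors(details):
    """Return the write errors that are NOT duplicate-key errors.

    `details` is assumed to look like BulkWriteError.details from PyMongo:
    a dict with a "writeErrors" list of dicts, each holding a "code".
    """
    return [
        err for err in details.get("writeErrors", [])
        if err.get("code") != DUPLICATE_KEY
    ]
```

Inside an `except BulkWriteError as exc:` handler, passing `exc.details` to this helper and re-raising when the result is non-empty would keep real failures visible while ignoring the expected duplicates.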

If you do not have a unique constraint in place, another option is to batch the operations.
