Speed up Pymongo query


Question

I'm using Python 3 and PyMongo with MongoDB 4.0, plus IfxPy to query an Informix database. I have 4 collections in my MongoDB database:

  • User
  • Office
  • Pet
  • Car

One User has one Office, one Pet and one Car, so each User document holds three reference fields.

I need something like this:

  • I want to find whether there is a User named John, with a Pet named Mickey and a Car whose model is Tesla, and whose status is inactive. After that I just update the User's status to active. I have to query the Office too, but I don't use it in this example.

I created an index for each field:

# Note: for single-field indexes the sort direction does not matter for lookups
office.create_index([("code", pymongo.DESCENDING)], unique=True)
pet.create_index([("name", pymongo.DESCENDING)], unique=True)
car.create_index([("model", pymongo.DESCENDING)], unique=True)
user.create_index([("username", pymongo.DESCENDING)], unique=True)
user.create_index([("pet", pymongo.DESCENDING)])
user.create_index([("car", pymongo.DESCENDING)])
user.create_index([("status", pymongo.DESCENDING)])

Here is my code:

office_id = None
car_id = None
pet_id = None
ifx_connection = IfxPy.connect(ifx_param, "", "")
stmt = IfxPy.exec_immediate(ifx_connection, sql)
dictionary = IfxPy.fetch_assoc(stmt) # Get data key / value
start = time.time()

# Loop on informix data (20 000 items)
while dictionary != False:
    # Trim all string in dict
    dictionary = {k: v.strip() if isinstance(v, str) else v for k,v in dictionary.items()}

    # Get office
    office_code = dictionary['office_code']
    existing_office = office.find_one({"code": office_code})

    if bool(existing_office):
        office_id = existing_office['_id']

    # Get pet
    existing_pet = pet.find_one({"name": dictionary['pet_name']})
    if bool(existing_pet):
        pet_id = existing_pet['_id']

    # Get car
    existing_car = car.find_one({"model": dictionary['car_model']})

    if bool(existing_car):
        car_id = existing_car['_id']

    # Get user
    existing_user = user.find_one({
        "username": dictionary['username'],
        "car": car_id,
        "pet": pet_id,
        "status" : "inactive"
    })

    if bool(existing_user):
        # Change user status
        user.update_one({'_id': existing_user['_id']}, {"$set": {"status" : "active"}}, upsert=False)

    # Next row
    dictionary = IfxPy.fetch_assoc(stmt)

If I remove the MongoDB code from the loop, it takes 1.33 seconds. If I query MongoDB, it takes 47 seconds. I have 20,000 items. I think that's really slow.

I tried to measure the time of each find_one with start = time.time(), removing all the find_one calls except one. If I keep only the Office find_one it takes ~12 seconds, and the same goes for the others; keeping only the User find_one also takes ~12 seconds. So ~12 * 4 is why all the find_one calls together take ~47 seconds.

Can you tell me what I'm doing wrong?

Answer

To speed up this algorithm you need to reduce the number of MongoDB queries, which you can do by leveraging what you know about your data. If, for example, you know that you only have a few distinct offices, or you are likely to query all of them anyway at some stage (or the same ones over and over), then you probably want to load all offices in one preliminary step outside of the loop (!!!) and cache them in a dictionary for fast lookups inside the loop, without another database round-trip per row. The same goes for pets and cars.
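
A minimal sketch of that caching step, assuming the same office, pet and car collection handles as in the question:

# Build the lookup caches once, before the loop. The projection keeps the
# returned documents small (the _id field is included by default), which
# reduces the amount of BSON that has to be decoded.
office_by_code = {doc["code"]: doc["_id"] for doc in office.find({}, {"code": 1})}
pet_by_name = {doc["name"]: doc["_id"] for doc in pet.find({}, {"name": 1})}
car_by_model = {doc["model"]: doc["_id"] for doc in car.find({}, {"model": 1})}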

So, more precisely, I would:

  1. run the Informix query the way you do it already
  2. retrieve all offices, pets and cars using three upfront queries. If you want to optimize this stage, only retrieve the ones contained in the distinct values of the respective column in your Informix dataset. Also make sure you add a projection that includes only the name/model/code and _id fields in the output, to reduce the amount of BSON processing involved
  3. put the returned values into three dictionaries (name -> _id, model -> _id, code -> _id)
  4. loop through the Informix result set the way you do it already
  5. for every user in your Informix result set, append an update model to a list where the selection criterion is made up of all the previously gathered details and the update part is static
  6. outside (after) the loop, use a bulk update to update all users, as sketched below
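
A minimal sketch of steps 4-6, reusing the caches above and the IfxPy cursor from the question (UpdateOne and bulk_write are standard PyMongo APIs; error handling omitted):

from pymongo import UpdateOne

updates = []
dictionary = IfxPy.fetch_assoc(stmt)

while dictionary != False:
    dictionary = {k: v.strip() if isinstance(v, str) else v for k, v in dictionary.items()}

    # Dictionary lookups replace the per-row find_one round-trips
    pet_id = pet_by_name.get(dictionary['pet_name'])
    car_id = car_by_model.get(dictionary['car_model'])

    # Selection criterion built from the gathered details; the update part is static
    updates.append(UpdateOne(
        {
            "username": dictionary['username'],
            "car": car_id,
            "pet": pet_id,
            "status": "inactive",
        },
        {"$set": {"status": "active"}},
    ))

    dictionary = IfxPy.fetch_assoc(stmt)

# One bulk round-trip instead of 20,000 individual update_one calls
if updates:
    user.bulk_write(updates, ordered=False)

With ordered=False the server is free to apply the updates in any order, and the single bulk_write replaces tens of thousands of network round-trips, which is likely where most of the 47 seconds were going.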
