How to scale psycopg2 insert and select with single process in python?


Question

It takes about 0.3 s on average (0.300095081329) for each of my inserts to commit to Postgres.

Here is my table schema:

id_table
    latest_update_id (primary index)
    product_id       (index)
    publish_date

product_meta_table
    latest_update_id    (index)
    product_id          (index)
    meta_related_info1
    meta_related_info2
    ...etc

product_table
    latest_update_id    (index)
    product_id          (index)
    note_related_info1
    note_related_info2
    ....etc

Here are some of my inserts:

db_cursor.execute("INSERT INTO id_table (product_id, publish_date)  \
             VALUES (%s, %s) RETURNING latest_update_id",
    (my_dict["product_id"], my_dict["publish_date"])
)

db_cursor.execute("INSERT INTO product_table ( \
                   latest_update_id, \
                   product_id, \
                   note_related_info1, \
                   note_related_info2, \
                   ...etc)  \
             VALUES (%s, %s, %s, %s) RETURNING *",
    (my_dict["latest_update_id"],
     my_dict["product_id"],
     my_dict["note_related_info1"],
     my_dict["note_related_info2"])
)

Using that insert time, my throughput is about 1/0.3 ≈ 3 qps.

I know I can scale this horizontally by adding more instances, but I want to see if I can hit at least 3000 qps with a single process.

I am thinking of either using async or threading, but am not sure whether the GIL is going to interfere.

Is there a general good practice or technique for scaling insert statements with psycopg2?

Thanks

Note: I am using Python 2.7

Note: the Python process is communicating with the SQL server over HTTPS

Note: the inserts to each table are staggered: table2 inserts after table1, and table3 inserts after table2. Technically table2 and table3 only have to wait for table1 to finish, because they need its latest_update_id
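On the GIL question raised above: psycopg2 is a C extension that releases the GIL while it waits on the server, so a plain thread pool can keep several inserts in flight at once. A minimal sketch of that idea, assuming each `worker` call is a per-row insert function that uses its own connection (the `worker` function and thread count are placeholders, not from the post):

```python
from multiprocessing.dummy import Pool  # thread pool; stdlib in 2.7 and 3


def run_concurrently(worker, rows, threads=8):
    # Map `worker` over `rows` on `threads` OS threads. Because psycopg2
    # releases the GIL during network waits, the inserts overlap instead
    # of serializing. Each thread should use its own connection:
    # psycopg2 connections are thread-safe, but cursors are not.
    pool = Pool(threads)
    try:
        return pool.map(worker, rows)
    finally:
        pool.close()
        pool.join()
```

With a 0.3 s round-trip, reaching 3000 qps would need on the order of 0.3 × 3000 = 900 inserts in flight at once, so threading alone probably has to be combined with fewer round-trips per insert.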

Answer

Do a single insert query instead of 3. Notice the triple-quoted string and dictionary parameter passing:

insert_query = """
    with i as (
        insert into id_table (product_id, publish_date) 
        values (%(product_id)s, %(publish_date)s)
        returning latest_update_id
    )
    insert into product_table (
        latest_update_id,
        product_id,
        note_related_info1,
        note_related_info2
    ) values (
        (select latest_update_id from i),
        %(product_id)s, %(note_related_info1)s, %(note_related_info2)s
    )
    returning *
"""

db_cursor.execute(insert_query, my_dict)
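To get closer to 3000 qps, the combined insert can also be committed in batches rather than once per row, since most of the observed 0.3 s is per-commit round-trip cost. A minimal sketch, assuming `rows` is a list of dicts shaped like `my_dict`, `conn` is an open psycopg2 connection, and the helper names and `batch_size` are illustrative:

```python
def chunk(rows, size):
    # Yield successive lists of at most `size` parameter dicts.
    for i in range(0, len(rows), size):
        yield rows[i:i + size]


def insert_batched(conn, insert_query, rows, batch_size=500):
    # Execute the combined CTE insert for every row, but commit once per
    # batch instead of once per row; fewer commits means fewer synchronous
    # round-trips and therefore higher throughput.
    cur = conn.cursor()
    for batch in chunk(rows, batch_size):
        for params in batch:
            cur.execute(insert_query, params)
        conn.commit()
```

If per-statement overhead is still the bottleneck after batching commits, `psycopg2.extras.execute_batch` can further reduce the number of network round-trips per batch.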
