如何在python中使用单个进程缩放psycopg2插入和选择? [英] How to scale psycopg2 insert and select with single process in python?
问题描述
我的插入内容平均需要大约0.300095081329
才能完成对postgres的提交.
It takes average of about 0.300095081329
for my insert to go through to finish commit to postgres.
这是我的餐桌模式
id_table
latest_update_id (primary index)
product_id (index)
publish_date
product_meta_table
latest_update_id (index)
product_id (index)
meta_related_info1
meta_related_info2
...etc
product_table
latest_update_id (index)
product_id (index)
note_related_info1
note_related_info2
....etc
这是我的一些插入
db_cursor.execute("INSERT INTO id_table (product_id, publish_date) \
VALUES (%s, %s) RETURNING latest_update_id",
(my_dict["product_id"], my_dict["publish_date"])
)
db_cursor.execute("INSERT INTO product_table ( \
latest_update_id, \
product_id, \
note_related_info1, \
note_related_info2, \
...etc) \
VALUES (%s, %s, %s, %s) RETURNING *",
(my_dict["latest_update_id"],
my_dict["product_id"],
my_dict["note_related_info1"],
my_dict["note_related_info2"])
)
使用插入时间,我的吞吐量约为1/0.3 = 3qps
Using the insert time my throughput is about 1/0.3 = 3qps
我知道我可以通过添加更多实例来水平扩展,但是我想尝试看看是否至少可以达到3000qps
.
I know I can scale this horizontally by adding more instances but I want to try to see if I can hit at least 3000qps
.
我正在考虑使用aync或线程,但是不确定GIL
是否会干扰.
I am thinking of either using aync or threading, but was not sure of GIL
is going to interfere or not.
关于如何使用psycopg2缩放insert
语句是否有通用的良好实践和技术?
Is there a general good practice and technique on how to scale insert
statements using psycopg2?
谢谢
注意:我正在使用python 2.7
Note: I am using python 2.7
注意:python进程正在通过https与sql server通信
Note: python process is communicating with sql server through https
注意:每个表的插入是交错的,table2在table1之后插入,table3在table2之后插入.从技术上讲,table2和table3只需要等待table1完成插入,因为它们需要latest_update_id
Note: the inserts to each table are staggered, table2 inserts after table1, table3 inserts after table2. Technically table2 and table3 only have to wait for table1 to finish insert because they need latest_update_id
推荐答案
在而不是在3中执行单个插入查询.请注意传递的三引号和字典参数:
Do a single insert query in instead of 3. Notice the triple quotes and dictionary parameter passing:
insert_query = """
with i as (
insert into id_table (product_id, publish_date)
values (%(product_id)s, %(publish_date)s)
returning latest_update_id
)
insert into product_table (
latest_update_id,
product_id,
note_related_info1,
note_related_info2
) values (
(select latest_update_id from i),
%(product_id)s, %(note_related_info1)s, %(note_related_info2)s
)
returning *
"""
db_cursor.execute(insert_query, my_dict)
这篇关于如何在python中使用单个进程缩放psycopg2插入和选择?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!