Why is psycopg2 INSERT taking so long to run in a loop and how do I speed it up?
Question
I am trying to insert (source_lat, source_long, destination_lat, destination_long) rows from a Pandas dataframe into a PostgreSQL table (gmaps) using psycopg2 INSERT in a for loop. The table has an integrity constraint that prevents duplicate (source_lat, source_long, destination_lat, destination_long) rows from being inserted, so I am catching any duplicates with a try except block. My code is below.
I am iterating through every row in the dataframe (about 100,000 rows) and calling cursor.execute(INSERT) on each row to see whether it throws an integrity error; if it doesn't, the row is inserted into the gmaps table.
However, this piece of code takes forever to run - how could I speed it up? I'm not sure where the overhead lies. Thank you!
Ele is a tuple that holds (source_lat, source_long, destination_lat, destination_long)
for ele in coordinates:
    # Inserts new row to table
    try:
        cursor.execute('INSERT INTO gmaps (source_latitude, source_longitude, destination_latitude, destination_longitude) VALUES (%s, %s, %s, %s)', (ele[0], ele[1], ele[2], ele[3]))
    except psycopg2.IntegrityError:
        conn.rollback()
    else:
        conn.commit()
Answer
There are multiple options to speed up inserting bulk data.
1.) commit() after the loop is finished:
for ele in coordinates:
    cursor.execute('INSERT INTO gmaps (source_latitude, source_longitude, destination_latitude, destination_longitude) VALUES (%s, %s, %s, %s)', (ele[0], ele[1], ele[2], ele[3]))
conn.commit()
2.) Use psycopg2's fast execution helpers, like execute_batch() or execute_values().
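As a sketch of option 2, assuming the duplicate check is backed by a unique constraint on the four columns (so ON CONFLICT DO NOTHING can replace the per-row try/except), the whole loop could become a single execute_values() call:

```python
# Sketch: bulk insert with psycopg2's execute_values().
# The table/column names match the question; ON CONFLICT DO NOTHING is an
# assumption that the integrity constraint is a unique constraint on these
# four columns.
INSERT_SQL = (
    'INSERT INTO gmaps (source_latitude, source_longitude, '
    'destination_latitude, destination_longitude) VALUES %s '
    'ON CONFLICT DO NOTHING'
)

def insert_coordinates(conn, coordinates, page_size=1000):
    """Insert an iterable of 4-tuples in pages, one round trip per page."""
    from psycopg2.extras import execute_values

    with conn.cursor() as cursor:
        # execute_values() expands the single VALUES %s placeholder into
        # page_size rows per statement, instead of one statement per row.
        execute_values(cursor, INSERT_SQL, coordinates, page_size=page_size)
    conn.commit()
```

Duplicates are then silently skipped server-side, which also avoids rolling back a transaction for every conflict.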
3.) String concatenation using mogrify():
dataText = ','.join(cur.mogrify('(%s,%s,%s,%s)', ele).decode('utf-8') for ele in coordinates)
cur.execute('INSERT INTO gmaps (source_latitude, source_longitude, destination_latitude, destination_longitude) VALUES ' + dataText)
conn.commit()
For a detailed comparison of INSERT execution speeds, have a look at this benchmark.