Why is psycopg2 INSERT taking so long to run in a loop and how do I speed it up?

Question

I am trying to insert (source_lat, source_long, destination_lat, destination_long) rows from a Pandas dataframe into a PostgreSQL table (gmaps) using psycopg2 INSERT in a for loop. The table has an integrity constraint that prevents duplicate (source_lat, source_long, destination_lat, destination_long) rows from being inserted, so I am catching any duplicates with a try except block. My code is below.

I am iterating through every row in the dataframe (about 100,000 rows) and calling cursor.execute(INSERT) on each row to see whether it throws an integrity error; if it doesn't, I insert that row into the gmaps table.

However, this piece of code takes forever to run - how could I speed it up? I'm not sure where the overhead lies. Thank you!

ele is a tuple that holds (source_lat, source_long, destination_lat, destination_long).

for ele in coordinates:
    # Inserts new row to table; roll back if it violates the unique constraint
    try:
        cursor.execute('INSERT INTO gmaps (source_latitude, source_longitude, destination_latitude, destination_longitude) VALUES (%s, %s, %s, %s)', (ele[0], ele[1], ele[2], ele[3]))
    except psycopg2.IntegrityError:
        conn.rollback()
    else:
        conn.commit()

Answer

There are multiple options to speed up inserting bulk data.

1.) commit() after the loop is finished:

for ele in coordinates:
    cursor.execute('INSERT INTO gmaps (source_latitude, source_longitude, destination_latitude, destination_longitude) VALUES (%s, %s, %s, %s)', (ele[0], ele[1], ele[2], ele[3]))
conn.commit()
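
Note that with a single commit at the end, the first duplicate row will raise an IntegrityError and roll back the whole batch, so the per-row try/except pattern from the question no longer fits. A minimal sketch of one way around that, assuming PostgreSQL 9.5+ and a unique constraint covering the four coordinate columns, is to let the database skip duplicates with ON CONFLICT DO NOTHING:

# Assumption: PostgreSQL 9.5+ and a unique constraint on the four
# coordinate columns; duplicate rows are skipped instead of raising an error.
insert_sql = (
    'INSERT INTO gmaps (source_latitude, source_longitude, '
    'destination_latitude, destination_longitude) '
    'VALUES (%s, %s, %s, %s) ON CONFLICT DO NOTHING'
)
for ele in coordinates:
    cursor.execute(insert_sql, (ele[0], ele[1], ele[2], ele[3]))
conn.commit()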

2.) Use psycopg2's fast execution helpers, like execute_batch() or execute_values().
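
For example, a minimal sketch using execute_values(), assuming the same conn, cursor, and coordinates objects as above; it folds the rows into multi-row VALUES statements instead of issuing one round trip per row:

from psycopg2.extras import execute_values

# One parameter tuple per row; execute_values() interpolates them into
# multi-row VALUES statements, page_size rows at a time.
rows = [(ele[0], ele[1], ele[2], ele[3]) for ele in coordinates]
execute_values(
    cursor,
    'INSERT INTO gmaps (source_latitude, source_longitude, '
    'destination_latitude, destination_longitude) VALUES %s',
    rows,
    page_size=1000,
)
conn.commit()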

3.) String concatenation using mogrify():

# mogrify() renders each row as an SQL literal; decode() turns the bytes into str on Python 3
dataText = ','.join(cur.mogrify('(%s,%s,%s,%s)', ele).decode('utf-8') for ele in coordinates)
cur.execute('INSERT INTO gmaps VALUES ' + dataText)
conn.commit()

For a detailed comparison of INSERT execution speeds, have a look at this benchmark.
