Why is psycopg2 INSERT taking so long to run in a loop and how do I speed it up?
Question
I am trying to insert (source_lat, source_long, destination_lat, destination_long) rows from a Pandas dataframe into a PostgreSQL table (gmaps) using psycopg2 INSERT in a for loop. The table has an integrity constraint that prevents duplicate (source_lat, source_long, destination_lat, destination_long) rows from being inserted, so I am catching any duplicates with a try except block. My code is below.
I am iterating through every row in the dataframe (about 100,000 rows) and calling cursor.execute(INSERT) on each row to see whether it throws an integrity error; if it doesn't, the row is inserted into the gmaps table.
However, this piece of code takes forever to run - how could I speed it up? I'm not sure where the overhead lies. Thank you!
Ele is a tuple that holds (source_lat, source_long, destination_lat, destination_long)
for ele in coordinates:
    # Inserts new row to table
    try:
        cursor.execute('INSERT INTO gmaps (source_latitude, source_longitude, destination_latitude, destination_longitude) VALUES (%s, %s, %s, %s)', (ele[0], ele[1], ele[2], ele[3]))
    except psycopg2.IntegrityError:
        conn.rollback()
    else:
        conn.commit()
Answer
There are multiple options to speed up inserting bulk data.
1.) commit() after the loop is finished:
for ele in coordinates:
    cursor.execute('INSERT INTO gmaps (source_latitude, source_longitude, destination_latitude, destination_longitude) VALUES (%s, %s, %s, %s)', (ele[0], ele[1], ele[2], ele[3]))
conn.commit()
2.) Use psycopg2's fast execution helpers, like execute_batch() or execute_values().
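As a sketch of option 2, assuming the duplicate check is backed by a unique constraint on the four columns (so ON CONFLICT DO NOTHING can replace the per-row try/except), the whole loop could become a single execute_values() call:

```python
# Sketch: bulk insert with psycopg2's execute_values().
# The table/column names match the question; ON CONFLICT DO NOTHING is an
# assumption that the integrity constraint is a unique constraint on these
# four columns.
INSERT_SQL = (
    'INSERT INTO gmaps (source_latitude, source_longitude, '
    'destination_latitude, destination_longitude) VALUES %s '
    'ON CONFLICT DO NOTHING'
)

def insert_coordinates(conn, coordinates, page_size=1000):
    """Insert an iterable of 4-tuples in pages, one round trip per page."""
    from psycopg2.extras import execute_values

    with conn.cursor() as cursor:
        # execute_values() expands the single VALUES %s placeholder into
        # page_size rows per statement, instead of one statement per row.
        execute_values(cursor, INSERT_SQL, coordinates, page_size=page_size)
    conn.commit()
```

Duplicates are then silently skipped server-side, which also avoids rolling back a transaction for every conflict.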
3.) String concatenation using mogrify():
dataText = ','.join(cur.mogrify('(%s,%s,%s,%s)', ele).decode('utf-8') for ele in coordinates)
cur.execute('INSERT INTO gmaps (source_latitude, source_longitude, destination_latitude, destination_longitude) VALUES ' + dataText)
conn.commit()
For a detailed comparison of INSERT execution speeds, have a look at this benchmark.