如何在 sqlalchemy 中使用 psycopg2.extras? [英] How can I use psycopg2.extras in sqlalchemy?
问题描述
我想将大量条目(~600k)上传到 PostgreSQL 数据库中的一个简单表中,每个条目有一个外键、一个时间戳和 3 个浮点数.但是,每个条目需要 60 毫秒来执行描述的核心批量插入 这里,因此整个执行需要 10 小时.我发现这是 executemany()
方法的性能问题,但是它已经通过 psycopg2 2.7.
I want to upload a huge number of entries (~600k) into a simple table in a PostgreSQL DB, with one foreign key, a timestamp and 3 float per each entry. However, it takes 60 ms per each entry to execute the core bulk insert described here, thus the whole execution would take 10 h. I have found out, that it is a performance issue of executemany()
method, however it has been solved with the execute_values()
method in psycopg2 2.7.
我运行的代码如下:
#build a huge list of dicts, one dict for each entry
engine.execute(SimpleTable.__table__.insert(),
values) # around 600k dicts in a list
我看到这是一个常见问题,但是我还没有设法在 sqlalchemy 中找到解决方案.有什么方法可以告诉 sqlalchemy 在某些情况下调用 execute_values()
吗?有没有其他方法可以不用自己构造SQL语句来实现大插入?
I see that it is a common problem, however I have not managed to find a solution in sqlalchemy itself. Is there any way to tell sqlalchemy to call execute_values()
in some occasions? Is there any other way to implement huge inserts without constructing the SQL statements by myself?
感谢您的帮助!
推荐答案
这不是您正在寻找的答案,因为这不能解决试图指示 SQLAlchemy 使用 psycopg extras 的问题,并且需要 – 某种 – 手动 SQL,但是:您可以使用 raw_connection()
,允许使用 复制自:
Not the answer you are looking for in the sense that this does not address attempting to instruct SQLAlchemy to use the psycopg extras, and requires – sort of – manual SQL, but: you can access the underlying psycopg connections from an engine with raw_connection()
, which allows using COPY FROM:
import io
import csv
from psycopg2 import sql
def bulk_copy(engine, table, values):
csv_file = io.StringIO()
headers = list(values[0].keys())
writer = csv.DictWriter(csv_file, headers)
writer.writerows(values)
csv_file.seek(0)
# NOTE: `format()` here is *not* `str.format()`, but
# `SQL.format()`. Never use plain string formatting.
copy_stmt = sql.SQL("COPY {} (" +
",".join(["{}"] * len(headers)) +
") FROM STDIN CSV").\
format(sql.Identifier(str(table.name)),
*(sql.Identifier(col) for col in headers))
# Fetch a raw psycopg connection from the SQLAlchemy engine
conn = engine.raw_connection()
try:
with conn.cursor() as cur:
cur.copy_expert(copy_stmt, csv_file)
conn.commit()
except:
conn.rollback()
raise
finally:
conn.close()
然后
bulk_copy(engine, SimpleTable.__table__, values)
与执行 INSERT 语句相比,这应该足够快了.在这台机器上移动 600,000 条记录需要大约 8 秒,大约 13 微秒/记录.您还可以将原始连接和游标与 extras 包一起使用.
This should be plenty fast compared to executing INSERT statements. Moving 600,000 records on this machine took around 8 seconds, ~13µs/record. You could also use the raw connections and cursor with the extras package.
这篇关于如何在 sqlalchemy 中使用 psycopg2.extras?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!