How can I use psycopg2.extras in sqlalchemy?

Problem Description

I want to upload a huge number of entries (~600k) into a simple table in a PostgreSQL DB, with one foreign key, a timestamp, and 3 floats per entry. However, it takes 60 ms per entry to execute the core bulk insert described here, so the whole execution would take 10 h. I have found out that it is a performance issue of the executemany() method, which however has been solved with the execute_values() method in psycopg2 2.7.

The code I run is as follows:

#build a huge list of dicts, one dict for each entry
engine.execute(SimpleTable.__table__.insert(),
               values) # around 600k dicts in a list
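
Here values is a list of dicts, one per row. For illustration only (these column names are hypothetical, not from the original post), each dict might look like:

values = [
    {"station_id": 17, "created_at": "2018-04-03 09:00:00",
     "x": 0.1, "y": 0.2, "z": 0.3},
    # ... around 600k of these
]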

I see that it is a common problem; however, I have not managed to find a solution in sqlalchemy itself. Is there any way to tell sqlalchemy to call execute_values() on some occasions? Is there any other way to implement huge inserts without constructing the SQL statements myself?

Thanks for your help!

Answer

Not the answer you are looking for, in the sense that this does not address instructing SQLAlchemy to use the psycopg extras and requires a sort of manual SQL, but: you can access the underlying psycopg connections from an engine with raw_connection(), which allows using COPY FROM:

import io
import csv
from psycopg2 import sql

def bulk_copy(engine, table, values):
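    # Write the rows to an in-memory CSV file. DictWriter.writerows()
    # emits data rows only (no header line), matching the COPY ... CSV
    # statement below, which does not use the HEADER option.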
    csv_file = io.StringIO()
    headers = list(values[0].keys())
    writer = csv.DictWriter(csv_file, headers)
    writer.writerows(values)

    csv_file.seek(0)

    # NOTE: `format()` here is *not* `str.format()`, but
    # `SQL.format()`. Never use plain string formatting.
    copy_stmt = sql.SQL("COPY {} (" +
                        ",".join(["{}"] * len(headers)) +
                        ") FROM STDIN CSV").\
        format(sql.Identifier(str(table.name)),
               *(sql.Identifier(col) for col in headers))

    # Fetch a raw psycopg connection from the SQLAlchemy engine
    conn = engine.raw_connection()
    try:
        with conn.cursor() as cur:
            cur.copy_expert(copy_stmt, csv_file)

        conn.commit()

    except:
        conn.rollback()
        raise

    finally:
        conn.close()

Then

bulk_copy(engine, SimpleTable.__table__, values)

This should be plenty fast compared to executing INSERT statements. Moving 600,000 records on this machine took around 8 seconds, or about 13 µs per record. You could also use the raw connections and cursor with the extras package.
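
For completeness, here is a minimal sketch of that alternative, using the execute_values() helper the question asks about over the same raw connection. bulk_insert_values() is an illustrative name, not an existing SQLAlchemy or psycopg2 API:

from psycopg2 import sql
from psycopg2.extras import execute_values

def bulk_insert_values(engine, table, values):
    headers = list(values[0].keys())
    rows = [tuple(v[col] for col in headers) for v in values]

    conn = engine.raw_connection()
    try:
        with conn.cursor() as cur:
            # Quote identifiers safely with SQL.format(); the single %s
            # placeholder is then expanded by execute_values() into pages
            # of row tuples, avoiding executemany()'s per-row round trips.
            stmt = sql.SQL("INSERT INTO {} ({}) VALUES %s").format(
                sql.Identifier(str(table.name)),
                sql.SQL(", ").join(sql.Identifier(col) for col in headers)
            ).as_string(cur)
            execute_values(cur, stmt, rows, page_size=1000)

        conn.commit()

    except:
        conn.rollback()
        raise

    finally:
        conn.close()

It is called the same way as bulk_copy() above; for loads of this size, COPY is typically still the faster of the two.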
