SQLAlchemy Core bulk insert slow

Question

I'm trying to truncate a table and insert only ~3000 rows of data using SQLAlchemy, and it's very slow (~10 minutes).

I followed the recommendations in this doc and used SQLAlchemy Core for the inserts, but it still runs very, very slowly. What possible culprits should I look at? The database is a Postgres RDS instance. Thanks!

import sqlalchemy as sa

engine = sa.create_engine(db_string, **kwargs, pool_recycle=3600)
with engine.begin() as conn:
    conn.execute(sa.text("TRUNCATE my_table"))
    conn.execute(
        MyTable.__table__.insert(),
        data,  # data is a list of dicts, one per row
    )

Recommended answer

I was bummed when I saw this didn't have an answer... I ran into the exact same problem the other day: trying to bulk-insert millions of rows into a Postgres RDS instance using Core. It was taking hours.

As a workaround, I ended up writing my own bulk-insert script that generates the raw SQL itself:

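# Build one "(v1, v2, ...)" tuple per row. Values are interpolated directly
# into the SQL string with no escaping, so this is only safe for trusted data.
bulk_insert_str = []
for entry in entry_list:
    val_str = "('{}', '{}', ...)".format(entry["column1"], entry["column2"], ...)
    bulk_insert_str.append(val_str)

# Send every row in a single multi-row INSERT statement.
# Note: engine.execute() exists in SQLAlchemy 1.x only; it was removed in 2.0.
engine.execute(
    """
    INSERT INTO my_table (column1, column2 ...)
    VALUES {}
    """.format(",".join(bulk_insert_str))
)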

While ugly, this gave me the performance we needed (~500,000 rows/minute).
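The same multi-row VALUES idea can be kept while binding the values as parameters instead of formatting them into the SQL string, which avoids quoting problems with values such as "it's". The following is a minimal sketch, not the answerer's script: the helper names `chunked` and `bulk_insert` are hypothetical, and the standard-library `sqlite3` module stands in for Postgres so the example runs anywhere. With a Postgres driver such as psycopg2 the placeholder would be `%s` rather than `?`, and `DELETE` here stands in for `TRUNCATE`, which SQLite does not support.

```python
import sqlite3
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_insert(conn, table, columns, rows, chunk_size=200, placeholder="?"):
    """Insert `rows` (a list of dicts) with multi-row VALUES statements.

    `table` and `columns` are formatted into the SQL text, so they must come
    from trusted code. Row values are always passed as bound parameters.
    Returns the number of rows sent.
    """
    col_list = ", ".join(columns)
    row_placeholder = "(" + ", ".join(placeholder for _ in columns) + ")"
    inserted = 0
    for chunk in chunked(rows, chunk_size):
        sql = "INSERT INTO {} ({}) VALUES {}".format(
            table, col_list, ", ".join([row_placeholder] * len(chunk))
        )
        # Flatten the chunk into one parameter list, row by row.
        params = [row[col] for row in chunk for col in columns]
        conn.execute(sql, params)
        inserted += len(chunk)
    return inserted


# Stand-in database and data (assumptions for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column1 TEXT, column2 TEXT)")
data = [
    {"column1": "a{}".format(i), "column2": "it's {}".format(i)}
    for i in range(3000)
]

with conn:  # commits on success, rolls back on error
    conn.execute("DELETE FROM my_table")
    n = bulk_insert(conn, "my_table", ["column1", "column2"], data)
```

Chunking keeps each statement below the driver's or server's limit on bound parameters; the chunk size of 200 is an arbitrary choice for this sketch and would need tuning against a real database.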

Did you find a Core-based solution? If not, hope this helps!

UPDATE: I ended up moving my old script onto a spare EC2 instance that we weren't using, and that actually fixed the slow performance. I'm not sure what your setup is, but apparently there is noticeable network overhead when communicating with RDS over an external (non-AWS) connection.
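One rough way to check whether network latency is the culprit is to time a trivial query and multiply by the row count. If the driver ends up sending one statement per row, total insert time is roughly rows times round-trip time; the question's figures (about 10 minutes for about 3000 rows, or roughly 0.2 s per row) would be consistent with that, though this is an inference and not something the thread confirms. The sketch below assumes any DB-API connection; the function name `mean_round_trip_ms` is hypothetical, and `sqlite3` is used only so the example runs without a server (its in-process "round trip" will be near zero).

```python
import sqlite3
import time


def mean_round_trip_ms(conn, samples=20):
    """Return the mean wall-clock time, in milliseconds, of a trivial query."""
    cur = conn.cursor()
    start = time.perf_counter()
    for _ in range(samples):
        cur.execute("SELECT 1")
        cur.fetchone()
    return (time.perf_counter() - start) * 1000.0 / samples


# Stand-in connection; replace with a connection to the real database.
conn = sqlite3.connect(":memory:")
latency_ms = mean_round_trip_ms(conn)

# Estimated time if each of the rows costs one full round trip.
rows = 3000
estimated_seconds = rows * latency_ms / 1000.0
print("mean round trip: {:.3f} ms".format(latency_ms))
print("estimated time for {} single-row inserts: {:.1f} s".format(rows, estimated_seconds))
```

Running the same measurement from a machine outside AWS and from an EC2 instance in the same region as the RDS instance would show how much of the insert time is attributable to the network path.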
