sqlalchemy bulk insert is slower than building raw SQL


Question


I'm going through this article on sqlalchemy bulk insert performance. I tried the various approaches specified in the benchmark test - SQLAlchemy ORM bulk_insert_mappings(), SQLAlchemy Core. Unfortunately, for inserting 1000 rows all these methods required about 1 minute. This is horrendously slow. I also tried the approach specified here - it requires building a large SQL statement like:

INSERT INTO mytable (col1, col2, col3)
VALUES (1,2,3), (4,5,6) ..... --- up to 1000 of these


And the insert for this raw SQL is something like:

MySession.execute('''
insert into MyTable (e, l, a)
values {}
'''.format(",".join(my_insert_str)))


Using this approach I improved the performance more than 50x, to 10000 insertions in 10-11 seconds.


Here is the code for the approach using the built-in lib.

class MyClass(Base):
    __tablename__ = "MyTable"
    e = Column(String(256), primary_key=True)
    l = Column(String(6))
    a = Column(String(20), primary_key=True)

    def __repr__(self):
        return self.e + " " + self.a + " " + self.l

.......

        dict_list = []
        for i, row in chunk.iterrows():
            dict_list.append({"e": row["e"], "l": row["l"], "a": row["a"]})

        MySession.execute(
            MyClass.__table__.insert(),
            dict_list
        )


Here is how I connect to the database.

    params = urllib.quote_plus("DRIVER={SQL Server Native Client 10.0};SERVER=servername;DATABASE=dbname;UID=user;PWD=pass")
    engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params )
    MySession.configure(bind=engine, autoflush=False, expire_on_commit=False)


Is there an issue with my setup that degrades the performance so much? I tried different db drivers - pyodbc and pymssql. Whatever I try, I cannot get anywhere close to the numbers they claim in the article, namely:

SQLAlchemy ORM: Total time for 100000 records 2.192882061 secs
SQLAlchemy ORM pk given: Total time for 100000 records 1.41679310799 secs
SQLAlchemy ORM bulk_save_objects(): Total time for 100000 records 0.494568824768 secs
SQLAlchemy ORM bulk_insert_mappings(): Total time for 100000 records 0.325763940811 secs
SQLAlchemy Core: Total time for 100000 records 0.239127874374 secs
sqlite3: Total time for 100000 records 0.124729156494 sec


I'm connecting to MS SQL Server 2008. Let me know if I've missed any other details.


The problem with the raw SQL approach is that it's not SQL-injection safe. So alternatively, if you have suggestions on how to solve that issue, it would also be very helpful :).

Answer

You are doing

MySession.execute(
    Myclass.__table__.insert(),
    dict_list
)


which uses executemany(). It is not the same as INSERT INTO ... VALUES .... To use VALUES, do:

MySession.execute(
    Myclass.__table__.insert().values(dict_list)
)
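The difference between the two constructs can be sketched with the stdlib sqlite3 module (used here only as a stand-in for SQL Server; the table and column names mirror the question, the row data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (e TEXT, l TEXT, a TEXT)")

rows = [(str(i), "l", str(i)) for i in range(100)]

# executemany(): one prepared INSERT, executed once per parameter set
conn.executemany("INSERT INTO MyTable (e, l, a) VALUES (?, ?, ?)", rows)

# single multi-row VALUES statement: all rows sent in one statement
placeholders = ",".join(["(?, ?, ?)"] * len(rows))
flat_params = [value for row in rows for value in row]
conn.execute("INSERT INTO MyTable (e, l, a) VALUES " + placeholders, flat_params)

count = conn.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]
print(count)  # 200
```

With sqlite3 both variants are fast locally; over a network connection to SQL Server, the per-row round trips of executemany() are what dominate.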


As a side note, the SQL injection problem is solved using parameters:

MySession.execute('''
insert into MyTable (e, l, a)
values (?, ?, ?), (?, ?, ?), ...
''', params)
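The placeholder string can be built programmatically while staying injection-safe, because every value travels as a bound parameter rather than being pasted into the SQL text. A minimal sqlite3 sketch (the hostile value is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (e TEXT, l TEXT, a TEXT)")

# A value that would corrupt a string-formatted query
hostile = "x'); DROP TABLE MyTable; --"
rows = [("1", "aa", "a1"), (hostile, "bb", "a2")]

# Build "(?, ?, ?), (?, ?, ?)" and a flat parameter list to match
placeholders = ",".join(["(?, ?, ?)"] * len(rows))
params = [value for row in rows for value in row]
conn.execute("INSERT INTO MyTable (e, l, a) VALUES " + placeholders, params)

# The hostile string was stored verbatim as data; the table still exists
stored = [r[0] for r in conn.execute("SELECT e FROM MyTable ORDER BY l")]
print(stored)  # ['1', "x'); DROP TABLE MyTable; --"]
```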


The takeaway here is that you're not comparing equivalent constructs. You're not using VALUES in the SQLAlchemy-generated query but you are in your textual SQL, and you're not using parameterization in your textual SQL but you are in the SQLAlchemy-generated query. If you turn on logging for the executed SQL statements you'll see exactly what is different.
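One standard way to turn on that logging is create_engine's echo=True flag, shown here against the connection setup from the question (a config fragment, not a complete script):

```python
# Same engine as in the question, now echoing every SQL statement to stdout
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params, echo=True)
```

With echo on, you will see the executemany version log one parameterized INSERT with many parameter sets, versus the single long multi-row VALUES statement.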

