使用 pyODBC 的 fast_executemany 加速 pandas.DataFrame.to_sql [英] Speeding up pandas.DataFrame.to_sql with fast_executemany of pyODBC

查看:39
本文介绍了使用 pyODBC 的 fast_executemany 加速 pandas.DataFrame.to_sql的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想将一个大的 pandas.DataFrame 发送到运行 MS SQL 的远程服务器.我现在这样做的方法是将 data_frame 对象转换为元组列表,然后使用 pyODBC 的 executemany() 函数将其发送出去.它是这样的:

I would like to send a large pandas.DataFrame to a remote server running MS SQL. The way I do it now is by converting a data_frame object to a list of tuples and then send it away with pyODBC's executemany() function. It goes something like this:

 import pyodbc as pdb

 list_of_tuples = convert_df(data_frame)

 connection = pdb.connect(cnxn_str)

 cursor = connection.cursor()
 cursor.fast_executemany = True
 cursor.executemany(sql_statement, list_of_tuples)
 connection.commit()

 cursor.close()
 connection.close()

然后我开始怀疑是否可以通过使用 data_frame.to_sql() 方法来加快速度(或至少更具可读性).我想出了以下解决方案:

I then started to wonder if things can be sped up (or at least more readable) by using data_frame.to_sql() method. I have came up with the following solution:

 import sqlalchemy as sa

 engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % cnxn_str)
 data_frame.to_sql(table_name, engine, index=False)

现在代码更具可读性,但上传至少慢了 150 倍...

Now the code is more readable, but the upload is at least 150 times slower...

有没有办法在使用 SQLAlchemy 时翻转 fast_executemany?

Is there a way to flip the fast_executemany when using SQLAlchemy?

我使用的是 pandas-0.20.3、pyODBC-4.0.21 和 sqlalchemy-1.1.13.

I am using pandas-0.20.3, pyODBC-4.0.21 and sqlalchemy-1.1.13.

推荐答案

在联系了SQLAlchemy的开发者之后,出现了解决这个问题的方法.非常感谢他们的出色工作!

After contacting the developers of SQLAlchemy, a way to solve this problem has emerged. Many thanks to them for the great work!

必须使用游标执行事件并检查是否已引发 executemany 标志.如果确实如此,请打开 fast_executemany 选项.例如:

One has to use a cursor execution event and check if the executemany flag has been raised. If that is indeed the case, switch the fast_executemany option on. For example:

from sqlalchemy import event

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True

关于执行事件的更多信息可以在此处.

More information on execution events can be found here.

更新:在SQLAlchemy 1.3.0,因此不再需要此 hack.

UPDATE: Support for fast_executemany of pyodbc was added in SQLAlchemy 1.3.0, so this hack is not longer necessary.

这篇关于使用 pyODBC 的 fast_executemany 加速 pandas.DataFrame.to_sql的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆