使用 pyODBC 的 fast_executemany 加速 pandas.DataFrame.to_sql [英] Speeding up pandas.DataFrame.to_sql with fast_executemany of pyODBC
问题描述
我想将一个大的 pandas.DataFrame
发送到运行 MS SQL 的远程服务器.我现在这样做的方法是将 data_frame
对象转换为元组列表,然后使用 pyODBC 的 executemany()
函数将其发送出去.它是这样的:
I would like to send a large pandas.DataFrame
to a remote server running MS SQL. The way I do it now is by converting a data_frame
object to a list of tuples and then send it away with pyODBC's executemany()
function. It goes something like this:
import pyodbc as pdb
list_of_tuples = convert_df(data_frame)
connection = pdb.connect(cnxn_str)
cursor = connection.cursor()
cursor.fast_executemany = True
cursor.executemany(sql_statement, list_of_tuples)
connection.commit()
cursor.close()
connection.close()
然后我开始怀疑是否可以通过使用 data_frame.to_sql()
方法来加快速度(或至少更具可读性).我想出了以下解决方案:
I then started to wonder if things can be sped up (or at least more readable) by using data_frame.to_sql()
method. I have came up with the following solution:
import sqlalchemy as sa
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % cnxn_str)
data_frame.to_sql(table_name, engine, index=False)
现在代码更具可读性,但上传至少慢了 150 倍...
Now the code is more readable, but the upload is at least 150 times slower...
有没有办法在使用 SQLAlchemy 时翻转 fast_executemany
?
Is there a way to flip the fast_executemany
when using SQLAlchemy?
我使用的是 pandas-0.20.3、pyODBC-4.0.21 和 sqlalchemy-1.1.13.
I am using pandas-0.20.3, pyODBC-4.0.21 and sqlalchemy-1.1.13.
推荐答案
在联系了SQLAlchemy的开发者之后,出现了解决这个问题的方法.非常感谢他们的出色工作!
After contacting the developers of SQLAlchemy, a way to solve this problem has emerged. Many thanks to them for the great work!
必须使用游标执行事件并检查是否已引发 executemany
标志.如果确实如此,请打开 fast_executemany
选项.例如:
One has to use a cursor execution event and check if the executemany
flag has been raised. If that is indeed the case, switch the fast_executemany
option on. For example:
from sqlalchemy import event
@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
if executemany:
cursor.fast_executemany = True
关于执行事件的更多信息可以在此处.
More information on execution events can be found here.
更新:在SQLAlchemy 1.3.0,因此不再需要此 hack.
UPDATE: Support for fast_executemany
of pyodbc
was added in SQLAlchemy 1.3.0, so this hack is not longer necessary.
这篇关于使用 pyODBC 的 fast_executemany 加速 pandas.DataFrame.to_sql的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!