python pandas to_sql with sqlalchemy: how to speed up exporting to MS SQL?


Problem description

I have a dataframe with roughly 155,000 rows and 12 columns. If I export it to csv with dataframe.to_csv, the output is an 11MB file (which is produced instantly).

If, however, I export to a Microsoft SQL Server with the to_sql method, it takes between 5 and 6 minutes! No columns are text: only int, float, bool and dates. I have seen cases where the ODBC driver sets nvarchar(max) and that slows down the data transfer, but that cannot be the case here.

Any suggestions on how to speed up the export process? Taking 6 minutes to export 11 MB of data makes the ODBC connection practically unusable.

Thanks!

My code is:

import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, select
ServerName = "myserver"
Database = "mydatabase"
TableName = "mytable"

engine = create_engine('mssql+pyodbc://' + ServerName + '/' + Database)
conn = engine.connect()

metadata = MetaData(conn)

my_data_frame.to_sql(TableName,engine)

Recommended answer

I recently had the same problem and would like to add an answer for others. to_sql seems to send an INSERT query for every row, which makes it really slow. But since pandas 0.24.0 there is a method parameter in to_sql() where you can define your own insertion function, or just use method='multi' to tell pandas to pass multiple rows in a single INSERT query, which makes it a lot faster.
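To illustrate the first option, a custom insertion callable could look roughly like the sketch below. The signature (table, conn, keys, data_iter) follows the pandas documentation for the method parameter; the function name and the executemany-style insert are just an illustration, not the only way to write it.

# minimal sketch of a custom `method` callable for to_sql
# table: a pandas SQLTable wrapper, conn: a SQLAlchemy connection,
# keys: the column names, data_iter: an iterable of row tuples
def insert_rows(table, conn, keys, data_iter):
    rows = [dict(zip(keys, row)) for row in data_iter]
    # one executemany-style INSERT instead of one statement per row
    conn.execute(table.table.insert(), rows)

my_data_frame.to_sql(TableName, engine, method=insert_rows)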

Note that your database may have a parameter limit. In that case you also have to define a chunksize.

So the solution should simply look like this:

my_data_frame.to_sql(TableName, engine, chunksize=<yourParameterLimit>, method='multi')

If you do not know your database's parameter limit, just try it without the chunksize parameter. It will either run or give you an error telling you your limit.
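For reference, SQL Server (via pyodbc) is commonly cited as allowing at most 2100 parameters per statement, and with method='multi' every cell of a chunk counts as one parameter. A rough way to derive a safe chunksize from that (the variable names are just illustrative; verify the limit against your driver's documentation) would be:

params_limit = 2100                              # SQL Server's per-statement parameter cap
cols_written = len(my_data_frame.columns) + 1    # +1 because the index is written by default
rows_per_chunk = params_limit // cols_written    # e.g. 2100 // 13 = 161 rows per INSERT

my_data_frame.to_sql(TableName, engine, chunksize=rows_per_chunk, method='multi')

Pass index=False if you do not want the index column written, in which case the +1 can be dropped.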
