python pandas to_sql with sqlalchemy: how to speed up exporting to MS SQL?

Question

I have a dataframe with about 155,000 rows and 12 columns. If I export it to csv with dataframe.to_csv, the output is an 11 MB file (which is produced instantly).

If, however, I export to a Microsoft SQL Server with the to_sql method, it takes between 5 and 6 minutes! No columns are text: only int, float, bool and dates. I have seen cases where ODBC drivers set nvarchar(max) and this slows down the data transfer, but that cannot be the case here.

Any suggestions on how to speed up the export process? Taking 6 minutes to export 11 MB of data makes the ODBC connection practically unusable.

Thanks!

My code is:

import pandas as pd
from sqlalchemy import create_engine

ServerName = "myserver"
Database = "mydatabase"
TableName = "mytable"

# DSN-less connection through the pyodbc dialect
engine = create_engine('mssql+pyodbc://' + ServerName + '/' + Database)

# my_data_frame is the ~155,000-row DataFrame described above
my_data_frame.to_sql(TableName, engine)

Answer

I recently had the same problem and would like to add an answer here for others. to_sql seems to send an INSERT query for every row, which makes it really slow. But since pandas 0.24.0 there is a method parameter in pandas.to_sql() where you can define your own insertion function, or just use method='multi' to tell pandas to pass multiple rows in a single INSERT query, which makes it a lot faster.
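
For reference, the callable form of method receives (pd_table, conn, keys, data_iter), as documented by pandas. The executemany-style body below is only a minimal sketch of such a function, not pandas' built-in implementation:

def insert_rows(pd_table, conn, keys, data_iter):
    # pd_table:  pandas' SQLTable wrapper; the underlying SQLAlchemy Table
    #            is available as pd_table.table
    # conn:      SQLAlchemy connection (pandas manages the transaction)
    # keys:      list of the column names being written
    # data_iter: iterator over the rows as tuples
    rows = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(pd_table.table.insert(), rows)  # executemany-style insert

my_data_frame.to_sql(TableName, engine, method=insert_rows)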

Note that your database may have a parameter limit (SQL Server, for example, allows at most 2100 parameters per statement). In that case you also have to define a chunksize.

So the solution should simply look like this:

my_data_frame.to_sql(TableName, engine, chunksize=<yourParameterLimit>, method='multi')

If you do not know your database's parameter limit, just try it without the chunksize parameter. It will either run or give you an error telling you your limit.
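
To make the chunk-size arithmetic concrete for the SQL Server case in the question (the 2100-parameter cap and the derivation below are my addition, not part of the original answer): with method='multi', every row in a chunk consumes one bind parameter per written column, so:

max_params = 2100                        # SQL Server's per-statement parameter cap
n_cols = len(my_data_frame.columns) + 1  # +1 for the index, which to_sql writes by default
chunksize = max_params // n_cols - 1     # stay safely below the cap

my_data_frame.to_sql(TableName, engine, chunksize=chunksize, method='multi')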
