Why would I get a memory error with fast_executemany on a tiny df?


Problem description

I was looking for ways to speed up pushing a dataframe to sql server and stumbled upon an approach here. This approach blew me away in terms of speed. Using normal to_sql took almost 2 hours and this script was done in 12.54 seconds to push a 100k row X 100 column df.

So after testing the code below with a sample df, I attempted to use a df that had many different datatypes (int, string, floats, Booleans). However, I was sad to see a memory error. So I started reducing the size of my df to see what the limitations were. I noticed that if my df had any strings then I wasn't able to load it to SQL Server. I am having trouble isolating the issue further. The script below is taken from the question in the link; however, I added a tiny df with strings. Any suggestions on how to rectify this issue would be great!

import pandas as pd
import numpy as np
import time
from sqlalchemy import create_engine, event
from urllib.parse import quote_plus
import pyodbc

# Raw ODBC connection string; SERVER_IP, DB_NAME, USER_ID and PWD are placeholders
conn = "DRIVER={SQL Server};SERVER=SERVER_IP;DATABASE=DB_NAME;UID=USER_ID;PWD=PWD"
quoted = quote_plus(conn)
new_con = 'mssql+pyodbc:///?odbc_connect={}'.format(quoted)
engine = create_engine(new_con)


# Enable pyodbc's fast_executemany before any bulk (executemany) insert
@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    print("FUNC call")
    if executemany:
        cursor.fast_executemany = True


# A tiny df of strings, enough to reproduce the MemoryError
table_name = 'fast_executemany_test'
df1 = pd.DataFrame({'col1': ['tyrefdg', 'ertyreg', 'efdgfdg'],
                    'col2': ['tydfggfdgrefdg', 'erdfgfdgfdgfdgtyreg', 'edfgfdgdfgdffdgfdg']})

# Time the push of the tiny all-string df to SQL Server
s = time.time()
df1.to_sql(table_name, engine, if_exists='replace', chunksize=None)
print(time.time() - s)

Recommended answer

I was able to reproduce your issue using pyodbc 4.0.23. The MemoryError was related to your use of the ancient

DRIVER={SQL Server}

Using

DRIVER=ODBC Driver 11 for SQL Server

also failed, with

Function sequence error (0) (SQLParamData)

which was related to an existing pyodbc issue on GitHub. I posted my findings here.

That issue is still under investigation. In the meantime you might be able to proceed by

  • using a newer ODBC driver like DRIVER=ODBC Driver 13 for SQL Server (a sketch follows this list), and
  • running pip install pyodbc==4.0.22 to use an earlier version of pyodbc.
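
For reference, here is a minimal sketch of the first workaround. It simply swaps the newer driver name into the connection string from the question's script; it assumes ODBC Driver 13 for SQL Server is installed on the client machine, and SERVER_IP, DB_NAME, USER_ID and PWD remain placeholders as in the question.

import pandas as pd
from sqlalchemy import create_engine, event
from urllib.parse import quote_plus

# Same connection pattern as the question, but with the newer driver name;
# the server, database, and credential values are placeholders.
conn = "DRIVER={ODBC Driver 13 for SQL Server};SERVER=SERVER_IP;DATABASE=DB_NAME;UID=USER_ID;PWD=PWD"
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quote_plus(conn)))

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    # Enable fast_executemany only for bulk (executemany) inserts
    if executemany:
        cursor.fast_executemany = True

# The same tiny all-string df that triggered the MemoryError
df1 = pd.DataFrame({'col1': ['tyrefdg', 'ertyreg', 'efdgfdg'],
                    'col2': ['tydfggfdgrefdg', 'erdfgfdgfdgfdgtyreg', 'edfgfdgdfgdffdgfdg']})
df1.to_sql('fast_executemany_test', engine, if_exists='replace', chunksize=None)

Alternatively, keep the original connection string and pin the older package with pip install pyodbc==4.0.22.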
