Pandas to_sql 'append' to an existing table causes Python crash

Problem Description

My problem is essentially this: when I try to use to_sql with if_exists = 'append' and name set to a table that already exists on my SQL Server, Python crashes.

Here is my code:

    from sqlalchemy import event

    # engine is an existing SQLAlchemy engine connected to SQL Server via pyodbc
    @event.listens_for(engine, 'before_cursor_execute')
    def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
        # Turn on pyodbc's fast_executemany to speed up bulk INSERTs
        if executemany:
            cursor.fast_executemany = True

    df.to_sql(name='existingSQLTable', con=engine, if_exists='append', index=False, chunksize=10000, dtype=dataTypes)

I didn't include it here, but dataTypes is a dictionary mapping each column name to its data type.
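
For illustration, a minimal sketch of what such a dtype mapping might look like; the column names and types here are hypothetical, not taken from the original post:

    from sqlalchemy.types import Date, Float, String

    # Hypothetical column-name-to-type mapping; the real dataTypes dictionary
    # covers all 25 columns of the DataFrame.
    dataTypes = {
        'price': Float(),
        'trade_date': Date(),
        'state_code': String(2),
    }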

Here is the error I get:

    Traceback (most recent call last):
      File "C:\Apps\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1116, in _execute_context
        context)
      File "C:\Apps\Anaconda3\lib\site-packages\sqlalchemy\engine\default.py", line 447, in do_executemany
        cursor.executemany(statement, parameters)
    pyodbc.IntegrityError: ('23000', "[23000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Violation of PRIMARY KEY constraint 'PK__existingSQLTable__'. Cannot insert duplicate key in object 'dbo.existingSQLTable'. The duplicate key value is (20008.7, 2008-08-07, Fl). (2627) (SQLExecute); [23000] [Microsoft][SQL Server Native Client 11.0][SQL Server]The statement has been terminated. (3621)")

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "<pyshell#24>", line 1, in <module>
        Table.to_sql(name = 'existingSQLTable', con = engine, if_exists = 'append', index = False, chunksize = 10000, dtype = dataTypes)
      File "C:\Apps\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1165, in to_sql
        chunksize=chunksize, dtype=dtype)
      File "C:\Apps\Anaconda3\lib\site-packages\pandas\io\sql.py", line 571, in to_sql
        chunksize=chunksize, dtype=dtype)
      File "C:\Apps\Anaconda3\lib\site-packages\pandas\io\sql.py", line 1250, in to_sql
        table.insert(chunksize)
      File "C:\Apps\Anaconda3\lib\site-packages\pandas\io\sql.py", line 770, in insert
        self._execute_insert(conn, keys, chunk_iter)
      File "C:\Apps\Anaconda3\lib\site-packages\pandas\io\sql.py", line 745, in _execute_insert
        conn.execute(self.insert_statement(), data)
      File "C:\Apps\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 914, in execute
        return meth(self, multiparams, params)
      File "C:\Apps\Anaconda3\lib\site-packages\sqlalchemy\sql\elements.py", line 323, in _execute_on_connection
        return connection._execute_clauseelement(self, multiparams, params)
      File "C:\Apps\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1010, in _execute_clauseelement
        compiled_sql, distilled_params
      File "C:\Apps\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1146, in _execute_context
        context)
      File "C:\Apps\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1341, in _handle_dbapi_exception
        exc_info
      File "C:\Apps\Anaconda3\lib\site-packages\sqlalchemy\util\compat.py", line 202, in raise_from_cause
        reraise(type(exception), exception, tb=exc_tb, cause=cause)
      File "C:\Apps\Anaconda3\lib\site-packages\sqlalchemy\util\compat.py", line 185, in reraise
        raise value.with_traceback(tb)
      File "C:\Apps\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1116, in _execute_context
        context)
      File "C:\Apps\Anaconda3\lib\site-packages\sqlalchemy\engine\default.py", line 447, in do_executemany
        cursor.executemany(statement, parameters)

Based on the errors, it appears to me that there's something wrong with the fast_executemany flag, but I've read a lot of documentation on it and don't see anything wrong with my setup.

Some things that may be worth noting:

  1. if_exists = 'replace' to a table that doesn't already exist works fine
  2. if_exists = 'append' to a table that doesn't already exist works fine
  3. if_exists = 'replace' to a table that does already exist works fine
  4. My DataFrame is roughly 3 million rows and 25 columns (mostly floats plus some short strings)
  5. I can successfully write an absolute maximum of 900,000 rows without Python crashing
  6. I'm using SQL Server, pandas 0.23.3, pyodbc 4.0.23 (I got the same error with 4.0.22), Jupyter Notebook (I also tried it in IDLE with the same result), Windows 10, Python 3.5.1, and Anaconda 3

The obvious solution to me was to break the DataFrame up into chunks of 900,000 rows. But while the first chunk uploads successfully, I cannot append even a single row to it without Python crashing.
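
For concreteness, a minimal sketch of that chunked-upload idea, reusing engine, dataTypes, and existingSQLTable from the snippet above; the loop itself is illustrative rather than the exact code used:

    # Upload the DataFrame in slices of at most 900,000 rows. The first
    # slice succeeds; appending any further slice crashes Python.
    chunk_rows = 900000
    for start in range(0, len(df), chunk_rows):
        df.iloc[start:start + chunk_rows].to_sql(
            name='existingSQLTable', con=engine, if_exists='append',
            index=False, chunksize=10000, dtype=dataTypes)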

Is this error a result of the code meant to speed up the process (which it does fantastically)? Am I misunderstanding the to_sql function? Or is there something else going on? Any suggestions would be great! Also, if anyone has run into a similar problem, it would be great to know!

Recommended Answer

As @Jon Clements explained, the problem was that there were rows with identical primary keys (but the rows themselves weren't identical). I used the pandas df.drop_duplicates function with the subset parameter set to the primary key columns. This solved the Violation of PRIMARY KEY error.
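
A minimal sketch of that fix; the primary-key column names below are hypothetical stand-ins for the table's actual three-column primary key:

    # Drop rows that collide on the primary-key columns before appending;
    # 'price', 'trade_date', and 'state_code' stand in for the real PK columns.
    df = df.drop_duplicates(subset=['price', 'trade_date', 'state_code'])
    df.to_sql(name='existingSQLTable', con=engine, if_exists='append',
              index=False, chunksize=10000, dtype=dataTypes)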
