Writing JSON column to Postgres using Pandas .to_sql


Question

During an ETL process I needed to extract and load a JSON column from one Postgres database to another. We use Pandas for this because it has many ways to read and write data from different sources and destinations, and all the transformations can be written in Python and Pandas. We're quite happy with the approach, to be honest, but we hit a problem.

Usually it's quite easy to read and write the data. You just use pandas.read_sql_table to read the data from the source and DataFrame.to_sql to write it to the destination. But since one of the source tables had a column of type JSON (from Postgres), the to_sql call crashed with the following error message.

    df.to_sql(table_name, analytics_db)
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/pandas/core/generic.py", line 1201, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/pandas/io/sql.py", line 470, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/pandas/io/sql.py", line 1147, in to_sql
    table.insert(chunksize)
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/pandas/io/sql.py", line 663, in insert
    self._execute_insert(conn, keys, chunk_iter)
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/pandas/io/sql.py", line 638, in _execute_insert
    conn.execute(self.insert_statement(), data)
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 945, in execute
    return meth(self, multiparams, params)
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
    context)
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1393, in _handle_dbapi_exception
    exc_info
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1159, in _execute_context
    context)
  File "/home/ec2-user/python-virtual-environments/etl/local/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 459, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'dict'

Recommended answer

I searched the web for a solution but couldn't find one, so here is what we came up with (there might be better ways, but at least this is a start if someone else runs into this).

Specify the dtype parameter in to_sql.

We went from df.to_sql(table_name, analytics_db) to df.to_sql(table_name, analytics_db, dtype={'name_of_json_column_in_source_table': sqlalchemy.types.JSON}) and it just works. With the column declared as JSON, SQLAlchemy serializes each dict to JSON text before handing it to psycopg2, which is what avoids the "can't adapt type 'dict'" error.
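For anyone who wants to try this without a database server, here is a minimal sketch of the same dtype fix. It makes a few assumptions that are not in the original post: an in-memory SQLite engine stands in for the Postgres destination (analytics_db), and the helper names json_columns and write_with_json, the table name events, and the sample data are all hypothetical. The helper only guesses which columns hold JSON by looking for dict/list values, so treat it as a starting point rather than the one right way to do it.

```python
import json

import pandas as pd
import sqlalchemy
from sqlalchemy.types import JSON


def json_columns(df):
    """Hypothetical helper: names of columns whose non-null values are all dicts or lists."""
    cols = []
    for name in df.columns:
        values = df[name].dropna()
        if len(values) > 0 and values.map(lambda v: isinstance(v, (dict, list))).all():
            cols.append(name)
    return cols


def write_with_json(df, table_name, engine):
    """Hypothetical helper: call to_sql with a JSON dtype for every detected column."""
    dtype = {name: JSON for name in json_columns(df)}
    df.to_sql(table_name, engine, index=False, if_exists="replace", dtype=dtype)
    return dtype


# Assumption: in-memory SQLite stands in for the Postgres destination so the
# sketch runs anywhere. In the original setup this would be a Postgres engine.
engine = sqlalchemy.create_engine("sqlite://")

# Sample data shaped like what read_sql_table returns for a JSON column:
# the cells are Python dicts, not strings.
df = pd.DataFrame({"id": [1, 2], "payload": [{"a": 1}, {"b": [1, 2]}]})

used = write_with_json(df, "events", engine)

# Read the rows back with plain SQL; the payload comes back as JSON text here.
with engine.connect() as conn:
    rows = conn.execute(
        sqlalchemy.text("SELECT id, payload FROM events ORDER BY id")
    ).fetchall()

for row_id, payload in rows:
    print(row_id, json.loads(payload))
```

Against Postgres you would pass your own engine instead of the SQLite one. If you already know which columns are JSON, the explicit dtype dict from the answer is simpler and more predictable than detecting them.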
