为什么 Pandas 和 GeoPandas 能够使用 DBAPI (psycopg2) 连接读取数据库表,但必须依靠 SQLAlchemy 来编写? [英] Why are Pandas and GeoPandas able to read a database table using a DBAPI (psycopg2) connection but have to rely on SQLAlchemy to write one?

查看:91
本文介绍了为什么 Pandas 和 GeoPandas 能够使用 DBAPI (psycopg2) 连接读取数据库表,但必须依靠 SQLAlchemy 来编写?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在尝试对来自 Python3<的某些数据库执行一些 I/O 操作时遇到了麻烦/code> 脚本.

I just get into trouble while trying to do some I/O operations on some databases from a Python3 script.

当我想连接到一个数据库时,我习惯性地使用 psycopg2 来处理 连接光标.

When I want to connect to a database, I habitually use psycopg2 in order to handle the connections and cursors.

我的数据通常存储为 Pandas DataFrames 和/或 GeoPandas 相当于 GeoDataFrames.

My data are usually stored as Pandas DataFrames and/or GeoPandas's equivalent GeoDataFrames.

我可以依赖它的.read_sql() 方法以参数 con 为参数,如文档中所述:

I can rely on its .read_sql() methods which takes as a parameter con, as stated in the doc:

con : SQLAlchemy connectable (engine/connection) or database str URI
        or DBAPI2 connection (fallback mode)'
        Using SQLAlchemy makes it possible to use any DB supported by that
        library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible
        for engine disposal and connection closure for the SQLAlchemy connectable. See
        `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_

使用GeoPandas:

我可以依靠它的 .read_postigs() 作为参数的方法 con,如文档中所述:

con : DB connection object or SQLAlchemy engine
        Active connection to the database to query.

为了写入数据数据库表;

使用Pandas:

我可以依靠 .to_sql() 方法以参数 con 为参数,如文档中所述:

In order to write data to a database table;

Using Pandas:

I can rely on the .to_sql() methods which takes as a parameter con, as stated in the doc:

con : sqlalchemy.engine.Engine or sqlite3.Connection
        Using SQLAlchemy makes it possible to use any DB supported by that
        library. Legacy support is provided for sqlite3.Connection objects. The user
        is responsible for engine disposal and connection closure for the SQLAlchemy
        connectable See `here                 <https://docs.sqlalchemy.org/en/13/core/connections.html>`_

使用GeoPandas:

我可以依赖 .to_sql() 方法(直接依赖于 Pandas .to_sql()) 作为参数 con,如文档中所述:

Using GeoPandas:

I can rely on the .to_sql() methods (which directly relies on the Pandas .to_sql()) which takes as a parameter con, as stated in the doc:

con : sqlalchemy.engine.Engine or sqlite3.Connection
        Using SQLAlchemy makes it possible to use any DB supported by that
        library. Legacy support is provided for sqlite3.Connection objects. The user
        is responsible for engine disposal and connection closure for the SQLAlchemy
        connectable See `here                 <https://docs.sqlalchemy.org/en/13/core/connections.html>`_

从这里,我很容易理解 GeoPandas 是建立在 Pandas 特别是它的 GeoDataFrame 对象,很快就会成为一个特殊的 DataFrame 可以处理地理数据.

From here, I easily understand that GeoPandas is built on Pandas especially for its GeoDataFrame object, which is, shortly, a special DataFrame that can handle geographic data.

但我想知道为什么 GeoPandas 能够直接获取 psycopg2 连接作为参数,而不是 Pandas 是否计划用于后者?

But I'm wondering why do GeoPandas has the ability to directly takes a psycopg2 connection as an argument and not Pandas and if it is planned for the latter?

为什么在写入数据时,两者都不是这样?
我想(可能还有许多其他人1,2)直接给他们一个psycopg2 connections 参数而不是依赖于 SQLAlchemy 引擎.
因为即使是这个工具真的很棒,它让我使用两个不同的框架来连接到我的数据库,从而处理两个不同的连接字符串(我个人更喜欢 psycopg2 处理来自字典的参数扩展以正确构建连接字符串,例如;psycopg2.connect(**dict_params) vs URL 注入如解释这里例如:是吗可以将字典传递给 SQLAlchemy 中的 create_engine 函数吗?).

And why is it neither the case for one nor the other when it comes to writing data?
I would like (as probably many of others1,2) to directly give them a psycopg2 connections argument instead of relying on SQLAlchemy engine.
Because even is this tool is really great, it makes me use two different frameworks to connect to my database and thus handle two different connection strings (and I personally prefer the way psycopg2 handles the parameters expansion from a dictionary to build a connection string properly such as; psycopg2.connect(**dict_params) vs URL injection as explained here for example: Is it possible to pass a dictionary into create_engine function in SQLAlchemy?).

  1. 我首先使用 psycopg2 从这样的参数字典:

  1. I was first creating my connection string with psycopg2 from a dictionary of parameters this way:

connParams = ("user={}", "password={}", "host={}", "port={}", "dbname={}")
conn = ' '.join(connParams).format(*dict_params.values())

  • 然后我发现这样更好更pythonic:

  • Then I figured out it was better and more pythonic this way:

    conn = psycopg2.connect(**dict_params)
    

  • 我最终用它代替了它,这样我就可以互换使用它来构建 psycopg2 connections,或SQLAlchemy 引擎:

    def connector():
        return psycopg2.connect(**dict_params)
    

    a) 初始化一个 psycopg2 连接 现在由以下人员完成:

    a) Initialize a psycopg2 connection is now done by:

    conn = connector()
    curs = conn.cursor()
    

    b) 并初始化一个 SQLAlchemyengine 作者:

    b) And initialize a SQLAlchemy engine by:

    engine = create_engine('postgresql+psycopg2://', creator=connector)
    

  • (或任何你喜欢的 db+driver)

    这里有详细记录:
    https://docs.sqlalchemy.org/en/13/core/engines.html#custom-dbapi-args
    在这里:
    https://docs.sqlalchemy.org/en/13/核心/引擎.html#sqlalchemy.create_engine

    [1] 没有Sql Alchemy引擎的数据帧到sql
    [2] 如何写数据框不使用 SQLAlchemy 引擎的 Postgres 表?

    推荐答案

    可能是 to_sql 需要 SQLAlchemy Connectable (EngineConnection) 对象是 to_sql 需要能够创建数据库表,如果它不存在或需要替换.早期版本的 Pandas 专门用于 DBAPI 连接,但我怀疑当他们向 to_sql 添加新功能时,他们发现自己编写了大量特定于数据库的代码来解决各种 DDL 实现的怪癖.

    Probably the main reason why to_sql needs a SQLAlchemy Connectable (Engine or Connection) object is that to_sql needs to be able to create the database table if it does not exist or if it needs to be replaced. Early versions of pandas worked exclusively with DBAPI connections, but I suspect that when they were adding new features to to_sql they found themselves writing a lot of database-specific code to work around the quirks of the various DDL implementations.

    在意识到他们正在复制 SQLAlchemy 中已有的许多逻辑后,他们可能决定通过简单地接受 Engine/Connection 将所有复杂性外包"给 SQLAlchemy 本身 对象并使用 SQLAlchemy 的(独立于数据库的)SQL 表达式语言来创建表.

    On realizing that they were duplicating a lot of logic that was already in SQLAlchemy they likely decided to "outsource' all of that complexity to SQLAlchemy itself by simply accepting an Engine/Connection object and using SQLAlchemy's (database-independent) SQL Expression language to create the table.

    这篇关于为什么 Pandas 和 GeoPandas 能够使用 DBAPI (psycopg2) 连接读取数据库表,但必须依靠 SQLAlchemy 来编写?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆