How to write data frame to Postgres table without using SQLAlchemy engine?


Problem description



I have a data frame that I want to write to a Postgres database. This functionality needs to be part of a Flask app.

For now, I'm running this insertion part as a separate script by creating an SQLAlchemy engine and passing it to the df.to_sql() to write the data frame to a database table.

But when I integrate this functionality into a Flask app, I already have existing connections to the Postgres database which were created using Psycopg2 connection pool.

Looking at the df.to_sql() documentation, it mentions that it uses an SQLAlchemy engine. I don't see any other connection mechanism. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html#pandas-dataframe-to-sql

My question is: why do I need to create this SQLAlchemy engine when I have existing connections? Why can't I use them?

Solution

You can use those connections and avoid SQLAlchemy. This is going to sound rather unintuitive, but it will be much faster than regular inserts (even if you were to drop the ORM and make a general query e.g. with executemany). Inserts are slow, even with raw queries, but you'll see that COPY is mentioned several times in How to speed up insertion performance in PostgreSQL. In this instance, my motivations for the approach below are:

  1. Use COPY instead of INSERT
  2. Don't trust Pandas to generate the correct SQL for this operation (although, as noted by Ilja Everilä, this approach actually got added to Pandas in V0.24)
  3. Don't write the data to disk to make an actual file object; keep it all in memory
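On point 2: the feature that landed in pandas 0.24 is the `method` parameter of `DataFrame.to_sql`, which accepts a custom insertion callable. The sketch below is adapted from the COPY recipe in the pandas documentation; note that it still goes through an SQLAlchemy connectable (pandas hands the callable an SQLAlchemy connection), so it doesn't remove the engine requirement, but it shows the same in-memory-CSV-plus-COPY idea:

```python
import csv
from io import StringIO

def psql_insert_copy(table, conn, keys, data_iter):
    """Insertion callable for DataFrame.to_sql using PostgreSQL COPY.

    table     : pandas.io.sql.SQLTable
    conn      : SQLAlchemy connection
    keys      : list of column names
    data_iter : iterable of row tuples
    """
    # Get the raw psycopg2 connection underneath the SQLAlchemy connection
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        # Build the CSV payload entirely in memory
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

# Usage (engine creation as in the original script):
#   df.to_sql('the_table_name', engine, method=psql_insert_copy)
```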

Suggested approach using cursor.copy_from():

import csv
import io

import psycopg2
from flask import current_app

df = "<your_df_here>"

# drop all the columns you don't want in the insert data here

# First take the headers
headers = df.columns

# Now get a nested list of values
data = df.values.tolist()

# Create an in-memory CSV file
string_buffer = io.StringIO()
csv_writer = csv.writer(string_buffer)
csv_writer.writerows(data)

# Reset the buffer back to the first line
string_buffer.seek(0)

# Open a connection to the db (which I think you already have available)
with psycopg2.connect(dbname=current_app.config['POSTGRES_DB'], 
                      user=current_app.config['POSTGRES_USER'],
                      password=current_app.config['POSTGRES_PW'], 
                      host=current_app.config['POSTGRES_URL']) as conn:
    c = conn.cursor()

    # Now upload the data as though it was a file
    c.copy_from(string_buffer, 'the_table_name', sep=',', columns=headers)
    conn.commit()

This should be orders of magnitude faster than actually doing inserts.
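Since the question mentions an existing Psycopg2 connection pool, the same upload can also be factored into a helper that takes an already-checked-out connection instead of opening a new one per request. A minimal sketch (the pool setup and names in the usage comment are assumptions, not from the original):

```python
import csv
import io

def copy_df_to_table(conn, df, table_name):
    """Stream a data frame into table_name over an existing psycopg2 connection."""
    # Serialize the frame to an in-memory CSV file
    buf = io.StringIO()
    csv.writer(buf).writerows(df.values.tolist())
    buf.seek(0)

    # Upload the buffer as though it were a file
    cur = conn.cursor()
    cur.copy_from(buf, table_name, sep=',', columns=list(df.columns))
    conn.commit()

# Usage with a pool created elsewhere in the Flask app, e.g.:
#   pool = psycopg2.pool.SimpleConnectionPool(1, 10, dsn=...)
#   conn = pool.getconn()
#   try:
#       copy_df_to_table(conn, df, 'the_table_name')
#   finally:
#       pool.putconn(conn)
```

One caveat either way: copy_from uses PostgreSQL's plain text format, so values containing the separator or newlines will break the load; cursor.copy_expert with `COPY ... FROM STDIN WITH CSV` handles quoted values.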

