Python write DataFrame to AWS Redshift using psycopg2
Problem description
I want to update a table in AWS on a daily basis. My plan is to first delete the data/rows in a public table in AWS using Python's psycopg2, then insert a pandas DataFrame into that table.
import psycopg2
import pandas as pd

con = psycopg2.connect(dbname=My_Credential.....)
cur = con.cursor()

sql = """
DELETE FROM tableA
"""
cur.execute(sql)
con.commit()
The code above handles the delete, but I don't know how to write the Python code to insert My_Dataframe into tableA. TableA is around 1 to 5 million rows. Please advise.
Recommended answer
I agree with what @mdem7 suggested in the comments: inserting 1-5 million rows through a DataFrame is not a good idea at all, and you will face performance issues.
It's better to use the S3-to-Redshift load approach. Here is code that runs both the Truncate and Copy commands.
import psycopg2

def redshift():
    conn = psycopg2.connect(dbname='database_name', host='888888888888****.u.****.redshift.amazonaws.com', port='5439', user='username', password='********')
    cur = conn.cursor()
    # Note: TRUNCATE commits implicitly in Redshift
    cur.execute("truncate table example;")
    # Begin your transaction
    cur.execute("begin;")
    cur.execute("copy example from 's3://examble-bucket/example.csv' credentials 'aws_access_key_id=ID;aws_secret_access_key=KEY/KEY/pL/KEY' csv;")
    # Commit your transaction
    cur.execute("commit;")
    print("Copy executed fine!")

redshift()
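The snippet above assumes example.csv already exists in S3. A minimal sketch of getting the DataFrame there first (the bucket/key names are placeholders matching the COPY command above, and the upload helper assumes boto3 is available with valid AWS credentials):

```python
import io

import pandas as pd


def dataframe_to_csv_bytes(df):
    """Serialize a DataFrame to CSV bytes suitable for COPY ... csv (no header row)."""
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    return buf.getvalue().encode("utf-8")


def upload_to_s3(body, bucket, key):
    """Upload the CSV bytes to S3 (requires boto3 and AWS credentials)."""
    import boto3  # assumed available in the target environment
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)


# Hypothetical DataFrame standing in for My_Dataframe
df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
payload = dataframe_to_csv_bytes(df)
# upload_to_s3(payload, "examble-bucket", "example.csv")  # enable with real credentials
```

For multi-million-row tables, splitting the DataFrame into several CSV parts before uploading lets Redshift parallelize the load.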
There are even more ways to make Copy faster, such as the manifest option, so that Redshift can load the data in parallel.
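For reference, a manifest is just a JSON file listing the S3 objects to load; a hypothetical sketch of building one (file names are assumptions, not from the original answer):

```python
import json

# Hypothetical manifest listing multiple CSV parts so Redshift loads them in parallel
manifest = {
    "entries": [
        {"url": "s3://examble-bucket/example_part_00.csv", "mandatory": True},
        {"url": "s3://examble-bucket/example_part_01.csv", "mandatory": True},
    ]
}
manifest_json = json.dumps(manifest, indent=2)

# Upload manifest_json to e.g. s3://examble-bucket/example.manifest, then point
# COPY at the manifest file and add the MANIFEST keyword:
#   copy example from 's3://examble-bucket/example.manifest'
#   credentials '...' csv manifest;
```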
Hope this gives you some ideas to move forward.