Python write DataFrame to AWS Redshift using psycopg2
Problem description
I want to update a table in AWS on a daily basis. My plan is to first delete the data/rows in a public table in AWS using Python's psycopg2, then insert a pandas DataFrame into that table.
import psycopg2
import pandas as pd

con = psycopg2.connect(dbname=My_Credential.....)
cur = con.cursor()

sql = """
DELETE FROM tableA
"""
cur.execute(sql)
con.commit()
The code above handles the delete, but I don't know how to write the Python code to insert My_Dataframe into tableA. TableA is around 1 to 5 million rows. Please advise.
Recommended answer
I agree with what @mdem7 suggested in the comments: inserting 1-5 million rows through a DataFrame is not a good idea at all, and you will face performance issues.
It's better to use the S3-to-Redshift load approach. Here is code that runs both the Truncate and Copy commands.
import psycopg2

def redshift():
    conn = psycopg2.connect(dbname='database_name', host='888888888888****.u.****.redshift.amazonaws.com', port='5439', user='username', password='********')
    cur = conn.cursor()
    # Note: TRUNCATE commits implicitly in Redshift
    cur.execute("truncate table example;")
    # Begin your transaction
    cur.execute("begin;")
    cur.execute("copy example from 's3://examble-bucket/example.csv' credentials 'aws_access_key_id=ID;aws_secret_access_key=KEY/KEY/pL/KEY' csv;")
    # Commit your transaction
    cur.execute("commit;")
    print("Copy executed fine!")

redshift()
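The snippet above assumes example.csv already exists in S3. A minimal sketch of getting the DataFrame there first (the bucket/key names are placeholders matching the COPY command above, and the upload helper assumes boto3 is available with valid AWS credentials):

```python
import io

import pandas as pd


def dataframe_to_csv_bytes(df):
    """Serialize a DataFrame to CSV bytes suitable for COPY ... csv (no header row)."""
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    return buf.getvalue().encode("utf-8")


def upload_to_s3(body, bucket, key):
    """Upload the CSV bytes to S3 (requires boto3 and AWS credentials)."""
    import boto3  # assumed available in the target environment
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)


# Hypothetical DataFrame standing in for My_Dataframe
df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
payload = dataframe_to_csv_bytes(df)
# upload_to_s3(payload, "examble-bucket", "example.csv")  # enable with real credentials
```

For multi-million-row tables, splitting the DataFrame into several CSV parts before uploading lets Redshift parallelize the load.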
There are even more ways to make Copy faster, such as the manifest option, so that Redshift can load the data in parallel.
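For reference, a manifest is just a JSON file listing the S3 objects to load; a hypothetical sketch of building one (file names are assumptions, not from the original answer):

```python
import json

# Hypothetical manifest listing multiple CSV parts so Redshift loads them in parallel
manifest = {
    "entries": [
        {"url": "s3://examble-bucket/example_part_00.csv", "mandatory": True},
        {"url": "s3://examble-bucket/example_part_01.csv", "mandatory": True},
    ]
}
manifest_json = json.dumps(manifest, indent=2)

# Upload manifest_json to e.g. s3://examble-bucket/example.manifest, then point
# COPY at the manifest file and add the MANIFEST keyword:
#   copy example from 's3://examble-bucket/example.manifest'
#   credentials '...' csv manifest;
```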
Hope this gives you some ideas to move forward.