Using Python to upload large csv files to Postgres RDS in AWS


Question

What's the easiest way to load a large csv file into a Postgres RDS database in AWS using Python?

To transfer data to a local postgres instance, I have previously used a psycopg2 connection to run SQL statements like:

COPY my_table FROM 'my_10gb_file.csv' DELIMITER ',' CSV HEADER;

However, when executing this against a remote AWS RDS database, this generates an error because the .csv file is on my local machine rather than the database server:

ERROR: must be superuser to COPY to or from a file
SQL state: 42501
Hint: Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.
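
The hint's "COPY ... from stdin" route can also be driven from psycopg2 directly, with the client reading the local file and streaming it to the server. Below is a minimal sketch of that idea, not the method used in the answer further down. It assumes `conn` is an already-open psycopg2 connection; the function name `copy_csv_from_stdin` is hypothetical.

```python
def copy_csv_from_stdin(conn, table_name, csv_path):
    """Stream a local CSV file to the server using COPY ... FROM STDIN.

    `conn` is assumed to be an open psycopg2 connection (an assumption,
    not something shown in the original post).
    """
    # table_name is interpolated directly into the SQL, so it must come
    # from trusted code, never from user input.
    sql = "COPY {} FROM STDIN DELIMITER ',' CSV HEADER".format(table_name)
    with open(csv_path, "r") as csv_file, conn.cursor() as cur:
        # copy_expert sends the file object's contents as the COPY input.
        cur.copy_expert(sql, csv_file)
    conn.commit()
```

A connection for this would typically come from `psycopg2.connect(host=..., dbname=..., user=..., password=...)`, after which `copy_csv_from_stdin(conn, "my_table", "my_10gb_file.csv")` would load the file without needing superuser rights on the server.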

This answer explains why this doesn't work.

I'm now looking for the Python syntax to automate this using psql. I have a large number of .csv files I need to upload, so I need a script to automate this.

Recommended answer

First, create the tables in the RDS Postgres database as usual, using CREATE TABLE SQL statements.

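Then run a psql command like the one below. Unlike a server-side COPY, psql's \copy reads the file on the client machine and streams it to the server, so it does not require superuser rights: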

psql -p 5432 --host YOUR_HOST --username YOUR_USERNAME --dbname YOUR_DBNAME --command "\copy my_table FROM 'my_10gb_file.csv' DELIMITER ',' CSV HEADER"

In Python, we can set this up and execute it as follows:

import subprocess

host = "YOUR_HOST"
username = "YOUR_USERNAME"
dbname = "YOUR_DBNAME"

table_name = "my_table"
file_name = "my_10gb_file.csv"

# Raw string so the backslash in \copy reaches psql unchanged.
command = r"\copy {} FROM '{}' DELIMITER ',' CSV HEADER".format(table_name, file_name)

psql_template = 'psql -p 5432 --host {} --username {} --dbname {} --command "{}"'

bash_command = psql_template.format(host, username, dbname, command.strip())

# Capture stderr too; without stderr=PIPE, `error` below is always None.
process = subprocess.Popen(bash_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)

output, error = process.communicate()
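
Since the question mentions a large number of .csv files, the same psql call can be wrapped in a loop. The following is a rough sketch under a few assumptions that are not in the original answer: every file in one directory is uploaded, each file's name without the extension is used as the table name, and file paths contain no single quotes. The helper names `build_psql_args` and `upload_directory`, and the `runner` parameter (there only so the loop can be exercised without a database), are hypothetical. Passing the arguments as a list avoids the need for `shell=True`.

```python
import subprocess
from pathlib import Path


def build_psql_args(host, username, dbname, table_name, csv_path, port=5432):
    """Return the psql argument list that copies one CSV file into a table."""
    # Raw string so the backslash in \copy reaches psql unchanged.
    copy_cmd = r"\copy {} FROM '{}' DELIMITER ',' CSV HEADER".format(table_name, csv_path)
    return [
        "psql",
        "-p", str(port),
        "--host", host,
        "--username", username,
        "--dbname", dbname,
        "--command", copy_cmd,
    ]


def upload_directory(csv_dir, host, username, dbname, runner=subprocess.run):
    """Upload every *.csv in csv_dir, using each file's stem as the table name."""
    results = []
    for csv_path in sorted(Path(csv_dir).glob("*.csv")):
        args = build_psql_args(host, username, dbname, csv_path.stem, csv_path)
        # check=True raises CalledProcessError if psql exits with a non-zero status.
        results.append(runner(args, check=True, capture_output=True, text=True))
    return results
```

For an unattended run, psql needs the password without prompting, for example from the `PGPASSWORD` environment variable or a `~/.pgpass` file. A call such as `upload_directory("csv_files", "YOUR_HOST", "YOUR_USERNAME", "YOUR_DBNAME")` would then process the files one after another, where `csv_files` is a placeholder directory name.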
