Python psql \copy CSV to remote server


Question

I am attempting to copy a CSV file (which has a header and uses " as the quote character) with Python 3.6 into a table on a remote Postgres 10 server. It is a large CSV (2.5M rows, 800MB). I previously imported it into a dataframe and then used dataframe.to_sql, but this was very memory intensive, so I switched to using COPY.

Using COPY with psycopg2 or sqlalchemy would work fine, but the remote server does not have access to the local file system.

Using psql in the terminal, I have successfully run the query below to populate the table. I don't think using \copy is possible with psycopg2 or sqlalchemy.

\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '"' NULL ''

However, when I try to use a one-line psql -c command like the one below, it does not work and I get the error:

ERROR: COPY quote must be a single one-byte character.

psql -U user -h ip -d db -w pw -c "\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '"' NULL ''"

Could you tell me why this is the case?

This one-line psql -c statement would be easier to implement with the subprocess module in Python than having to open a terminal and execute a command, which I'm not sure how to do. If you could suggest a workaround or a different approach, that would be great.

Edit: Per Andrew's suggestion, escaping the quote character worked on the command line. However, when implementing it in Python as below, a new error comes up:

/bin/sh: -c: line 0: unexpected EOF while looking for matching `''
/bin/sh: -c: line 1: syntax error: unexpected end of file

copy_statement = "\"\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''\""
cmd = f'psql -U {user} -h {ip} -d {db} -w {pw} -c {copy_statement}'
subprocess.call(cmd, shell=True)

Recommended answer

Try not to use shell=True if you can avoid it. It is better to split the command into a list of arguments yourself, so that no shell has to parse it.

# -w takes no value (it means "never prompt"); supply the password via PGPASSWORD or ~/.pgpass.
subprocess.call(["psql", "-U", user, "-h", ip, "-d", db, "-w", "-c", copy_statement])

In this case your copy statement is passed to psql verbatim, because there are no shell quoting issues to take into account. (N.B. you still have to escape it as a Python string literal, but the extra layer of escaped double quotes around the statement is no longer needed.)
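To check that an argument list really reaches the child process untouched, here is a minimal sketch. It uses the Python interpreter itself as a stand-in for psql (an assumption made only so the example runs without a database); the child process simply echoes the argument it receives.

```python
import subprocess
import sys

# The statement exactly as psql should see it; only Python-level escaping is used.
copy_statement = "\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''"

# Stand-in for psql (assumption): a child Python process that writes back its first argument.
child = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", copy_statement]

# No shell=True here, so no shell ever parses the quotes.
received = subprocess.check_output(child, universal_newlines=True)
print(received == copy_statement)
```

The same list shape, with "psql" and its options in place of the stand-in, is what the call above relies on.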

If you still want to use shell=True, then you have to escape the string literal for both Python and the shell:

"\"\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\\\"' NULL ''\""

will create a string in Python whose value is

"\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''"

which is exactly what we found we needed on the shell in the first place!
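The difference between the two escaping levels can be checked without running psql. The sketch below uses shlex.split from the standard library, which approximates POSIX shell tokenization (it is not /bin/sh itself, so treat it as an illustration rather than a proof). The under-escaped command from the question fails to tokenize, while the version with the extra backslash yields the intended single argument.

```python
import shlex

# Value produced by the question's literal: the inner double quote is escaped for Python only.
under_escaped = "psql -c \"\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''\""

# Value produced by the literal above: the inner double quote is also backslash-escaped for the shell.
shell_safe = "psql -c \"\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\\\"' NULL ''\""

try:
    shlex.split(under_escaped)
    under_escaped_error = None
except ValueError as exc:
    # The inner " closes the double-quoted string early, leaving a single quote unmatched.
    under_escaped_error = str(exc)

tokens = shlex.split(shell_safe)
print(under_escaped_error)
print(tokens[2])
```

The unmatched single quote reported for the first string corresponds to the "unexpected EOF while looking for matching" message that /bin/sh printed in the question.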

Edit (clarification from the comments):

subprocess.call, when not using shell=True, takes a sequence of arguments (such as a list).

So you can have:

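import subprocess

# No shell is involved, so the statement needs Python-level escaping only:
# no surrounding escaped double quotes and no extra backslashes for the shell.
psql_command = "\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''"
# user, hostname, dbname all defined elsewhere above.
# psql has no option that takes a password: -w means "never prompt" and takes no value,
# so the password must come from PGPASSWORD or ~/.pgpass.
command = ["psql",
    "-U", user,
    "-h", hostname,
    "-d", dbname,
    "-w",
    "-c", psql_command,
]

subprocess.call(command)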
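Note that psql's -w flag means --no-password and does not accept a value, so the password has to reach psql some other way. A minimal sketch, assuming the PGPASSWORD environment variable is acceptable in your setup (a ~/.pgpass file is another option); build_psql_call is a hypothetical helper and the connection values are placeholders:

```python
import os
import subprocess

def build_psql_call(user, hostname, dbname, password, psql_command):
    """Return (argv, env) for subprocess.call. Hypothetical helper, for illustration only."""
    # -w: never prompt for a password; psql reads it from the environment instead.
    argv = ["psql", "-U", user, "-h", hostname, "-d", dbname, "-w", "-c", psql_command]
    # Copy the current environment and add PGPASSWORD for the child process only.
    env = dict(os.environ, PGPASSWORD=password)
    return argv, env

statement = "\\copy table (col1, col2) FROM file_location CSV HEADER QUOTE '\"' NULL ''"
argv, env = build_psql_call("user", "ip", "db", "pw", statement)

# Run this where psql is installed and the placeholders are replaced with real values:
# subprocess.call(argv, env=env)
```

Passing the password through env keeps it out of the argument list, where other users on the machine could see it in the process table.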

See https://docs.python.org/2/library/subprocess.html#subprocess.call and https://docs.python.org/3/library/subprocess.html#subprocess.call

Additional note: to avoid shell injection, you should be using the method described here (an argument list without shell=True). See the warning section of https://docs.python.org/2/library/subprocess.html#frequently-used-arguments
