psycopg2 COPY using cursor.copy_from() freezes with large inputs

Question

Consider the following code in Python, using a psycopg2 cursor
object (some column names were changed or omitted for clarity):
filename = 'data.csv'
file_columns = ('id', 'node_id', 'segment_id', 'elevated',
                'approximation', 'the_geom', 'azimuth')
self._cur.copy_from(file=open(filename),
                    table=self.new_table_name, columns=file_columns)
- The database is located on a remote machine on a fast LAN.
- Using \COPY from bash works very fast, even for large (~1,000,000 lines) files.
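For reference, the bash-side \COPY mentioned above might look something like this; the host, user, database, table, and column names are illustrative placeholders, not from the question:

```shell
# Illustrative only: connection details and names are placeholders.
# \COPY runs client-side: psql reads data.csv locally and streams it
# to the server over the existing connection.
psql -h db-host -U someuser somedb \
     -c "\COPY new_table (id, node_id, segment_id) FROM 'data.csv' WITH CSV"
```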
This code is ultra-fast for 5,000 lines, but when data.csv
grows beyond 10,000 lines, the program freezes completely.
Any thoughts/solutions?

Adam
Answer
This is just a workaround, but you can just pipe something into psql. I use this recipe sometimes when I am too lazy to bust out psycopg2.
import subprocess

def psql_copy_from(filename, tablename, columns=None):
    """Warning, this does not properly quote things"""
    coltxt = ' (%s)' % ', '.join(columns) if columns else ''
    with open(filename) as f:
        subprocess.check_call([
            'psql',
            '-c', 'COPY %s%s FROM STDIN' % (tablename, coltxt),
            '--set=ON_ERROR_STOP=true',  # to be safe
            # add your connection args here
        ], stdin=f)
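If you would rather stay inside psycopg2, a sketch of the same idea using cursor.copy_expert() is shown below; it streams the file through a single COPY ... FROM STDIN statement rather than shelling out. The helper and names here are illustrative, not from the answer, and like the recipe above it does not quote identifiers:

```python
def build_copy_sql(tablename, columns=None):
    # Build the COPY statement; identifiers are NOT quoted, same
    # caveat as the psql recipe above.
    coltxt = ' (%s)' % ', '.join(columns) if columns else ''
    return 'COPY %s%s FROM STDIN' % (tablename, coltxt)

def copy_expert_from(cur, filename, tablename, columns=None):
    # copy_expert() reads the file object in chunks and streams it
    # to the server as the data for the given COPY statement.
    with open(filename) as f:
        cur.copy_expert(build_copy_sql(tablename, columns), f)

print(build_copy_sql('new_table', ('id', 'node_id')))
```

You would call it as, e.g., copy_expert_from(self._cur, 'data.csv', self.new_table_name, file_columns) and then commit on the connection.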
As far as your locking up is concerned, are you using multiple threads or anything like that?

Is your postgres logging anything such as a closed connection or a deadlock? Can you see disk activity after it locks up?