重复键值违反唯一约束-尝试从dask数据帧创建sql表时发生postgres错误 [英] duplicate key value violates unique constraint - postgres error when trying to create sql table from dask dataframe

查看:975
本文介绍了重复键值违反唯一约束-尝试从dask数据帧创建sql表时发生postgres错误的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

接着从

Following on from this question, when I try to create a postgresql table from a dask.dataframe with more than one partition I get the following error:

IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(test1, 2200) already exists.
 [SQL: '\nCREATE TABLE test1 (\n\t"A" BIGINT, \n\t"B" BIGINT, \n\t"C" BIGINT, \n\t"D" BIGINT, \n\t"E" BIGINT, \n\t"F" BIGINT, \n\t"G" BIGINT, \n\t"H" BIGINT, \n\t"I" BIGINT, \n\t"J" BIGINT, \n\tidx BIGINT\n)\n\n']

您可以使用以下代码重新创建错误:

You can recreate the error with the following code:

import numpy as np
import dask.dataframe as dd
import dask
import pandas as pd
import sqlalchemy_utils as sqla_utils
import sqlalchemy as sqla
DATABASE_CONFIG = {
    'driver': '',
    'host': '',
    'user': '',
    'password': '',
    'port': 5432,
}
DBNAME = 'dask'
url = '{driver}://{user}:{password}@{host}:{port}/'.format(
        **DATABASE_CONFIG)
db_url = url.rstrip('/') + '/' + DBNAME
# create db if non-existent
if not sqla_utils.database_exists(db_url):
    print('Creating database \'{}\''.format(DBNAME))
    sqla_utils.create_database(db_url)
conn = sqla.create_engine(db_url)
# create pandas df with random numbers
df = pd.DataFrame(np.random.randint(0,40,size=(100, 10)), columns=list('ABCDEFGHIJ'))
# add index so that it can be used as primary key later on
df['idx'] = df.index
# create dask df
ddf = dd.from_pandas(df, npartitions=4)
# Write to psql
dto_sql = dask.delayed(pd.DataFrame.to_sql)
out = [dto_sql(d, 'test', db_url, if_exists='append', index=False, index_label='idx')
       for d in ddf.to_delayed()]
dask.compute(*out)

如果将npartitions设置为1,则代码不会产生错误.因此,我猜想这与postgres无法处理并行请求以写入同一sql表有关?我该如何解决?

The code doesn't produce an error if npartitions is set to 1. So I'm guessing it has to do with postgres not being able to handle parallel requests to write to a same sql table...? How can I fix this?

推荐答案

我正在阅读(如google小组讨论中所述).

I was reading this. It seems this error rises when you are creating/updating the same table with parallel processing. I understand it depends because of this (as explained on the google group discussion).

因此,我认为它取决于PostgreSQL本身,而不取决于连接驱动程序或用于多处理的模块.

So I think it depend from PostgreSQL itself and not from the connection driver or the module used for the multiprocessing.

实际上,我发现解决此问题的唯一方法是创建足够大的块,以使写入过程比计算本身慢. 使用较大的块时,此错误不会增加.

Well, Actually, the only way I found to solve this is to create chunks big enough to have back a writing process slower than the calculation itself. With bigger chunks this error doesn't rise.

这篇关于重复键值违反唯一约束-尝试从dask数据帧创建sql表时发生postgres错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆