Avoiding duplicated data in PostgreSQL database in Python

Problem description

I am working with PostgreSQL and psycopg2. I fetch feed data that is updated every 10 minutes and store the feed contents in a PostgreSQL database. My goal is to retrieve those rows from the table and print them. The problem is that every time I run the script, the INSERT adds the same entries again, so duplicate data accumulates in the database.

To work around this, I put a primary key constraint on the location_title column of the locations_musiq1 table, where I intend to store the feed data. Now I get an error instead.

Here is my code:

import psycopg2
import sys
import feedparser
import codecs
import psycopg2.extensions


# Parsing data from Geofeed location feeds

data = feedparser.parse("some URL")
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)



try:

    conn = psycopg2.connect("dbname='name' user='postgres' host='localhost' password='abcds'")
    conn.set_client_encoding('UNICODE')


except psycopg2.Error:
    print("I am unable to connect to the database, exiting.")
    sys.exit(1)
cur=conn.cursor()


for i in range(len(data['entries'])):
    cur.execute("INSERT INTO locations_musiq1(location, location_title) VALUES (%s, %s)",    (data.entries[i].title,data.entries[i].summary))
    conn.commit()
cur.execute("SELECT * FROM locations_musiq1;")
# Print every stored row, leaving out the first column
for row in cur:
    print('   '.join(row[1:]))


cur.close()
conn.close()

After making the "location_title" column of the "locations_musiq1" table the primary key, I get this error:

    Traceback (most recent call last):
      File "F:\JavaWorkspace\Test\src\postgr_example.py", line 28, in <module>
        cur.execute("INSERT INTO locations_musiq1(location, location_title) VALUES (%s, %s)",    (data.entries[i].title,data.entries[i].summary))
    psycopg2.IntegrityError: duplicate key value violates unique constraint "locations_musiq1_pkey"

Does anybody have an idea how to solve this problem? Thanks in advance.

Solution

You could try something like this:

cur.execute("""
  INSERT INTO locations_musiq1(location, location_title) 
  SELECT %s, %s WHERE NOT EXISTS 
      (SELECT location_title FROM locations_musiq1 WHERE location_title=%s);
  """, (data.entries[i].title, data.entries[i].summary, data.entries[i].summary))
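The NOT EXISTS check above runs as a separate subquery, so two scripts running at the same time could in principle still collide on the primary key. If the server is PostgreSQL 9.5 or later, one alternative is to let the database skip the conflicting row with ON CONFLICT ... DO NOTHING. The following is only a sketch under those assumptions: the helper name insert_new_locations is made up for illustration, it takes plain (title, summary) pairs rather than feedparser entries, and it relies on the primary key on location_title described in the question.

```python
# Sketch only: assumes PostgreSQL 9.5+ and a primary key (or unique
# constraint) on locations_musiq1.location_title, as in the question.
INSERT_SQL = """
    INSERT INTO locations_musiq1 (location, location_title)
    VALUES (%s, %s)
    ON CONFLICT (location_title) DO NOTHING
"""


def insert_new_locations(cur, entries):
    """Insert (title, summary) pairs, skipping titles already stored.

    `cur` is any DB-API cursor, e.g. one obtained from psycopg2.
    Returns the number of rows actually inserted.
    """
    inserted = 0
    for title, summary in entries:
        cur.execute(INSERT_SQL, (title, summary))
        # rowcount is 1 for a new row and 0 when the row was skipped.
        inserted += cur.rowcount
    return inserted
```

With the variables from the question, this could be called as insert_new_locations(cur, ((e.title, e.summary) for e in data.entries)), followed by a single conn.commit() once the loop has finished.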
