为什么这个代码在PostgreSQL中失败,以及如何解决它(解决方法)?是Postgres SQL引擎的缺陷吗? [英] Why this code fails in PostgreSQL and how to fix it (work-around)? Is it Postgres SQL engine flaw?

查看:118
本文介绍了为什么这个代码在PostgreSQL中失败,以及如何解决它(解决方法)?是Postgres SQL引擎的缺陷吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当我发现奇怪的Postgres行为时,我一直在处理文本解析任务。我的原始代码暴露奇怪的错误是用Java与JDBC连接的PostgreSQL(v8.3.3和v8.4.2测试),这是我的原始帖子:。我刚刚移植我的Java代码给了纯plpgsql,它给出了相同的错误(与原始帖子中描述的行为相同)。

I've been working on text parsing task when I found strange Postgres behavior. My original code exposing strange error was written in Java with JDBC connectivity for PostgreSQL (v8.3.3 and v8.4.2 tested), here is my original post: Is it an error of PostgreSQL SQL engine and how to avoid (workaround) it?. I've just ported my Java code given there to pure plpgsql and it gives the same errors (same behavior as described in original post).

简化的代码现在没有什么做分析 - 它只是生成伪随机(但可重复)的字,并在规范化后插入它们(表 spb_word 保存唯一字和ids,它们由最终表中的id引用 spb_obj_word 和表 spb_word4obj 作为输入缓冲区)。

Simplified code has now nothing to do with parsing - it just generates pseudo-random (but repeatable) words and inserts them after normalizing (table spb_word holds unique words and ids, they are referenced by id in final table spb_obj_word and table spb_word4obj works as input buffer).

这是我的表(来自OP的c和p):

Here are my tables (c&p from OP):

create sequence spb_word_seq;

create table spb_word (
  id bigint not null primary key default nextval('spb_word_seq'),
  word varchar(410) not null unique
);

create sequence spb_obj_word_seq;

create table spb_obj_word (
  id int not null primary key default nextval('spb_obj_word_seq'),
  doc_id int not null,
  idx int not null,
  word_id bigint not null references spb_word (id),
  constraint spb_ak_obj_word unique (doc_id, word_id, idx)
);

create sequence spb_word4obj_seq;

create table spb_word4obj (
  id int not null primary key default nextval('spb_word4obj_seq'),
  doc_id int not null,
  idx int not null,
  word varchar(410) not null,
  word_id bigint null references spb_word (id),
  constraint spb_ak_word4obj unique (doc_id, word_id, idx),
  constraint spb_ak_word4obj2 unique (doc_id, word, idx)
);

并将代码从原始Java代码移植到plpgsql:

and code ported to plpgsql from original Java code:

create sequence spb_wordnum_seq;

create or replace function spb_getWord() returns text as $$
declare
  rn int;
  letters varchar(255) :=   'ąćęłńóśźżjklmnopqrstuvwxyz';
                          --'abcdefghijklmnopqrstuvwxyz';
  llen int := length(letters);
  res text := '';
  wordnum int;
begin
  select nextval('spb_wordnum_seq') into wordnum;

  rn := 3 * (wordnum + llen * llen * llen);
  rn := (rn + llen) / (rn % llen + 1);
  rn := rn % (rn / 2 + 10);

  loop
    res := res || substring(letters, rn % llen, 1);
    rn := floor(rn / llen);
    exit when rn = 0;
  end loop;

  --raise notice 'word for wordnum=% is %', wordnum, res;

  return res;
end;
$$ language plpgsql;



create or replace function spb_runme() returns void as $$
begin
  perform setval('spb_wordnum_seq', 1, false);
  truncate table spb_word4obj, spb_word, spb_obj_word;

  for j in 0 .. 50000-1 loop

    if j % 100 = 0 then raise notice 'j = %', j; end if;

    delete from spb_word4obj where doc_id = j;

    for i in 0 .. 20 - 1 loop
      insert into spb_word4obj (word, idx, doc_id) values (spb_getWord(), i, j);         
    end loop;

    update spb_word4obj set word_id = w.id from spb_word w 
    where w.word = spb_word4obj.word and doc_id = j;

    insert into spb_word (word) 
    select distinct word from spb_word4obj 
    where word_id is null and doc_id = j;

    update spb_word4obj set word_id = w.id 
    from spb_word w 
    where w.word = spb_word4obj.word and 
    word_id is null and doc_id = j;

    insert into spb_obj_word (word_id, idx, doc_id) 
    select word_id, idx, doc_id from spb_word4obj where doc_id = j;
  end loop;
end;
$$ language plpgsql;

要运行它,只需执行 select spb_runme() as as SQL statement。

To run this simply execute select spb_runme() as SQL statement.

这里是错误的第一个例子:

Here is first example of error:

NOTICE:  j = 8200
ERROR:  duplicate key value violates unique constraint "spb_word_word_key"
CONTEXT:  SQL statement "insert into spb_word (word) select distinct word from spb_word4obj where word_id is null and doc_id =  $1 "
PL/pgSQL function "spb_runme" line 18 at SQL statement

/ p>

and second one:

NOTICE:  j = 500
ERROR:  null value in column "word_id" violates not-null constraint
CONTEXT:  SQL statement "insert into spb_obj_word (word_id, idx, doc_id) select word_id, idx, doc_id from spb_word4obj where doc_id =  $1 "
PL/pgSQL function "spb_runme" line 27 at SQL statement

这些错误以不可预测的方式发生 - 每次在不同的迭代中( j )和不同的词导致错误。

These errors occur in unpredictable manner - every time in different iteration (j) and with different words causing error.

当从生成的词语中删除波兰国家字符(ąćęłńóśźż letters varchar(255):='ąćęłńóśźżjklmnopqrstuvwxyz'; 变成 letters varchar(255):='abcdefghijklmnopqrstuvwxyz'; 有没有错误!我的DB是用UTF-8编码创建的,所以非ascii字符应该没有问题,但显然非常重要!

When polish national characters (ąćęłńóśźż) are eliminated from generated words (line letters varchar(255) := 'ąćęłńóśźżjklmnopqrstuvwxyz'; becomes letters varchar(255) := 'abcdefghijklmnopqrstuvwxyz';) there is no error! My DB is created with UTF-8 encoding, so there should be no problem with non-ascii chars, but apparently it is very important!

现在我的问题:我的代码有什么问题?或者是PostgreSQL的严重错误?如何解决此错误?

BTW:如果它是PostgreSQL引擎中的错误,那么这个数据库如何可信?我应该移动到一个免费的替代品(例如MySQL)?

BTW: If it is an error in PostgreSQL engine, then how this DB can be trustworthy? Should I move to one of free alternatives (e.g. MySQL)?

UPDATE: >额外说明(主要针对OMG小马)

UPDATE: extra explanations (mainly for OMG Ponies)

如果我删除不必要的 delete 同样的错误。

If I remove unnecessary delete - I still have the same errors.

函数 spb_getWord()必须生成重复的单词 - 它模拟文本解析,字 - 和一些字得到重复 - 这是正常的,我的代码的其余部分是处理重复。因为 spb_getWord()生成了可能的重复,所以我在缓冲表中插入单词 spb_word4obj ,然后更新 word_id 在此表中对于来自 spb_word 的已处理的单词。所以现在 - 如果 spb_word4obj 中的行具有 word_id 不是null - 那么它是一个重复,因此我不会插入字插入 spb_word 。但是 - 作为OMG Ponies提到,我得到错误重复键值违反唯一约束这意味着我的代码,处理重复正确失败。也就是说我的代码失败,因为内部的Postgres错误 - 正确的代码是以某种方式由Postgres执行并失败。

Function spb_getWord() must generate words with duplicates - it simulates text parsing and dividing it to words - and some words get repeated - that is normal and rest of my code is dealing with duplicates. Because of possible duplicates generated by spb_getWord() I insert words to buffer table spb_word4obj and then I update word_id in this table for already processed words from spb_word. So now - if row in spb_word4obj has word_id not null - then it is a duplicate, so I will not insert this word into spb_word. But - as OMG Ponies mentioned, I get error duplicate key value violates unique constraint which means my code that handles duplicates properly fails. I.e. my code fails because of internal Postgres error - proper code is somehow badly executed by Postgres and fails.

插入新单词(识别并标记为不重复) spb_word 我的代码最终将标准化的字插入 spb_obj_word - 参考 spb_word ,但是这又失败,有时是因为Postgres内部错误。再次,我认为我的代码是正确的,但它失败,因为在Postgres SQL引擎本身有问题。

After inserting new words (duplicates recognized and marked not to be inserted) into spb_word my code finally inserts normalized words into spb_obj_word - replacing word body with reference to unduplicated entry in spb_word, but this again fails sometimes because of Postgres internal error. Again I think my code is proper but it fail because there is problem in Postgres SQL engine itself.

添加或删除波兰国家字母从 spb_getWord 只保证我是奇怪的Postgres错误 - 所有唯一/重复的注意事项保持不变,但允许/禁止一些字母从字导致错误或消除它们。所以这不是我的代码错误的情况下 - 不合理的处理重复。

Adding or removing polish national letters from generated words by spb_getWord only assures me that it is strange Postgres error - all unique/duplicates considerations remain the same but allowing/disallowing some letters from words leads to errors or eliminates them. So it is not the case of error in my code - unproper handling of duplicates.

第二件事,确保我的代码是不是错误是不可预测的错误时刻。我的代码的每次运行都会执行相同的字序列,所以它应该总是在相同的位置中断相同的值导致错误。但它不是 - 它是相当随机的时刻,它失败。

Second thing that assures me that it is not error in my code is unpredictable moment of errors that are detected. Every run of my code does the same sequence of words so it should always break in the same place with the same value causing the error. But it is not - it is quite random moment that it fails.

推荐答案

我已经设法简化测试代码 - 使用一个表。简化问题已发布在pgsql-bugs邮寄名单上: http://档案.postgresql.org / pgsql-bugs / 2010-01 / msg00182.php

I've managed to simplify test code - now it uses one table. Simplified problem was posted on pgsql-bugs mailing list: http://archives.postgresql.org/pgsql-bugs/2010-01/msg00182.php. It is confirmed to occur on other machines (not only mine).

这是主测试函数的简化版本(它需要一个表 spb_word ,序列 spb_wordnum_seq spb_word_seq 和一个函数 spb_getWord <

Here is this simplified version of main test function (it needs one table spb_word, sequences spb_wordnum_seq and spb_word_seq and one function spb_getWord given in my question).

create or replace function spb_runmeSimple2(cnt int) returns void as $$
declare
  w varchar(410);
  wordId int;
begin
  perform setval('spb_wordnum_seq', 1, false);
  truncate table spb_word cascade;

  for i in 1 .. cnt loop

    if i % 100 = 0 then raise notice 'i = %', i; end if;

    select spb_getWord() into w;
    select id into wordId from spb_word where word = w;
    if wordId is null then 
      insert into spb_word (word) values (w);
    end if;

  end loop;
end;
$$ language plpgsql;

执行时发生错误(但是以不可预测的方式)select spb_runmeSimple2(10000000 )

这里是解决方法:将数据库归类从polish更改为标准'C'。使用C排序规则没有错误。但是没有波兰语整理波兰语单词排序不正确(关于波兰国家字符),所以问题应该在Postgres本身固定。

Here is workaround: change database collation from polish to standard 'C'. With 'C' collation there is no error. But without polish collation polish words are sorted incorrectly (with respect to polish national characters), so problem should be fixed in Postgres itself.

这篇关于为什么这个代码在PostgreSQL中失败,以及如何解决它(解决方法)?是Postgres SQL引擎的缺陷吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆