PostgreSQL multiple nullable columns in unique constraint


Question

We have a legacy database schema that has some interesting design decisions. Until recently, we have only supported Oracle and SQL Server, but we are trying to add support for PostgreSQL, which has brought up an interesting problem. I have searched Stack Overflow and the rest of the internet and I don't believe this particular situation is a duplicate.

Oracle and SQL Server both behave the same when it comes to nullable columns in a unique constraint, which is to essentially ignore the columns that are NULL when performing the unique check.

Let's say I have the following table and constraint:

CREATE TABLE EXAMPLE
(
    ID TEXT NOT NULL PRIMARY KEY,
    FIELD1 TEXT NULL,
    FIELD2 TEXT NULL,
    FIELD3 TEXT NULL,
    FIELD4 TEXT NULL,
    FIELD5 TEXT NULL,
    ...
);

CREATE UNIQUE INDEX EXAMPLE_INDEX ON EXAMPLE
(
    FIELD1 ASC,
    FIELD2 ASC,
    FIELD3 ASC,
    FIELD4 ASC,
    FIELD5 ASC
);

On both Oracle and SQL Server, leaving any of the nullable columns NULL will result in only performing a uniqueness check on the non-null columns. So the following inserts can only be done once:

INSERT INTO EXAMPLE VALUES ('1','FIELD1_DATA', NULL, NULL, NULL, NULL );
INSERT INTO EXAMPLE VALUES ('2','FIELD1_DATA','FIELD2_DATA', NULL, NULL,'FIELD5_DATA');
-- These will succeed when they should violate the unique constraint:
INSERT INTO EXAMPLE VALUES ('3','FIELD1_DATA', NULL, NULL, NULL, NULL );
INSERT INTO EXAMPLE VALUES ('4','FIELD1_DATA','FIELD2_DATA', NULL, NULL,'FIELD5_DATA');

However, because PostgreSQL (correctly) adheres to the SQL standard, those insertions (and any other combination of values, as long as one of them is NULL) do not throw an error and are inserted without complaint. Unfortunately, because of our legacy schema and the supporting code, we need PostgreSQL to behave the same as SQL Server and Oracle.

I am aware of the following Stack Overflow question and its answers: Create unique constraint with null columns. From my understanding, there are two strategies to solve this problem:

  1. Create partial indexes that describe the index in cases where the nullable columns are both NULL and NOT NULL (which results in exponential growth of the number of partial indexes)
  2. Use COALESCE with a sentinel value on the nullable columns in the index.

The problem with (1) is that the number of partial indexes we'd need to create grows exponentially with each additional nullable column we'd like to add to the constraint (2^N if I am not mistaken). The problems with (2) are that sentinel values reduce the number of available values for that column, plus any potential performance problems.
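To make the growth in strategy (1) concrete, here is a minimal sketch for just two nullable columns. The table example2 and the index names are hypothetical, chosen only for illustration:

```sql
-- Hypothetical two-column table, only to illustrate strategy (1).
CREATE TABLE example2 (
    id     text NOT NULL PRIMARY KEY,
    field1 text NULL,
    field2 text NULL
);

-- Both columns present: enforce uniqueness on the pair.
CREATE UNIQUE INDEX example2_f1_f2_idx ON example2 (field1, field2)
    WHERE field1 IS NOT NULL AND field2 IS NOT NULL;

-- Only field1 present: enforce uniqueness on field1 alone.
CREATE UNIQUE INDEX example2_f1_idx ON example2 (field1)
    WHERE field1 IS NOT NULL AND field2 IS NULL;

-- Only field2 present: enforce uniqueness on field2 alone.
CREATE UNIQUE INDEX example2_f2_idx ON example2 (field2)
    WHERE field1 IS NULL AND field2 IS NOT NULL;
```

Two columns already need three indexes (four if rows with every column NULL must also be limited to one), and each further nullable column roughly doubles the count.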

My question: are these the only two solutions to this problem? If so, what are the tradeoffs between them for this particular use case? A good answer would discuss the performance of each solution, the maintainability, how PostgreSQL would utilize these indexes in simple SELECT statements, and any other "gotchas" or things to be aware of. Keep in mind that 5 nullable columns was only for an example; we have some tables in our schema with up to 10 (yes, I cry every time I see it, but it is what it is).

Solution

You are striving for compatibility with your existing Oracle and SQL Server implementations.
Here is a presentation comparing physical row storage formats of the three involved RDBMS.

Since Oracle does not implement NULL values at all in row storage, it can't tell the difference between an empty string and NULL anyway. So wouldn't it be prudent to use empty strings ('') instead of NULL values in Postgres as well - for this particular use case?

Define columns included in the unique constraint as NOT NULL DEFAULT '', problem solved:

CREATE TABLE example (
   example_id serial PRIMARY KEY
 , field1 text NOT NULL DEFAULT ''
 , field2 text NOT NULL DEFAULT ''
 , field3 text NOT NULL DEFAULT ''
 , field4 text NOT NULL DEFAULT ''
 , field5 text NOT NULL DEFAULT ''
 , CONSTRAINT example_index UNIQUE (field1, field2, field3, field4, field5)
);
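For a table that already exists with nullable columns, a conversion along the following lines might work. This is only a sketch under assumptions: the table example_legacy, its two columns and the constraint name are hypothetical stand-ins for the legacy table, and it presumes that no existing row uses '' with a meaning different from NULL.

```sql
-- Hypothetical legacy table with nullable columns, shortened to two fields.
CREATE TABLE example_legacy (
    id     text NOT NULL PRIMARY KEY,
    field1 text NULL,
    field2 text NULL
);

BEGIN;

-- Replace existing NULLs with the empty string.
UPDATE example_legacy
SET    field1 = COALESCE(field1, '')
     , field2 = COALESCE(field2, '')
WHERE  field1 IS NULL OR field2 IS NULL;

-- Make '' the default and forbid NULL from now on.
ALTER TABLE example_legacy
   ALTER COLUMN field1 SET DEFAULT ''
 , ALTER COLUMN field1 SET NOT NULL
 , ALTER COLUMN field2 SET DEFAULT ''
 , ALTER COLUMN field2 SET NOT NULL;

-- Add the unique constraint last; this fails if the converted rows collide.
ALTER TABLE example_legacy
   ADD CONSTRAINT example_legacy_uni UNIQUE (field1, field2);

COMMIT;
```

If the final step fails, the transaction rolls back and the duplicates have to be resolved by hand first.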

Notes

  • What you demonstrate in the question is a unique index:

    CREATE UNIQUE INDEX ...

    not the unique constraint you keep talking about. There are subtle but important differences!

    I changed it to an actual constraint, since that is the subject of your post.

  • The keyword ASC is just noise, since that is the default sort order. I left it out.

  • I used a serial PK column for simplicity. That is totally optional, but typically better than numbers stored as text.
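One of the differences between a unique constraint and a unique index can be sketched as follows. The table t and all object names here are hypothetical and serve only as an illustration:

```sql
-- Hypothetical table for illustration.
CREATE TABLE t (a text, b text);

-- A unique constraint is declared on plain columns only.
-- PostgreSQL creates a backing unique index of the same name automatically.
ALTER TABLE t ADD CONSTRAINT t_a_b_uni UNIQUE (a, b);

-- A unique index may also use expressions and a WHERE clause,
-- neither of which is possible in a unique constraint.
CREATE UNIQUE INDEX t_lower_a_idx ON t (lower(a)) WHERE b IS NOT NULL;

-- Only the constraint is listed in the system catalog pg_constraint.
SELECT conname FROM pg_constraint WHERE conrelid = 't'::regclass;
```

This matters for the alternatives below: anything based on COALESCE or another expression has to be a unique index, not a constraint.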

Working with it

Just omit empty / null fields from the INSERT:

INSERT INTO example(field1) VALUES ('F1_DATA');
INSERT INTO example(field1, field2, field5) VALUES ('F1_DATA', 'F2_DATA', 'F5_DATA');

Repeating any of these inserts would violate the unique constraint.

Or, if you insist on omitting target columns (which is a bit of an antipattern in persisted INSERT statements), or for bulk inserts where all columns need to be listed:

INSERT INTO example VALUES
  ('1', 'F1_DATA', DEFAULT, DEFAULT, DEFAULT, DEFAULT)
, ('2', 'F1_DATA','F2_DATA', DEFAULT, DEFAULT,'F5_DATA');

Or simply:

INSERT INTO example VALUES
  ('1', 'F1_DATA', '', '', '', '')
, ('2', 'F1_DATA','F2_DATA', '', '','F5_DATA');

Or you can write a trigger BEFORE INSERT OR UPDATE that converts NULL to ''.
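A minimal sketch of such a trigger, assuming the example table defined above. The function and trigger names are made up for illustration:

```sql
-- Assumes the example table defined above (field1 .. field5 text NOT NULL DEFAULT '').
-- The function name example_null_to_empty is a hypothetical choice.
CREATE OR REPLACE FUNCTION example_null_to_empty()
  RETURNS trigger
  LANGUAGE plpgsql AS
$$
BEGIN
   -- Replace every NULL in the constrained columns with the empty string.
   NEW.field1 := COALESCE(NEW.field1, '');
   NEW.field2 := COALESCE(NEW.field2, '');
   NEW.field3 := COALESCE(NEW.field3, '');
   NEW.field4 := COALESCE(NEW.field4, '');
   NEW.field5 := COALESCE(NEW.field5, '');
   RETURN NEW;
END
$$;

-- BEFORE triggers fire before NOT NULL constraints are checked.
-- On PostgreSQL versions before 11, write EXECUTE PROCEDURE instead.
CREATE TRIGGER example_null_to_empty
BEFORE INSERT OR UPDATE ON example
FOR EACH ROW EXECUTE FUNCTION example_null_to_empty();
```

With this in place, legacy code can keep passing explicit NULL values, for example INSERT INTO example (field1, field2) VALUES ('F1_DATA', NULL), and the row is stored with field2 = ''.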

Alternative solutions

If you need to use actual NULL values, I would suggest the unique index with COALESCE, which you mentioned as option (2) and which @wildplasser provided as his last example.
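A sketch of that COALESCE index, assuming a hypothetical table example_nullable whose columns stay nullable and using '' as the sentinel value (any value that can never occur as real data would do):

```sql
-- Hypothetical table that keeps real NULL values.
CREATE TABLE IF NOT EXISTS example_nullable (
    id     text NOT NULL PRIMARY KEY,
    field1 text NULL,
    field2 text NULL,
    field3 text NULL,
    field4 text NULL,
    field5 text NULL
);

-- One expression index replaces the whole family of partial indexes.
-- NULL is mapped to the sentinel '' for the purpose of the uniqueness check only.
CREATE UNIQUE INDEX example_nullable_coalesce_idx ON example_nullable (
    COALESCE(field1, ''),
    COALESCE(field2, ''),
    COALESCE(field3, ''),
    COALESCE(field4, ''),
    COALESCE(field5, '')
);

-- A query can only use this index if it repeats the indexed expression.
SELECT *
FROM   example_nullable
WHERE  COALESCE(field1, '') = 'FIELD1_DATA';
```

The stored data still contains NULL. A plain predicate such as field1 = 'FIELD1_DATA' does not match the indexed expression, so a separate plain index would be needed for such queries.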

The index on an array, like @Rudolfo presented, is simple but considerably more expensive. Array handling isn't very cheap in Postgres, and there is an array overhead similar to that of a row (24 bytes).

Arrays are limited to columns of the same data type. You could cast all columns to text if some are not, but it will typically further increase storage requirements. Or you could use a well-known row type for heterogeneous data types ...
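For comparison, the array variant might look like the following sketch. It again uses the hypothetical table example_nullable, shortened to two columns:

```sql
-- Hypothetical table that keeps real NULL values.
CREATE TABLE IF NOT EXISTS example_nullable (
    id     text NOT NULL PRIMARY KEY,
    field1 text NULL,
    field2 text NULL
);

-- ARRAY[...] is an expression, not a function call,
-- so it needs its own pair of parentheses inside the index definition.
-- NULL elements compare as equal in array comparisons,
-- so two rows with the same values and NULLs in the same positions collide.
CREATE UNIQUE INDEX example_nullable_array_idx
    ON example_nullable ((ARRAY[field1, field2]));
```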

A corner case: array (or row) types with all NULL values are considered equal (!), so there can only be one row with all involved columns NULL. That may or may not be desired. If you want to disallow rows in which all of these columns are NULL, a CHECK constraint over the set of columns can enforce that.
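One way to disallow rows in which every involved column is NULL is a CHECK constraint on a row value. This is a sketch; the table example_nullable and the constraint name are hypothetical:

```sql
-- Hypothetical table that keeps real NULL values.
CREATE TABLE IF NOT EXISTS example_nullable (
    id     text NOT NULL PRIMARY KEY,
    field1 text NULL,
    field2 text NULL
);

-- (field1, field2) IS NULL is true only if every listed column is NULL,
-- so the negation rejects exactly the all-NULL rows.
ALTER TABLE example_nullable
   ADD CONSTRAINT example_nullable_not_all_null
   CHECK (NOT ((field1, field2) IS NULL));
```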
