Can PostgreSQL have a uniqueness constraint on array elements?

Question

I'm trying to come up with a PostgreSQL schema for host data that's currently in an LDAP store. Part of that data is the list of hostnames a machine can have, and that attribute is generally the key that most people use to find the host records.

One thing I'd like to get out of moving this data to an RDBMS is the ability to set a uniqueness constraint on the hostname column so that duplicate hostnames can't be assigned. This would be easy if hosts could only have one name, but since they can have more than one it's more complicated.

I realize that the fully-normalized way to do this would be to have a hostnames table with a foreign key pointing back to the hosts table, but I'd like to avoid having everybody need to do joins for even the simplest query:

select hostnames.name,hosts.*
  from hostnames,hosts
 where hostnames.name = 'foobar'
   and hostnames.host_id = hosts.id;

I figured using PostgreSQL arrays could work for this, and they certainly make the simple queries simple:

select * from hosts where names @> '{foobar}';

When I set a uniqueness constraint on the hostnames attribute, though, it of course treats the entire list of names as the unique value instead of each name. Is there a way to make each name unique across every row instead?

If not, does anyone know of another data-modeling approach that would make more sense?

Recommended answer

The righteous path

You might want to reconsider normalizing your schema. It is not necessary for everyone to "join for even the simplest query". Create a VIEW for that.

The table could look like this:

CREATE TABLE hostname (
  hostname_id serial PRIMARY KEY
, host_id     int  REFERENCES host(host_id) ON UPDATE CASCADE ON DELETE CASCADE
, hostname    text UNIQUE
);

The surrogate primary key hostname_id is optional. I prefer to have one. In your case hostname could be the primary key. But many operations are faster with a simple, small integer key. Create a foreign key constraint to link to the table host.
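
The host table that this foreign key points to is not shown in the answer. A minimal sketch, assuming only that host_id is its primary key (the description column is a made-up placeholder); it has to exist before hostname is created:

```sql
-- Hypothetical host table; only host_id is implied by the answer.
CREATE TABLE host (
  host_id     serial PRIMARY KEY
, description text   -- illustrative column, not from the original answer
);
```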
Create a view like this:

CREATE VIEW v_host AS
SELECT h.*
     , array_agg(hn.hostname) AS hostnames
--   , string_agg(hn.hostname, ', ') AS hostnames  -- text instead of array
FROM   host h
JOIN   hostname hn USING (host_id)
GROUP  BY h.host_id;   -- works in v9.1+

Starting with pg 9.1, the primary key in the GROUP BY covers all columns of that table in the SELECT list. The release notes for version 9.1:

Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause

Queries can use the view like a table. Searching for a hostname will be much faster this way:

SELECT *
FROM   host h
JOIN   hostname hn USING (host_id)
WHERE  hn.hostname = 'foobar';

Provided you have an index on host(host_id), which should be the case as it should be the primary key. Plus, the UNIQUE constraint on hostname(hostname) implements the other needed index automatically.
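
For comparison, the array-style lookup from the question also works against the view. A sketch, assuming the host and hostname tables and the v_host view defined above:

```sql
-- Assumes v_host from above; hostnames is the aggregated text[] column.
SELECT *
FROM   v_host
WHERE  hostnames @> '{foobar}';
```

Here the filter applies to the aggregated array, so Postgres most likely has to build the array for every host before it can filter, and the index on hostname(hostname) does not help. That is presumably why the join above is recommended for lookups by name.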

In Postgres 9.2+ a multicolumn index would be even better if you can get an index-only scan out of it:

CREATE INDEX hn_multi_idx ON hostname (hostname, host_id);
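
Whether the planner actually chooses an index-only scan can be checked with EXPLAIN. A sketch, assuming the hostname table and the index above; the resulting plan depends on table size and on how recently the table was vacuumed, so no output is shown:

```sql
-- Only columns contained in hn_multi_idx are referenced,
-- which is the precondition for an index-only scan.
EXPLAIN
SELECT host_id
FROM   hostname
WHERE  hostname = 'foobar';
```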

Starting with Postgres 9.3, you could use a MATERIALIZED VIEW, circumstances permitting. Especially if you read much more often than you write to the table.
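
A sketch of what that could look like, assuming the host and hostname tables from above. The names mv_host, mv_host_host_id_idx and mv_host_hostnames_gin_idx are made up for illustration:

```sql
-- Same query as v_host, but the result is stored on disk.
CREATE MATERIALIZED VIEW mv_host AS
SELECT h.*
     , array_agg(hn.hostname) AS hostnames
FROM   host h
JOIN   hostname hn USING (host_id)
GROUP  BY h.host_id;

-- A unique index is also what REFRESH ... CONCURRENTLY (Postgres 9.4+) requires.
CREATE UNIQUE INDEX mv_host_host_id_idx ON mv_host (host_id);

-- GIN index so that array lookups on the stored result can use an index.
CREATE INDEX mv_host_hostnames_gin_idx ON mv_host USING gin (hostnames);

-- The stored result is not updated automatically;
-- refresh it after changes to host or hostname.
REFRESH MATERIALIZED VIEW mv_host;
```

The trade-off is staleness: between refreshes, queries against mv_host see the data as of the last refresh.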

If I can't convince you of the righteous path, I'll assist on the dark side, too. I am flexible. :)

Here is a demo of how to enforce uniqueness of hostnames. I use a table hostname to collect hostnames and a trigger on the table host to keep it up to date. Unique violations raise an exception and abort the operation.

CREATE TABLE host(hostnames text[]);
CREATE TABLE hostname(hostname text PRIMARY KEY);  --  pk enforces uniqueness

Trigger function:

CREATE OR REPLACE FUNCTION trg_host_insupdelbef()
  RETURNS trigger AS
$func$
BEGIN
-- split UPDATE into DELETE & INSERT
IF TG_OP = 'UPDATE' THEN
   IF OLD.hostnames IS DISTINCT FROM NEW.hostnames THEN  -- keep going
   ELSE RETURN NEW;  -- exit, nothing to do
   END IF;
END IF;

IF TG_OP IN ('DELETE', 'UPDATE') THEN
   DELETE FROM hostname h
   USING  unnest(OLD.hostnames) d(x)
   WHERE  h.hostname = d.x;

   IF TG_OP = 'DELETE' THEN RETURN OLD;  -- exit, we are done
   END IF;
END IF;

-- control only reaches here for INSERT or UPDATE (with actual changes)
INSERT INTO hostname(hostname)
SELECT h
FROM   unnest(NEW.hostnames) h;

RETURN NEW;
END
$func$ LANGUAGE plpgsql;

Trigger:

CREATE TRIGGER host_insupdelbef
BEFORE INSERT OR DELETE OR UPDATE OF hostnames ON host
FOR EACH ROW EXECUTE PROCEDURE trg_host_insupdelbef();
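
A short walk-through of how the trigger behaves, assuming the two demo tables, the function and the trigger above are in place. The host names are arbitrary sample values:

```sql
INSERT INTO host(hostnames) VALUES ('{foo,bar}');  -- ok, registers foo and bar
INSERT INTO host(hostnames) VALUES ('{baz}');      -- ok, registers baz

-- Each of the following is expected to fail with a unique violation
-- on table hostname (an error aborts the surrounding transaction):
-- INSERT INTO host(hostnames) VALUES ('{bar,qux}');               -- bar is taken
-- UPDATE host SET hostnames = '{foo}' WHERE hostnames = '{baz}';  -- foo is taken

-- An UPDATE releases the old names and claims the new ones.
UPDATE host SET hostnames = '{foo,quux}' WHERE hostnames = '{foo,bar}';

SELECT hostname FROM hostname ORDER BY 1;  -- returns baz, foo, quux
```

Note that a name listed twice in the same array, such as '{a,a}', is rejected as well, because both copies are inserted into hostname.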

SQL Fiddle test run.

Use a GIN index on the array column host.hostnames and array operators to work with it.
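
The statements that originally followed are missing from this copy. A minimal sketch of what such an index and lookup could look like, assuming the demo table host(hostnames text[]) from above; the index name is made up:

```sql
-- Hypothetical index name; the default GIN operator class for arrays
-- supports the operators @>, <@, && and =.
CREATE INDEX host_hostnames_gin_idx ON host USING gin (hostnames);

-- "contains": hosts whose array includes foobar
SELECT * FROM host WHERE hostnames @> '{foobar}';

-- "overlaps": hosts that carry at least one of the listed names
SELECT * FROM host WHERE hostnames && '{foo,bar}';
```

The form 'foobar' = ANY (hostnames) returns the same rows as the first query but cannot use the GIN index, so the array operators are the better choice here.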
