我的表是否需要一个主键,它有一个 UNIQUE(复合 4 列),其中之一可以是 NULL? [英] Do I need a primary key for my table, which has a UNIQUE (composite 4-columns), one of which can be NULL?

查看:14
本文介绍了我的表是否需要一个主键,它有一个 UNIQUE(复合 4 列),其中之一可以是 NULL?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有下表(PostgreSQL 8.3),其中存储了一些产品的价格.价格与另一个数据库同步,基本上下面的大多数字段(除了一个)不是由我们的客户更新 - 而是每隔一段时间删除和刷新以与另一个股票数据库同步:

I have the following table (PostgreSQL 8.3) which stores prices of some products. The prices are synchronised with another database, basically most of the fields below (apart from one) are not updated by our client - but instead dropped and refreshed every once-in-a-while to sync with another stock database:

CREATE TABLE product_pricebands (
    template_sku varchar(20) NOT NULL,
    colourid integer REFERENCES colour (colourid) ON DELETE CASCADE,        
    currencyid integer NOT NULL REFERENCES currency (currencyid) ON DELETE CASCADE,
    siteid integer NOT NULL REFERENCES site (siteid) ON DELETE CASCADE,

    master_price numeric(10,2),

    my_custom_field boolean, 

    UNIQUE (template_sku, siteid, currencyid, colourid)
);

在同步时,我基本上删除了上面的大部分数据,除了 my_custom_field 为 TRUE 的数据(如果为 TRUE,则表示客户端通过其 CMS 更新了此字段,因此不应删除此记录).然后我向表中插入 100 到 1000 行,并在插入失败的地方更新(即 (template_sku、siteid、currencyid、colourid) 的组合已经存在).

On the synchronisation, I basically DELETE most of the data above except for data WHERE my_custom_field is TRUE (if it's TRUE, it means the client updated this field via their CMS and therefore this record should not be dropped). I then INSERT 100s to 1000s of rows into the table, and UPDATE where the INSERT fails (i.e. where the combination of (template_sku, siteid, currencyid, colourid) already exists).

我的问题是 - 在这里应该应用什么最佳实践来创建主键?甚至需要主键吗?我想让主键 = (template_sku, siteid,currencyid, colourid) - 但 colourid 字段可以为 NULL,并且在复合主键中使用它是不可能的.

My question is - what best practice should be applied here to create a primary key? Is a primary key even needed? I wanted to make the primary key = (template_sku, siteid, currencyid, colourid) - but the colourid field can be NULL, and using it in a composite primary key is not possible.

从我在其他论坛帖子中读到的内容,我认为我已经正确地完成了上述操作,只需要澄清:

From what I read on other forum posts, I think I have done the above correctly, and just need to clarify:

1) 我应该使用串行"主键以防万一我需要它吗?目前我不会,也不认为我会,因为表中的重要数据是价格和我的自定义字段,仅由 (template_sku, siteid, currencyid, colourid) 组合标识.

1) Should I use a "serial" primary key just in case I ever need one? At the moment I don't, and don't think I ever will, because the important data in the table is the price and my custom field, only identified by the (template_sku, siteid, currencyid, colourid) combination.

2) 由于 (template_sku, siteid,currencyid, colourid) 是我用来查询产品价格的组合,我是否应该向我的列添加任何进一步的索引,例如template_sku",它是一个 varchar?或者 UNIQUE 约束是否已经是我的 SELECT 的一个很好的索引?

2) Since (template_sku, siteid, currencyid, colourid) is the combination that I will use to query a product's price, should I add any further indexing to my columns, such as the "template_sku" which is a varchar? Or is the UNIQUE constraint a good index already for my SELECTs?

推荐答案

我应该使用串行"主键以备不时之需吗?

Should I use a "serial" primary key just in case I ever need one?

如果您需要,您可以在以后轻松添加一个序列列:

You can easily add a serial column later if you need one:

ALTER TABLE product_pricebands ADD COLUMN id serial;

该列将自动填充唯一值.您甚至可以在同一语句中将其设为主键(如果尚未定义主键):

The column will be filled with unique values automatically. You can even make it the primary key in the same statement (if no primary key is defined, yet):

ALTER TABLE product_pricebands ADD COLUMN id serial PRIMARY KEY;

如果您从其他表中引用该表,我建议您使用这样的代理主键,因为通过四列进行链接相当笨拙.在带有 JOIN 的 SELECT 中它也更慢.

If you reference the table from other tables I would advise to use such a surrogate primary key, because it is rather unwieldy to link by four columns. It is also slower in SELECTs with JOINs.

无论哪种方式,您都应该定义一个主键.包含可为空列的 UNIQUE 索引不是完全替换.它允许包含 NULL 值的组合重复,因为从不认为两个 NULL 值相同.这可能会导致麻烦.

Either way, you should define a primary key. The UNIQUE index including a nullable column is not a full replacement. It allows duplicates for combinations including a NULL value, because two NULL values are never considered the same. This can lead to trouble.

作为

colourid 字段可以为 NULL

the colourid field can be NULL

您可能想要创建两个唯一索引.组合 (template_sku, siteid,currencyid, colourid) 不能是 PRIMARY KEY,因为 colourid 可以为空,但你可以创建一个 UNIQUE 像您已经拥有的约束(自动实现索引):

you might want to create two unique indexes. The combination (template_sku, siteid, currencyid, colourid) cannot be a PRIMARY KEY, because of the nullable colourid, but you can create a UNIQUE constraint like you already have (implementing an index automatically):

ALTER TABLE product_pricebands ADD CONSTRAINT product_pricebands_uni_idx
UNIQUE (template_sku, siteid, currencyid, colourid)

该索引完美地涵盖了您在 2 中提到的查询).
如果您想避免 (colourid IS NULL) 的重复",请另外创建一个部分唯一索引:

This index perfectly covers the queries you mention in 2).
Create a partial unique index in addition if you want to avoid "duplicates" with (colourid IS NULL):

CREATE UNIQUE INDEX product_pricebands_uni_null_idx
ON product_pricebands (template_sku, siteid, currencyid)
WHERE colourid IS NULL;

覆盖所有基地.我在 有关 dba.SE 的相关答案 中详细介绍了该技术.

To cover all bases. I wrote more about that technique in a related answer on dba.SE.

上面的简单替代方法是使 colourid NOT NULL 并创建一个主键而不是上面的 product_pricebands_uni_idx.

The simple alternative to the above is to make colourid NOT NULL and create a primary key instead of the above product_pricebands_uni_idx.

也和你一样

基本上删除大部分数据

对于您的重新填充操作,删除在重新填充操作期间不需要的索引并在之后重新创建索引会更快.从头开始构建索引比增量添加所有行要快一个数量级.

for your refill operation, it will be faster to drop indexes, that are not needed during the refill operation, and recreate those afterwards. It is faster by an order of magnitude to build an index from scratch than to add all rows incrementally.

您如何知道使用(需要)哪些索引?

How do you know, which indexes are used (needed)?

  • 使用 EXPLAIN ANALYZE 测试您的查询.
  • 或者使用内置统计数据.pgAdmin 在单独的选项卡中显示所选对象的统计信息.
  • Test your queries with EXPLAIN ANALYZE.
  • Or use the built-in statistics. pgAdmin displays statistics in a separate tab for the selected object.

选择带有 my_custom_field = TRUE 的几行到临时表中可能会更快,TRUNCATE 基表并重新插入幸存者.取决于您是否定义了外键.看起来像这样:

It may also be faster to select the few rows with my_custom_field = TRUE into a temporary table, TRUNCATE the base table and re-INSERT the survivors. Depends on whether you have foreign keys defined. Would look like this:

CREATE TEMP TABLE pr_tmp AS
SELECT * FROM product_pricebands WHERE my_custom_field;

TRUNCATE product_pricebands;
INSERT INTO product_pricebands SELECT * FROM pr_tmp;

这避免了大量吸尘.

这篇关于我的表是否需要一个主键,它有一个 UNIQUE(复合 4 列),其中之一可以是 NULL?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆