Do I need a primary key for my table, which has a UNIQUE (composite 4-columns), one of which can be NULL?

Question

I have the following table (PostgreSQL 8.3), which stores the prices of some products. The prices are synchronised with another database: most of the fields below (all but one) are not updated by our client, but are instead dropped and refreshed every once in a while to stay in sync with another stock database:

CREATE TABLE product_pricebands (
    template_sku varchar(20) NOT NULL,
    colourid integer REFERENCES colour (colourid) ON DELETE CASCADE,        
    currencyid integer NOT NULL REFERENCES currency (currencyid) ON DELETE CASCADE,
    siteid integer NOT NULL REFERENCES site (siteid) ON DELETE CASCADE,

    master_price numeric(10,2),

    my_custom_field boolean, 

    UNIQUE (template_sku, siteid, currencyid, colourid)
);

On the synchronisation, I basically DELETE most of the data above except for data WHERE my_custom_field is TRUE (if it's TRUE, it means the client updated this field via their CMS and therefore this record should not be dropped). I then INSERT 100s to 1000s of rows into the table, and UPDATE where the INSERT fails (i.e. where the combination of (template_sku, siteid, currencyid, colourid) already exists).

My question is - what best practice should be applied here to create a primary key? Is a primary key even needed? I wanted to make the primary key = (template_sku, siteid, currencyid, colourid) - but the colourid field can be NULL, and using it in a composite primary key is not possible.

From what I read on other forum posts, I think I have done the above correctly, and just need to clarify:

1) Should I use a "serial" primary key just in case I ever need one? At the moment I don't, and don't think I ever will, because the important data in the table is the price and my custom field, only identified by the (template_sku, siteid, currencyid, colourid) combination.

2) Since (template_sku, siteid, currencyid, colourid) is the combination that I will use to query a product's price, should I add any further indexing to my columns, such as the "template_sku" which is a varchar? Or is the UNIQUE constraint a good index already for my SELECTs?

Answer

Should I use a "serial" primary key just in case I ever need one?

You can easily add a serial column later if you need one:

ALTER TABLE product_pricebands ADD COLUMN id serial;

The column will be filled with unique values automatically. You can even make it the primary key in the same statement (if no primary key is defined, yet):

ALTER TABLE product_pricebands ADD COLUMN id serial PRIMARY KEY;

If you reference the table from other tables, I would advise using such a surrogate primary key, because it is rather unwieldy to link by four columns. It is also slower in SELECTs with JOINs.
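
To illustrate the point about linking, here is a minimal sketch. The child table price_history and its columns are hypothetical (they do not appear in the question), and the sketch assumes the id column has already been added as the primary key as shown above.

```sql
-- Hypothetical child table; assumes product_pricebands.id exists and is the PRIMARY KEY.
CREATE TABLE price_history (
    priceband_id integer NOT NULL REFERENCES product_pricebands (id) ON DELETE CASCADE,
    changed_at   timestamp NOT NULL DEFAULT now(),
    old_price    numeric(10,2)
);

-- One-column join instead of matching four columns, one of which is nullable.
SELECT p.template_sku, h.changed_at, h.old_price
FROM   product_pricebands p
JOIN   price_history h ON h.priceband_id = p.id;
```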

Either way, you should define a primary key. The UNIQUE index including a nullable column is not a full replacement. It allows duplicates for combinations including a NULL value, because two NULL values are never considered the same. This can lead to trouble.
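
A minimal sketch of that problem, using a hypothetical cut-down table pricebands_demo (no foreign keys, so it runs on its own; the sample values are made up):

```sql
-- Hypothetical demo table: same UNIQUE constraint as in the question, no foreign keys.
CREATE TEMP TABLE pricebands_demo (
    template_sku varchar(20) NOT NULL,
    colourid     integer,
    currencyid   integer NOT NULL,
    siteid       integer NOT NULL,
    master_price numeric(10,2),
    UNIQUE (template_sku, siteid, currencyid, colourid)
);

-- Both inserts succeed: the two NULLs in colourid are not considered equal,
-- so the UNIQUE constraint does not treat these rows as duplicates.
INSERT INTO pricebands_demo VALUES ('SKU-1', NULL, 1, 1, 9.99);
INSERT INTO pricebands_demo VALUES ('SKU-1', NULL, 1, 1, 12.50);

-- Shows how many rows now exist for what is logically one combination.
SELECT count(*)
FROM   pricebands_demo
WHERE  template_sku = 'SKU-1'
AND    siteid = 1
AND    currencyid = 1
AND    colourid IS NULL;
```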


As "the colourid field can be NULL", you might want to create two unique indexes. The combination (template_sku, siteid, currencyid, colourid) cannot be a PRIMARY KEY, because of the nullable colourid, but you can create a UNIQUE constraint like you already have (implementing an index automatically):

ALTER TABLE product_pricebands ADD CONSTRAINT product_pricebands_uni_idx
UNIQUE (template_sku, siteid, currencyid, colourid);

This index perfectly covers the queries you mention in 2).
Create a partial unique index in addition if you want to avoid "duplicates" with (colourid IS NULL):

CREATE UNIQUE INDEX product_pricebands_uni_null_idx
ON product_pricebands (template_sku, siteid, currencyid)
WHERE colourid IS NULL;

That covers all bases. I wrote more about that technique in a related answer on dba.SE.
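
The combined effect of the two indexes can be checked with a small sketch. The table pricebands_partial_demo and its index name are hypothetical stand-ins without foreign keys, so the statements run on their own:

```sql
-- Hypothetical demo table carrying the same UNIQUE constraint.
CREATE TEMP TABLE pricebands_partial_demo (
    template_sku varchar(20) NOT NULL,
    colourid     integer,
    currencyid   integer NOT NULL,
    siteid       integer NOT NULL,
    UNIQUE (template_sku, siteid, currencyid, colourid)
);

-- Partial unique index: only rows with colourid IS NULL are indexed,
-- and among those the remaining three columns must be unique.
CREATE UNIQUE INDEX pricebands_partial_demo_null_idx
ON pricebands_partial_demo (template_sku, siteid, currencyid)
WHERE colourid IS NULL;

INSERT INTO pricebands_partial_demo VALUES ('SKU-1', NULL, 1, 1);  -- succeeds
INSERT INTO pricebands_partial_demo VALUES ('SKU-1', 7, 1, 1);     -- succeeds: non-NULL colourid
INSERT INTO pricebands_partial_demo VALUES ('SKU-1', NULL, 1, 1);  -- rejected by the partial index
```

Rows with a non-NULL colourid are still policed by the four-column constraint; the partial index only adds the missing check for the NULL case.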


The simple alternative to the above is to make colourid NOT NULL and create a primary key instead of the above product_pricebands_uni_idx.
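
A sketch of that alternative, under a loud assumption: NULL is replaced by a placeholder value 0 meaning "no specific colour", and a matching row with colourid = 0 exists in colour so the foreign key stays satisfied. Neither the placeholder nor that row is part of the question.

```sql
BEGIN;

-- Assumption: colour already contains a placeholder row with colourid = 0.
-- This UPDATE fails if it would create duplicate combinations;
-- such rows would have to be cleaned up first.
UPDATE product_pricebands SET colourid = 0 WHERE colourid IS NULL;

ALTER TABLE product_pricebands ALTER COLUMN colourid SET NOT NULL;

-- With no nullable column left, the combination can serve as the primary key.
ALTER TABLE product_pricebands
    ADD PRIMARY KEY (template_sku, siteid, currencyid, colourid);

COMMIT;
```

Once the primary key exists, the original UNIQUE constraint on the same four columns is redundant and can be dropped.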


Also, as you "basically DELETE most of the data" for your refill operation, it will be faster to drop indexes that are not needed during the refill and recreate them afterwards. Building an index from scratch is faster by an order of magnitude than adding all rows incrementally.
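
A rough sketch of that pattern. The index product_pricebands_price_idx is hypothetical; the table in the question has no such index. The UNIQUE constraint itself has to stay in place during the refill, because the "UPDATE where the INSERT fails" logic relies on it.

```sql
-- Hypothetical secondary index that the refill does not need.
DROP INDEX IF EXISTS product_pricebands_price_idx;

-- ... run the DELETE / INSERT / UPDATE synchronisation here ...

-- Rebuild the index in one pass over the refreshed data.
CREATE INDEX product_pricebands_price_idx ON product_pricebands (master_price);

-- Refresh planner statistics after the bulk change.
ANALYZE product_pricebands;
```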

How do you know which indexes are used (needed)?

  • Test your queries with EXPLAIN ANALYZE.
  • Or use the built-in statistics. pgAdmin displays statistics in a separate tab for the selected object.
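
Both checks can be sketched in plain SQL. The literal values in the WHERE clause are made-up samples; pg_stat_user_indexes is the standard statistics view that such tools read from.

```sql
-- Shows the plan actually chosen, including whether an index scan is used.
EXPLAIN ANALYZE
SELECT master_price
FROM   product_pricebands
WHERE  template_sku = 'SKU-1'   -- sample values, purely illustrative
AND    siteid = 1
AND    currencyid = 1
AND    colourid = 7;

-- Per-index usage counters; an idx_scan that stays at 0 hints at an unused index.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM   pg_stat_user_indexes
WHERE  relname = 'product_pricebands'
ORDER  BY idx_scan;
```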

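It may also be faster to select the few rows with my_custom_field = TRUE into a temporary table, TRUNCATE the base table and re-INSERT the survivors. Whether this is possible depends on your foreign keys: TRUNCATE is refused if other tables reference product_pricebands through a foreign key, unless they are truncated as well. It would look like this: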

CREATE TEMP TABLE pr_tmp AS
SELECT * FROM product_pricebands WHERE my_custom_field;

TRUNCATE product_pricebands;
INSERT INTO product_pricebands SELECT * FROM pr_tmp;

This avoids a lot of vacuuming.
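
If other sessions read the table during the synchronisation, one possible variant (a sketch, not part of the original answer) wraps the same steps in a single transaction. TRUNCATE is transactional in PostgreSQL, so a failure before COMMIT rolls everything back; the price is an exclusive lock on the table until the transaction ends.

```sql
BEGIN;

-- ON COMMIT DROP removes the temporary table automatically at the end of the transaction.
CREATE TEMP TABLE pr_tmp ON COMMIT DROP AS
SELECT * FROM product_pricebands WHERE my_custom_field;

TRUNCATE product_pricebands;
INSERT INTO product_pricebands SELECT * FROM pr_tmp;

-- ... INSERT the fresh rows from the stock database here ...

COMMIT;
```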
