索引具有重复值的列 [英] Indexing a column having duplicate values

查看:77
本文介绍了索引具有重复值的列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑这些表格:

seed (seedid, seedname) # PK-(seedid) 
stock (seedid, stk, storeid)  # PK-(seedid,storeid), FK-(storeid)
#InnoDB

stock: 
      seedid, stk, storeid
         1    12     81
         2    13     81
         3    14     81

         1    12     82
         2    11     82
         3    13     82
         4    12     82

Query -> select stk from stock where seedid = 'aaa' and storeid = 'yyy'.

stock代表几家店的库存,因此storeid会重复.

Table stock represent stocks of several stores, hence storeid will be repeated over.

我如何索引表 stock,因为它会经常使用 storeid 进行查询?

How do I index, table stock, given that it will be queried frequently using storeid ?

自动索引的主键,并且由于 (seedid,storeid) 已经是主键,是否不需要进一步索引?

Primary keys as automatically indexed, and since (seedid,storeid) is already a Primary key, is there no need to index it further?

推荐答案

基于不完整的规范,我会这样做:

Based on the incomplete specification, I would do this:

   CREATE UNIQUE INDEX stock_UX1 ON stock (storeid,seedid,stk)

此索引将满足以storeid 作为前导列的索引的要求.(如果这是 InnoDB 并且 storeid 是外键,我们知道会有这个要求.)

This index would satisfy the requirement for an index with storeid as the leading column. (And we know will have that requirement if this is InnoDB and storeid is a foreign key.)

对于如此短的表格行,我会继续将其设为覆盖索引,并包括所有列.然后可以直接从索引页满足查询,而无需查找底层表中的数据页.

With such a short table row, I'd go ahead and make it a covering index, and include all of the columns. Then queries can be satisfied directly from the index pages without lookups to data pages in the underlying table.

既然我们知道 (seedid,storeid) 是唯一的(作为 PRIMARY KEY 给出),我们知道 (storeid,seedid) 也是唯一的,所以我们可能并将索引声明为 UNIQUE.

Since we know that (seedid,storeid) is unique (given as the PRIMARY KEY), we know (storeid,seedid) is also unique, so we might as well declare the index to be UNIQUE.

还有其他选择;我们不必在上面创建该索引.我们可以这样做:

There are other choices; we don't have to create that index above. We could just do this instead:

   CREATE INDEX stock_IX2 ON stock (storeid)

但这将使用几乎相同的空间量,并且不会对尽可能多的查询有利.

But that will use nearly the same amount of space, and not be as beneficial to as many possible queries.

二级索引将包含表的主键;这样第二个索引将包括 seedid 列,给定表的 PRIMARY KEY.即索引等价于:

The secondary index will contain the primary key of the table; so that second index will include the seedid column, given the PRIMARY KEY of the table. That is, the index is equivalent to this:

   CREATE INDEX stock_IX3 ON stock (storeid,seedid)

而且我们知道这两列的组合是唯一的,所以我们可以包含 UNIQUE 关键字

And we know the combination of those two columns is unique, so we can include the UNIQUE keyword

   CREATE UNIQUE INDEX stock_UX4 ON stock (storeid,seedid)

如果我们对表单的查询做一个解释

If we do an EXPLAIN on a query of the form

  EXPLAIN
  SELECT t.storeid
       , t.seedid
       , t.stk 
    FROM stock t 
   WHERE t.storeid = 'foo'

我们很可能会看到对二级索引的范围扫描操作;但是检索 stk 列的值将需要查找基础表中的数据页.在二级索引中包含 stk 列将使索引成为查询的覆盖索引.使用答案中首先推荐的索引,我们希望 EXPLAIN 输出显示使用索引".

we are likely to see a range scan operation on the secondary index; but retrieving the value of stk column will require lookup to the data pages in the underlying table. Including the stk column in the secondary index will make the index a covering index for the query. With the index first recommended in the answer, we expect the EXPLAIN output to show "Using index".

这篇关于索引具有重复值的列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆