Will a compound index with a second column of low cardinality affect performance enough that it should be used?
Question
I'm using Single Table Inheritance (STI) in Rails simplified to the following:
class Vehicle
  belongs_to :user
end

class Car < Vehicle
end

class Plane < Vehicle
end
Each record in the vehicles table will have a type column set to either 'Car' or 'Plane' in addition to the user_id foreign key. It could also have additional values if more vehicle types are added; however, type will always have a much lower cardinality than user_id. Just as in real life, I expect this table to contain many more Cars.
There is a compound index on [:user_id, :type] (in that order) and these records are looked up by their subclasses.
I believe that in the worst case of no Planes, the index will still be used since user_id is first, and the second part will essentially be ignored. In this case a single-column index would have a very small benefit in that the database does not have to maintain the second column.
What happens in the case where there's an equal split?
- Will the index cut the records in half and thus have a decent effect?
- Will the cost of the database maintaining a compound index over a single one (i.e. just user_id) exceed or negate any savings?
An example ActiveRecord call would be Car.where(user_id: 10), which generates the following SQL:
SELECT `vehicles`.* FROM `vehicles` WHERE `vehicles`.`type` IN ('Car')
AND `vehicles`.`user_id` = 10
Recommended answer
The cost of maintaining an index (single-column or multi-column) is almost always outweighed by the performance improvement when that index is used. It is a small increment on each INSERT/DELETE, plus a cost if changing the value of an indexed field via UPDATE. (The UPDATE case is rare.) So, don't worry about the cost of "maintaining a compound index".
WHERE `vehicles`.`type` IN ('Car')
AND `vehicles`.`user_id` = 10
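needs INDEX(user_id, type).
The Optimizer will
- discover that this index is a possible candidate,
- check some statistics, then
- either use the index, or decide the cardinality is poor and simply scan the table.
Include the index; don't worry about it.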
I ordered the fields that way, not (type, user_id), based on your IN, which implies that you might sometimes have multiple values for type.
If all rows in the table have type = 'Car', no problem. Everything I have said still applies. The waste of including the unnecessary type is insignificant.
It is better to have all "=" column(s) first in an index, then at most one other field. Further discussion here.
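To make the column-order point concrete, here is a toy Ruby model of an index on (user_id, type). It is only a sketch: a sorted array stands in for the B-tree, the sample rows are made up, and index_range is a hypothetical helper written for this illustration, not anything MySQL or Rails provides. It shows that with user_id first, the entries for one user are contiguous, and adding type narrows that run further.

```ruby
# Toy model of a compound index on (user_id, type): a sorted array of
# [user_id, type, row_id] entries. Illustrative only; a real B-tree is
# more involved, and the rows below are invented sample data.
Row = Struct.new(:id, :user_id, :type)

rows = [
  Row.new(1, 10, "Car"),
  Row.new(2, 10, "Plane"),
  Row.new(3, 10, "Car"),
  Row.new(4, 11, "Car"),
  Row.new(5, 12, "Plane"),
  Row.new(6, 10, "Plane"),
]

# Build the "index": entries sorted by user_id first, then by type.
INDEX = rows.map { |r| [r.user_id, r.type, r.id] }.sort.freeze

# Hypothetical helper: return the entries whose leading columns equal
# `prefix`. Because the array is sorted, the matches form one contiguous
# run, located with two binary searches.
def index_range(index, prefix)
  n = prefix.length
  first = index.bsearch_index { |e| (e.first(n) <=> prefix) >= 0 } || index.length
  last  = index.bsearch_index { |e| (e.first(n) <=> prefix) > 0 } || index.length
  index[first...last]
end

by_user      = index_range(INDEX, [10])         # WHERE user_id = 10
by_user_type = index_range(INDEX, [10, "Car"])  # ... AND type IN ('Car')

puts "user_id = 10:             #{by_user.length} entries"
puts "user_id = 10, type = Car: #{by_user_type.length} entries"
```

With this sample data, the user_id prefix matches four entries and adding type = 'Car' narrows it to two. If every row for that user were a Car, both lookups would return the same run, which is the "no Planes" case from the question: the extra column costs little and simply does not narrow anything.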