Will a compound index with a second column of low cardinality affect performance enough that it should be used?

Question

I'm using Single Table Inheritance (STI) in Rails simplified to the following:

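# Base STI class; rows are stored in the vehicles table with a type column.
class Vehicle < ActiveRecord::Base
  belongs_to :user
end

# Subclasses share the vehicles table; Rails sets type to the class name.
class Car < Vehicle
end

class Plane < Vehicle
end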

Each record in the vehicles table will have a type column set to either 'Car' or 'Plane' in addition to the user_id foreign key. It could also have additional values if more vehicle types are added, however, type will always have a much lower cardinality than user_id. Just as in real life, I expect this table to contain many more Cars.

There is a compound index on [:user_id, :type] (in that order) and these records are looked up by their subclasses.

I believe that in the worst case of no Planes, the index will be used since user_id is first and the second part will essentially be ignored. In this case a single index would have a super small benefit in that it's not maintaining the compound second column.

What happens in the case where there's an equal split?


  • Will the index cut the records in half and thus have a decent effect?
  • Will the cost of the database maintaining a compound index over a single one (i.e. just user_id) exceed or negate any savings?

An example ActiveRecord call would be Car.where(user_id: 10) which generates the following SQL:

SELECT `vehicles`.* FROM `vehicles` WHERE `vehicles`.`type` IN ('Car')
  AND `vehicles`.`user_id` = 10


Recommended answer

The cost of maintaining an index (single-column or multi-column) is almost always outweighed by the performance improvement when that index is used. It is a small increment on each INSERT/DELETE, plus a cost if changing the value of an indexed field via UPDATE. (The UPDATE case is rare.) So, don't worry about the cost of "maintaining a compound index".

WHERE `vehicles`.`type` IN ('Car')
  AND `vehicles`.`user_id` = 10

needs INDEX(user_id, type)

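The optimizer will

  1. discover that the index is a possible candidate,
  2. check some statistics, then
  3. either use the index or decide that the cardinality is poor and simply scan the table.

Include the index; don't worry about it.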
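
The optimizer's three steps can be sketched as a toy decision function. This is only an illustration, not MySQL's actual cost model: the helper name `choose_plan`, the `SCAN_THRESHOLD` cutoff, and the row counts are all assumptions made up for the example.

```ruby
# Toy model of the optimizer's choice; not MySQL's real cost model.
# SCAN_THRESHOLD is an arbitrary, assumed cutoff for illustration.
SCAN_THRESHOLD = 0.3

# Hypothetical helper: pick a plan from estimated row counts.
def choose_plan(index_is_candidate:, estimated_matching_rows:, total_rows:)
  # Step 1: the index must be usable for the WHERE clause at all.
  return :table_scan unless index_is_candidate
  return :table_scan if total_rows.zero?

  # Step 2: consult statistics to estimate how selective the lookup is.
  fraction = estimated_matching_rows.to_f / total_rows

  # Step 3: use the index when few rows match, otherwise scan the table.
  fraction < SCAN_THRESHOLD ? :use_index : :table_scan
end

# One user's vehicles out of a large table: the index is chosen.
puts choose_plan(index_is_candidate: true, estimated_matching_rows: 4, total_rows: 100_000)
# A filter matching half the table: a scan is chosen.
puts choose_plan(index_is_candidate: true, estimated_matching_rows: 50_000, total_rows: 100_000)
```

In this sketch a lookup by user_id touches a tiny fraction of the table, so the index is picked whether or not type narrows it further.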

I ordered the fields that way, not (type, user_id) based on your IN, which implies that you might sometimes have multiple values for type.

If all rows in the table have type = 'Car', no problem. Everything I have said still applies. The waste of including the unnecessary type is insignificant.

It is better to have all "=" column(s) first in an index, then at most one other field. Further discussion here.
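
To make the column-order advice concrete, here is a minimal sketch that models an index as a sorted list of key tuples and counts how many entries match the lookup from the question. The data (100 users with two Cars and two Planes each) and the helper `entries_examined` are assumptions for illustration; a real B-tree reaches the matching entries with one descent and a short range scan rather than the linear count used here.

```ruby
# Toy model: an index is a sorted list of key tuples (row pointers omitted).
# The data below is made up for illustration.
rows = []
(1..100).each do |user_id|
  2.times { rows << { user_id: user_id, type: "Car" } }
  2.times { rows << { user_id: user_id, type: "Plane" } }
end

single_index   = rows.map { |r| [r[:user_id]] }.sort
compound_index = rows.map { |r| [r[:user_id], r[:type]] }.sort

# Hypothetical helper: count index entries whose leading columns equal `prefix`.
# Entries sharing a prefix sit next to each other in a sorted index.
def entries_examined(index, prefix)
  index.count { |key| key.first(prefix.size) == prefix }
end

# INDEX(user_id): every vehicle of user 10 is read, then filtered on type.
by_user = entries_examined(single_index, [10])
# INDEX(user_id, type): only user 10's Cars are read.
by_user_and_type = entries_examined(compound_index, [10, "Car"])

puts "INDEX(user_id):       #{by_user} entries examined"
puts "INDEX(user_id, type): #{by_user_and_type} entries examined"
```

With an even split the compound index halves the entries read for one user, but because a single user owns only a handful of vehicles in this example, the absolute saving is small.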
