Best SQL indexes for join table


Question

With performance in mind, I was wondering whether indexes are helpful on a join table and, if so, which ones (specifically in a Rails 3 has_and_belongs_to_many context).

My models are Foo and Bar and, per Rails convention, I have a join table called bars_foos. There is no primary key and there are no timestamps, so the only fields in this table are bar_id:integer and foo_id:integer. I'm interested in knowing which of the following indexes is best and avoids duplication:


  1. A compound index: add_index :bars_foos, [:bar_id, :foo_id]
  2. Two indexes
    • A. add_index :bars_foos, :bar_id
    • B. add_index :bars_foos, :foo_id

Basically, I'm not sure whether the compound index is enough, assuming it is helpful to begin with. I believe a compound index can also serve as a single-column index on its first column, which is why I am pretty sure that using all three lines would result in unnecessary duplication.
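To make the "first column" belief concrete, here is a toy model in plain Ruby. It is only a sketch: the constant and method names are made up for illustration, and a sorted array is an assumption standing in for an index, not how PostgreSQL, MySQL, or SQLite actually store one. It shows why an index ordered by (bar_id, foo_id) helps lookups by bar_id but not lookups by foo_id alone.

```ruby
# Hypothetical bars_foos rows as [bar_id, foo_id] pairs.
ROWS = [[3, 10], [1, 20], [2, 10], [1, 10], [3, 30], [2, 20]]

# Toy "compound index": the pairs sorted by bar_id first, then foo_id.
COMPOUND_INDEX = ROWS.sort

# Lookup by the leading column (bar_id). Matching entries are contiguous
# in the sorted order, so a binary search finds the first match and we
# read forward until bar_id changes.
def foo_ids_for_bar(index, bar_id)
  start = index.bsearch_index { |(b, _)| b >= bar_id }
  return [] if start.nil?
  index[start..-1].take_while { |(b, _)| b == bar_id }.map { |(_, f)| f }
end

# Lookup by the trailing column (foo_id). Matches are scattered through
# the sorted order, so every entry has to be examined (a full scan).
def bar_ids_for_foo(index, foo_id)
  index.select { |(_, f)| f == foo_id }.map { |(b, _)| b }
end

foo_ids_for_bar(COMPOUND_INDEX, 1)  # => [10, 20], found via binary search
bar_ids_for_foo(COMPOUND_INDEX, 10) # => [1, 2, 3], found via full scan
```

In this model, the foo_id lookup gets no benefit from the sort order, which is the intuition behind wanting a separate index on foo_id if bar.foos matters as much as foo.bars.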

The most common usage: given an instance of model Foo, I will ask for its associated bars using the RoR syntax foo.bars, and vice versa with bar.foos for an instance of model Bar.

These will generate queries of the type SELECT * FROM bars_foos WHERE foo_id = ? and SELECT * FROM bars_foos WHERE bar_id = ? respectively, and then use the resulting IDs in SELECT * FROM bars WHERE id IN (?) and SELECT * FROM foos WHERE id IN (?).

Please correct me in the comments if I am wrong, but I do not believe that the Rails application will ever issue a query that specifies both IDs, like SELECT * FROM bars_foos WHERE bar_id = ? AND foo_id = ?.

In case there are database-specific optimization techniques, I will most likely be using PostgreSQL. However, others using this code may want to run it on MySQL or SQLite depending on their Rails configuration, so all answers are appreciated.

Recommended answer

The Answer

The oft-repeated answer, which holds more often than not, is "it depends." More specifically, it depends on what your data is and how it will be used.

The short tl;dr answer for my specific case (and to cover all future bases) is choice #2, which is what I suspected. However, choice #3 would work just fine: depending on my usage of the data, the extra time and space spent creating the compound index could reduce future query lookups.

The reason is that databases try to be smart and do things as fast as possible regardless of programmer input. The most basic question to ask when adding an index is whether the object will be looked up by this key. If yes, an index can potentially speed that up. However, whether the index is even used comes down to the selectivity and cardinality of the field.

Since foreign keys are typically the IDs of another AR class, cardinality will usually be high. But again, this depends on your data. In my example, if there are many Foos but few Bars, many of the entries in my join table will share the same bar_ids. With bar_id having low cardinality, an index on bar_id may never be used, and it may get in the way by making the database devote time and resources* to updating the index every time a new bars_foos entry is created. The same goes for many Bars and few Foos, and for few of both.
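As a rough illustration of the cardinality point, the plain-Ruby sketch below estimates selectivity for each column of a made-up join table. It assumes the common definition of selectivity as distinct values divided by total rows. The sample data and all names are hypothetical, and real query planners rely on their own collected statistics rather than a calculation like this.

```ruby
# Hypothetical bars_foos rows as [bar_id, foo_id] pairs:
# few Bars (2 distinct ids) and many Foos (6 distinct ids).
JOIN_ROWS = [[1, 101], [1, 102], [1, 103], [2, 104], [2, 105], [2, 106]]

# Selectivity = distinct values / total rows, a number between 0 and 1.
# Values near 1 mean almost every row has its own value (high cardinality);
# values near 0 mean many rows share the same value (low cardinality).
def selectivity(values)
  return 0.0 if values.empty?
  values.uniq.size.to_f / values.size
end

BAR_ID_SELECTIVITY = selectivity(JOIN_ROWS.map(&:first)) # 2 distinct / 6 rows
FOO_ID_SELECTIVITY = selectivity(JOIN_ROWS.map(&:last))  # 6 distinct / 6 rows
```

With this sample, bar_id comes out at about 0.33 and foo_id at 1.0, so an index on foo_id narrows a lookup far more than one on bar_id. On real data the same calculation could be run against a dump of the join table before deciding which indexes to add.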

The general lesson is that when considering an index on a table, decide both whether entries will be looked up by this field and whether this field has high cardinality, that is, whether it has many distinct values. For most join tables the answer is "it depends," and we must think more carefully about what the data represents and about the relationships themselves. In my case, I will have many Foos and many Bars, and I will be looking up Foos by their associated Bars and vice versa.

Another good answer I got at the office was, "Why are you worrying about your indexes? Build your app!"

* In a similar question about indexes on STI, it was pointed out that the cost of an index is very low, so when in doubt, just add it.
