Best SQL indexes for join table
Question
With performance improvements in mind, I was wondering if and which indexes are helpful on a join table (specifically used in a Rails 3 has_and_belongs_to_many context).
My models are Foo and Bar and, per Rails convention, I have a join table called bars_foos. It has no primary key or timestamps, so the only fields in this table are bar_id:integer and foo_id:integer. I'm interested in knowing which of the following indexes is best and avoids duplication:
1. A compound index:
   add_index :bars_foos, [:bar_id, :foo_id]
2. Two indexes:
   A. add_index :bars_foos, :bar_id
   B. add_index :bars_foos, :foo_id
Basically, I'm not sure if the compound index is enough assuming it is helpful to begin with. I believe that a compound index can be used as a single index for the first item which is why I am pretty sure that using all three lines would certainly result in unnecessary duplication.
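The belief that a compound index can serve lookups on its first column can be illustrated with a toy model in plain Ruby. This is only a sketch: it treats the index as an array of [bar_id, foo_id] pairs sorted by bar_id and then foo_id, which is an assumption for illustration and not how any particular database stores its indexes. The names ROWS, COMPOUND_INDEX, foo_ids_for_bar and bar_ids_for_foo are hypothetical.

```ruby
# Toy model of a compound index on (bar_id, foo_id): pairs kept sorted
# by bar_id first, then by foo_id. Sample data is made up.
ROWS = [[3, 10], [1, 12], [2, 10], [1, 11], [3, 12], [2, 13]] # [bar_id, foo_id]
COMPOUND_INDEX = ROWS.sort

# Lookup by the leading column (bar_id): a binary search finds the first
# matching entry, then adjacent entries are read until bar_id changes.
def foo_ids_for_bar(index, bar_id)
  start = index.bsearch_index { |b, _f| b >= bar_id }
  return [] if start.nil?
  index[start..-1].take_while { |b, _f| b == bar_id }.map { |_b, f| f }
end

# Lookup by the second column alone (foo_id): matching entries are
# scattered through the sorted list, so every entry has to be inspected.
def bar_ids_for_foo(index, foo_id)
  index.select { |_b, f| f == foo_id }.map { |b, _f| b }
end

p foo_ids_for_bar(COMPOUND_INDEX, 1)
p bar_ids_for_foo(COMPOUND_INDEX, 10)
```

In this model the sort order helps only when the leading column is known, which is why a compound index on [:bar_id, :foo_id] would tend to overlap with a separate index on :bar_id but not with one on :foo_id.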
The most common usage will be, given an instance of model Foo, asking for its associated bars using the RoR syntax foo.bars, and vice versa with bar.foos for an instance of the model Bar.
These will generate queries of the type SELECT * FROM bars_foos WHERE foo_id = ? and SELECT * FROM bars_foos WHERE bar_id = ? respectively, and then use the resulting IDs in SELECT * FROM bars WHERE ID in (?) and SELECT * FROM foos WHERE ID in (?).
Please correct me in the comments if I am wrong, but I do not believe that, in the context of the Rails application, it will ever run a query that specifies both IDs, like SELECT * FROM bars_foos WHERE bar_id = ? AND foo_id = ?.
In the event there are database-specific optimization techniques, I will most likely be using PostgreSQL. However, others using this code may want to use it with MySQL or SQLite depending on their Rails configuration, so all answers are appreciated.
Recommended answer
The Answer
The oft-repeated answer, which holds more often than not, is "it depends." More specifically, it depends on what your data is and how it will be used.
The short tl;dr answer for my specific case (and to cover all future bases) is choice #2 which is what I suspected. However, choice #3 would work just fine as, depending on my usage of the data, the extra time and space used creating the compound index could reduce future query lookups.
The reason for this is that databases try to be smart and do things as fast as possible regardless of programmer input. The most basic question to ask when adding an index is whether the rows will be looked up by this key. If so, an index can potentially help speed that up. However, whether the index is even used comes down to the selectivity and cardinality of the field.
Since foreign keys are typically the IDs of another AR class, cardinality will usually be high. But again, this depends on your data. In my example, if there are many Foos but few Bars, many of the entries in my join table will have similar bar_ids. With bar_id having a low cardinality, an index on bar_id may never be used and may get in the way by making the database devote time and resources* to updating this index every time a new bars_foos entry is created. The same goes for many Bars and few Foos, and for few of both.
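The many-Foos, few-Bars case can be made concrete with a rough selectivity estimate, taken here as the number of distinct values divided by the number of rows. The sketch below is plain Ruby over made-up data; the selectivity helper, SAMPLE_ROWS, and the 1000-to-2 ratio are hypothetical and chosen only for illustration. A real database keeps its own statistics and makes its own decision about whether to use an index.

```ruby
# Selectivity as distinct values / total rows: values near 1.0 mean almost
# every row has its own value, values near 0.0 mean many rows share a value.
def selectivity(rows, column)
  return 0.0 if rows.empty?
  rows.map { |row| row[column] }.uniq.size.to_f / rows.size
end

# Hypothetical join-table contents: 1000 Foos, 2 Bars, one Bar per Foo.
SAMPLE_ROWS = (1..1000).map do |foo_id|
  { foo_id: foo_id, bar_id: foo_id % 2 + 1 }
end

puts "foo_id selectivity: #{selectivity(SAMPLE_ROWS, :foo_id)}"
puts "bar_id selectivity: #{selectivity(SAMPLE_ROWS, :bar_id)}"
```

With this data, a lookup by foo_id narrows the table to a single row, while a lookup by bar_id still matches half of it, which is the situation where an index on bar_id may bring little benefit.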
The general lesson is that when considering an index on a table, decide both whether the entries will be looked up by this field and whether this field has a high cardinality. That is, does this field have many distinct values? In the case of most join tables, "it depends," and we must think more carefully about what the data represents and about the relationships themselves. In my case, I will have many of both Foos and Bars, and will be looking up Foos by their associated Bars and vice versa.
Another good answer I got at the office was, "why are you worrying about your indexes? Build your app!"
* In a similar question on indexes on STI it was pointed out that the cost of an index is very low so when in doubt, just add it.