Fulltext and composite indexes and how they affect the query

Question

Say I have a query like the one below:

SELECT 
    name,category,address,city,state
FROM
    `table`
WHERE 
    MATCH(name,subcategory,category,tag1) AGAINST('education')
AND 
    city='Oakland'
AND
    state='CA' 
LIMIT
    0, 10;

I have a fulltext index on name, subcategory, category, tag1 and a composite index on city, state. Is this good enough for this query? I'm wondering whether anything extra is needed when mixing additional AND conditions with a MATCH/AGAINST that uses the fulltext index.

What I am trying to understand is what happens with the additional columns that appear in the query but are not part of the chosen index (the fulltext index), in the example above city and state. How does MySQL find the matching rows for these, since it can't use two indexes (or can it?)? Basically, I'm trying to understand how MySQL goes about finding the data optimally for the columns NOT in the chosen fulltext index, and whether there is anything I can or should do to optimize the query.

Recommended answer

If I understand your question, you know that the MATCH AGAINST uses your FULLTEXT index, and you're wondering how MySQL goes about applying the rest of the WHERE clause (i.e. does it do a table scan or an indexed lookup).

Here's what I'm assuming about your table: it has a PRIMARY KEY on some id column and the FULLTEXT index.
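
To make that assumption concrete, here is a minimal sketch of what such a table might look like. The column types, the id column definition, the index name ft_search and the storage engine are all guesses for illustration; only the column names and the set of FULLTEXT columns come from the question.

```sql
-- Hypothetical schema: types, lengths and the index name are assumptions.
CREATE TABLE `table` (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name        VARCHAR(255) NOT NULL,
    subcategory VARCHAR(100),
    category    VARCHAR(100),
    tag1        VARCHAR(100),
    address     VARCHAR(255),
    city        VARCHAR(100),
    state       CHAR(2),
    PRIMARY KEY (id),
    -- The fulltext index described in the question
    FULLTEXT KEY ft_search (name, subcategory, category, tag1)
) ENGINE=InnoDB;
```

The table name is quoted with backticks because TABLE is a reserved word in MySQL.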

So first off, MySQL will never use the FULLTEXT index for the city/state WHERE clause. Why? Because FULLTEXT indexes only apply with MATCH AGAINST. See here in the paragraph after the first set of bullets (not the Table of Contents bullets).

In your case, assuming your table has more than a handful of rows, MySQL will apply the FULLTEXT index for your MATCH AGAINST, then scan those results to apply the city/state WHERE.

So what if you add a BTREE index onto city and state?

CREATE INDEX city__state ON `table` (city(10),state(2)) USING BTREE;

Well, MySQL can only use one index for this query since it's a simple select. It will use either the FULLTEXT or the BTREE. Note that when I say one index, I mean one index definition, not one column in a multi-part index. Anyway, this raises the question: which one does it use?

That depends on the table analysis. MySQL will attempt to estimate, based on the table statistics (refreshed by, for example, ANALYZE TABLE or OPTIMIZE TABLE), which index will prune the most records. If the city/state WHERE gets you down to 10 records while the MATCH AGAINST only gets you down to 100, then MySQL will use the city__state index first for the city/state WHERE and then scan those rows for the MATCH AGAINST.

On the other hand, if the MATCH AGAINST gets you down to 10 records while the city/state WHERE only gets you down to 1000, then MySQL will apply the FULLTEXT index first and scan those rows for city and state.
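
One rough way to see which condition prunes more on your own data is to count the rows each one matches in isolation. This is only a sanity check under the assumption that the table and columns are named as in the question; the optimizer works from its own estimates, not from these exact counts.

```sql
-- Rows left after the full-text condition alone
SELECT COUNT(*) AS fulltext_matches
FROM `table`
WHERE MATCH(name, subcategory, category, tag1) AGAINST('education');

-- Rows left after the city/state condition alone
SELECT COUNT(*) AS city_state_matches
FROM `table`
WHERE city = 'Oakland'
  AND state = 'CA';
```

Whichever count is smaller is the more selective condition for these particular values.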

The bottom line is the cardinality of your index. Essentially, how unique are the values that will go into your index? If every record in your table has city set to Oakland, then it's not a very unique key and so having city = 'Oakland' doesn't really reduce the number of records all that much for you. In that case, we say your city__state index has a low cardinality.

Likewise, if 90% of the words in your FULLTEXT index are "John", then that doesn't really help you much either, for exactly the same reason.
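
If you want to look at the cardinality MySQL has recorded, something like the following should work, again assuming the table name from the question. The Cardinality column in the output is an estimate of the number of distinct values in each index, not an exact count.

```sql
-- Refresh the key distribution statistics first
ANALYZE TABLE `table`;

-- One row per indexed column; compare the Cardinality value
-- against the total row count of the table
SHOW INDEX FROM `table`;
```

A cardinality close to the row count means the index is highly selective; a value close to 1 means almost every row shares the same key.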

If you can afford the space and the UPDATE/DELETE/INSERT overhead, I would recommend adding the BTREE index and letting MySQL decide which index it wants to use. In my experience, it usually does a very good job of picking the right one.

I hope that answers your question.

On a side note, make sure you pick the right prefix length for your BTREE index (in my example I picked the first 10 characters of city). This has a big impact on cardinality: if you picked city(1), you would get a lower cardinality than with city(10).
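
A quick way to compare prefix lengths before committing to one is to count the distinct prefixes directly. This sketch assumes the table and column names from the question; the column aliases are made up for readability.

```sql
-- How many distinct values does each candidate prefix length give?
SELECT
    COUNT(DISTINCT LEFT(city, 1))  AS distinct_prefix_1,
    COUNT(DISTINCT LEFT(city, 10)) AS distinct_prefix_10,
    COUNT(DISTINCT city)           AS distinct_full
FROM `table`;
```

If distinct_prefix_10 is already close to distinct_full, indexing more than 10 characters buys little extra selectivity.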

EDIT2: MySQL's query plan, that is, its estimate of which index prunes the most records, is what you see in EXPLAIN.
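
For the query from the question, that would look roughly like this (the table name is backtick-quoted because TABLE is a reserved word):

```sql
EXPLAIN
SELECT name, category, address, city, state
FROM `table`
WHERE MATCH(name, subcategory, category, tag1) AGAINST('education')
  AND city = 'Oakland'
  AND state = 'CA'
LIMIT 0, 10;

-- In the output:
--   possible_keys  indexes MySQL considered
--   key            the index it actually chose
--   type           shows "fulltext" when the FULLTEXT index is used
--   rows           estimated number of rows it expects to examine
```

Running it before and after adding the city__state index shows whether the optimizer's choice changes for your data.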
