MySQL: composite index fulltext+btree?


Question

I want a query that does a fulltext search on one field and then sorts on a different field (imagine searching text documents and ordering them by publication date). The table has about 17M rows, more or less uniformly distributed across dates. This is to be used in a webapp request/response cycle, so the query has to finish in at most 200ms.

Schematically:

SELECT * FROM table WHERE MATCH(text) AGAINST('query') ORDER BY date=my_date DESC LIMIT 10;

One possibility is having a fulltext index on the text field and a btree on the publication date:

ALTER TABLE table ADD FULLTEXT index_name(text);
CREATE INDEX index_name ON table (date);

This doesn't work very well in my case. What happens is that MySQL evaluates two execution paths. One uses the fulltext index to find the relevant rows and, once they are selected, uses a FILESORT to sort them. The other uses the BTREE index to sort the entire table and then looks for matches using a FULL TABLE SCAN. They're both bad. In my case MySQL chooses the former. The problem is that the first step can select some 30k results, which it then has to sort, so the entire query might take on the order of 10 seconds.

So I was thinking: do composite indexes of FULLTEXT+BTREE exist? If you know how a FULLTEXT index works, it first tokenizes the column you're indexing and then builds an index for the tokens. It seems reasonable to me to imagine a composite index such that the second index is a BTREE in dates for each token. Does this exist in MySQL and if so what's the syntax?

BONUS QUESTION: If it doesn't exist in MySQL, would PostgreSQL perform better in this situation?

Solution

Use IN BOOLEAN MODE.

The date index is not useful. There is no way to combine the two indexes.

Beware: if a user searches for something that shows up in 30K rows, the query will be slow. There is no straightforward way around it.

I suspect you have a TEXT column in the table? If so, there is hope. Instead of blindly doing SELECT *, let's first find the ids and get the LIMIT applied, then do the *.

SELECT a.* 
    FROM tbl AS a
    JOIN ( SELECT date, id
             FROM tbl
             WHERE MATCH(...) AGAINST (...)
             ORDER BY date DESC
             LIMIT 10 ) AS x
        USING(date, id)
    ORDER BY date DESC;

Together with

PRIMARY KEY(date, id),
INDEX(id),
FULLTEXT(...)
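
For concreteness, here is a minimal sketch of a table carrying those three indexes. The column names `date` and `text` come from the question; the column types and the AUTO_INCREMENT on `id` are assumptions made only for illustration, not something the original states.

```sql
-- Hypothetical table definition; types are assumed, not taken from the question.
CREATE TABLE tbl (
    id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `date` DATE NOT NULL,
    `text` TEXT NOT NULL,
    PRIMARY KEY (`date`, id),  -- rows are clustered by date, then id
    INDEX (id),                -- InnoDB needs an index starting with the AUTO_INCREMENT column
    FULLTEXT (`text`)          -- the index used by MATCH ... AGAINST
) ENGINE=InnoDB;
```

If `id` is not AUTO_INCREMENT in your schema, INDEX(id) is still useful for lookups by id alone, since the primary key starts with `date`.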

This formulation and indexing should work like this:

  1. Use FULLTEXT to find 30K rows, deliver the PK.
  2. With the PK, sort 30K rows by date.
  3. Pick the 10 most recent, delivering (date, id).
  4. Reach back into the table 10 times using the PK.
  5. Sort again. (Yeah, this is necessary.)
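
One way to check whether the optimizer actually follows the steps above is to prefix the query with EXPLAIN. This is only a sketch: it assumes a table `tbl` with columns `date`, `id`, and a FULLTEXT-indexed column `text`, and it uses 'query' as a placeholder search string, as in the question. The plan you see will depend on your MySQL version and data.

```sql
-- Ask MySQL for the execution plan instead of running the query.
EXPLAIN
SELECT a.*
    FROM tbl AS a
    JOIN ( SELECT `date`, id
             FROM tbl
             WHERE MATCH(`text`) AGAINST ('query' IN BOOLEAN MODE)
             ORDER BY `date` DESC
             LIMIT 10 ) AS x
        USING(`date`, id)
    ORDER BY `date` DESC;
```

In the output, look for the derived table being resolved through the FULLTEXT index and the outer table being reached through the PRIMARY KEY.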

More (Responding to a plethora of Comments):

The goal behind my reformulation is to avoid fetching all columns of 30K rows. Instead, it fetches only the PRIMARY KEY, then whittles that down to 10, then fetches * only 10 rows. Much less stuff shoveled around.

Concerning COUNT on an InnoDB table:

  • INDEX(col) makes it so that an index scan works for SELECT COUNT(*) or SELECT COUNT(col) without a WHERE.
  • Without INDEX(col), SELECT COUNT(*) will use the "smallest" index; but SELECT COUNT(col) will need a table scan.
  • A table scan is usually slower than an index scan.
  • Be careful of timing -- It is significantly affected by whether the index and/or table is already cached in RAM.
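
The COUNT variants discussed above look like this. The names `tbl` and `col` are placeholders; `col` stands for any column you might count.

```sql
-- Counts every row; InnoDB is free to scan whichever index is cheapest.
SELECT COUNT(*) FROM tbl;

-- Counts only rows where col IS NOT NULL, so the values of col must be read.
SELECT COUNT(col) FROM tbl;

-- Adding an index on col lets COUNT(col) be answered from that index.
ALTER TABLE tbl ADD INDEX (col);
```

When timing these, run each statement more than once so that you compare runs with similar cache state.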

Another thing about FULLTEXT is the + in front of words -- to say that each word must exist, else there is no match. This may cut down on the 30K.
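
As a sketch of that syntax, the inner query could require two words at once. The table `tbl`, the column `text`, and the search words are assumptions chosen for illustration.

```sql
-- Each word prefixed with + must be present in a row for it to match.
SELECT `date`, id
    FROM tbl
    WHERE MATCH(`text`) AGAINST ('+mysql +index' IN BOOLEAN MODE)
    ORDER BY `date` DESC
    LIMIT 10;
```

Whether this helps depends on what users actually type; a single common word can still match tens of thousands of rows.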

The FULLTEXT index will deliver the (date, id) pairs in random order, not PK order. Anyway, it is 'wrong' to assume any ordering, hence it is 'right' to add ORDER BY, then let the Optimizer toss it if it knows that it is redundant. And sometimes the Optimizer can take advantage of the ORDER BY (not in your case).

Removing just the ORDER BY, in many cases, makes a query run much faster. This is because it avoids fetching, say, 30K rows and sorting them. Instead it simply delivers "any" 10 rows.

(I have no experience with Postgres, so I cannot address that question.)
