要创建哪些索引来加速我的重型过滤器和分组查询? [英] What indexes to create to speed up my heavy filter and grouping query?

查看:166
本文介绍了要创建哪些索引来加速我的重型过滤器和分组查询?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这样的表,它有数百万记录:

I have a table like this, it has millions of records:

CREATE TABLE `myTable` (
`DateTime` DATETIME NOT NULL,
`Col1` MEDIUMINT UNSIGNED NOT NULL,
`Col2` MEDIUMINT UNSIGNED NOT NULL,
`Col3` MEDIUMINT UNSIGNED NOT NULL,
`Col4` MEDIUMINT UNSIGNED NOT NULL,
`Event` MEDIUMINT UNSIGNED NOT NULL,
`State` MEDIUMINT UNSIGNED NOT NULL,
PRIMARY KEY (`DateTime`,`Col4`,`Event`,`State`)
);

我运行一个查询来计算一个时间范围内按时间段分组的记录数,如果它们基于Col1 / Col2 / Col3 / Col4值匹配'过滤器'。例如,3分钟的时间段:

I run a query to count the number of records, grouped by a time period, for a time range, if they match 'filters' based on the Col1/Col2/Col3/Col4 values. For example, a 3 minute period:

select
    FROM_UNIXTIME(UNIX_TIMESTAMP(MIN(`DateTime`))-(UNIX_TIMESTAMP(MIN(`DateTime`)) % (3*60))) as 'Period',
    count(*) as 'NumberOfRecords'
from
    `myTable`
where
    `DateTime` > '2016-09-01' and `DateTime` < '2016-09-09'
    AND `Col1` IN (3, 6, 11, 14, etc... )
    AND `Col2` IN (5 ,25 , 325 , 293, 294, etc.... )
    AND `Col3` IN (3 , 9 , 95 , 395 , 435, etc...)
    AND `Col4` IN (124, 125, 135, 325, etc...)
group by
    UNIX_TIMESTAMP(`DateTime`) DIV (3*60);

我应该加快这个查询的索引?

What index should I have to speed up this query? I don't care how slow insertion gets, I want to have the query run extremely fast.

一般来说,每个col1,col2,col3有大约1,000个唯一值。

In general there are around 1,000 unique values for each col1,col2,col3,col4, but there are millions of records that would fit in the date range.

我在想像:

CREATE INDEX `myIndex` ON `myTable` ( `DateTime`, `Col`,`Col2`,`Col3`,`Col4 )

但我不确定我有排序权吗?或者最好是制作4个索引,每个索引( DateTime ColX )?

But I am not sure I have the ordering right? Or is it better to make 4 indexes, one for each (DateTime,ColX)?

推荐答案

这个查询很难优化给定五列表结构,因为你最多运行六个不同的范围谓词

This query is very hard to optimize given your five-column table structure, because you're running up to six different range predicates.

范围谓词包括操作> < <> BETWEEN LIKE IN()。基本上,除 = 之外的任何类型的搜索。

A range predicate includes operations >, <, <>, BETWEEN, LIKE, or IN(). Basically, any type of search other than =.

范围谓词可能匹配列中的许多值。

A range predicate potentially matches many values in the column.

一个等式谓词只匹配列中的一个值(可能有许多行具有该值,但它是一个值)。

An equality predicate matches exactly one value in the column (there may be many rows with that value, but it's one value).

定义索引时,您在索引中放置的列应该是等号比较中的列引用,然后是范围谓词中引用的一个列。

When defining indexes, the columns you put in the index should be columns references in equality comparisons first, then just one column referenced in a range predicate. Any additional columns in the index beyond the first column referenced in a range predicate will not count for doing the lookup.

例如,如果您在<$ c上有一个索引$ c>(col1,col2,col3),条件如下:

For example, if you have an index on (col1, col2, col3), the following conditions:

WHERE col1=123 AND col2 IN (4, 5, 6) AND col3=789

此查询可以使用前两列的索引。 col3 将不使用索引。该查询将检查由前两个项匹配的所有行,并对所有这些行一个接一个地计算第三个项。

This query can make use of the first two columns of the index. The col3 will not use the index. The query will examine all rows matched by the first two terms, and evaluate the third term against all those rows one by one.

同一个索引将使用所有三列以便通过以下条件查找:

Whereas the same index will use all three columns for lookup by the following conditions:

WHERE col1=123 AND col2=789 AND col3 IN (4, 5, 6)

也就是说,前两列的相等谓词和索引中最后一列的范围谓词。

That is, equality predicates for the first two columns, and a range predicate for the last column in the index.

当您使用 EXPLAIN ,其中一列报告索引条目的字节数。在上面的例子中,假设所有三列都是32位整数列。 EXPLAIN对于第一个查询将报告它使用8个字节(两个整数值),而EXPLAIN用于第二个查询将报告它使用12个字节(三个整数值)。

When you use EXPLAIN, one of the columns reports the number of bytes of an index entry. In the above example, suppose all three columns are 32-bit integer columns. EXPLAIN for the first query will report that it uses 8 bytes (two integer's worth), and EXPLAIN for second query will report it uses 12 bytes (three integer's worth).

在您的情况下,您有范围谓词 all 条件中的条款。这不能用B树索引来优化。它可以使用任何一列的索引。因此,您可以创建五个单独的索引,每个在一个列上,并希望优化器选择一个最有效地缩小搜索。或者,您可以使用索引提示选择最佳索引

In your case, you have range predicates for all the terms in your conditions. This is not optimizable with a B-Tree index. It could use an index for any one of the columns. So you might create five individual indexes, each on one of the columns, and hope that the optimizer picks the one that narrows down the search most effectively. Or you can use index hints to pick the best index yourself.

我写了一个您可能感兴趣的演示文稿,名为如何设计索引,真的
。以下是我的简报录音: https://www.youtube.com/watch?v= ELR7-RdU9XU

I wrote a presentation you may find interesting called How to Design Indexes, Really . Here's a recording of my presentation: https://www.youtube.com/watch?v=ELR7-RdU9XU

除了B树索引之外,还有其他类型的索引。对多个列的范围谓词搜索可能需要R树索引。因此,您可能会发现要真正优化此查询,您需要将数据的副本加载到 Apache Solr Crate 或类似的其他搜索引擎。

There are other types of indexes besides B-tree indexes. A range predicate search over multiple columns may require an R-tree index. So you may find that to really optimize this query, you need to load a copy of the data into Apache Solr or Crate or some other search engine like that.

这篇关于要创建哪些索引来加速我的重型过滤器和分组查询?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆