SQL查询搜索满足Column1< = X< = Column2的行非常慢 [英] A SQL query searching for rows that satisfy Column1 <= X <= Column2 is very slow

查看:111
本文介绍了SQL查询搜索满足Column1< = X< = Column2的行非常慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用MySQL数据库,并具有下表:

I am using a MySQL DB, and have the following table:

CREATE TABLE SomeTable (
  PrimaryKeyCol BIGINT(20) NOT NULL,
  A BIGINT(20) NOT NULL,
  FirstX INT(11) NOT NULL,
  LastX INT(11) NOT NULL,
  P INT(11) NOT NULL,
  Y INT(11) NOT NULL,
  Z INT(11) NOT NULL,
  B BIGINT(20) DEFAULT NULL,
  PRIMARY KEY (PrimaryKeyCol),
  UNIQUE KEY FirstLastXPriority_Index (FirstX,LastX,P)
) ENGINE=InnoDB;

该表包含430万行,并且一旦初始化就不会更改.

The table contains 4.3 million rows, and never changes once initialized.

此表的重要列是FirstXLastXYZP.

The important columns of this table are FirstX, LastX, Y, Z and P.

如您所见,我在FirstXLastXP行上有一个唯一索引.

As you can see, I have a unique index on the rows FirstX, LastX and P.

FirstXLastX定义了一个整数范围.

The columns FirstX and LastX define a range of integers.

我需要在此表上运行的查询针对给定的X获取所有具有FirstX< = X< = LastX的行(即,其范围包含输入数字X的所有行).

The query I need to run on this table fetches for a given X all the rows having FirstX <= X <= LastX (i.e. all the rows whose range contains the input number X).

例如,如果表包含行(我仅包括相关列):

For example, if the table contains the rows (I'm including only the relevant columns):

FirstX     LastX      P        Y         Z
------     ------     -       ---       ---
100000     500000     1       111       222 
150000     220000     2       333       444
180000     190000     3       555       666
550000     660000     4       777       888   
700000     900000     5       999       111 
750000     850000     6       222       333 

例如,我需要包含值185000的行,应该返回前3行.

and I need, for example, the rows that contain the value 185000, the first 3 rows should be returned.

我尝试过的查询应该使用索引:

The query I tried, which should be using the index, is:

SELECT P, Y, Z FROM SomeTable WHERE FirstX <= ? AND LastX >= ? LIMIT 10;

即使没有LIMIT,对于任何给定的X,此查询也应返回少量记录(少于50).

Even without the LIMIT, this query should return a small number of records (less than 50) for any given X.

此查询由Java应用程序针对X的120000个值执行.令我惊讶的是,此查询花费了 10个小时(!),每个查询的平均时间为 0.3秒.

This query was executed by a Java application for 120000 values of X. To my surprise, it took over 10 hours (!) and the average time per query was 0.3 seconds.

这是不可接受的,甚至不能接受.它应该快得多.

This is not acceptable, not even near acceptable. It should be much faster.

我检查了一个查询,该查询花费了 0.563秒,以确保正在使用索引.我尝试的查询(与上面的查询相同,但使用特定的整数值而不是?)返回了 2行.

I examined a single query that took 0.563 seconds to make sure the index was being used. The query I tried (the same as the query above with a specific integer value instead of ?) returned 2 rows.

我用EXPLAIN找出正在发生的事情:

I used EXPLAIN to find out what was happening:

id               1
select_type      SIMPLE
table            SomeTable 
type             range
possible_keys    FirstLastXPriority_Index
key              FirstLastXPriority_Index 
key_len          4
ref              NULL
rows             2104820
Extra            Using index condition

正如您所看到的,即使只有2行满足条件,执行也涉及到2104820行(几乎是表行的50%),因此将检查索引的一半以仅返回2行

As you can see, the execution involved 2104820 rows (nearly 50% of the rows of the table), even though only 2 rows satisfy the conditions, so half of the index is examined in order to return just 2 rows.

查询或索引是否有问题?您可以建议对查询或索引进行改进吗?

Is there something wrong with the query or the index? Can you suggest an improvement to the query or the index?

一些答案​​建议我为X的多个值批量运行查询.由于输入到我的应用程序时我实时运行此查询,所以我不能这样做.每次输入X到达时,我都必须对X执行查询,并对查询的输出执行一些处理.

Some answers suggested that I run the query in batches for multiple values of X. I can't do that, since I run this query in real time, as inputs arrive to my application. Each time an input X arrives, I must execute the query for X and perform some processing on the output of the query.

推荐答案

我找到了一个依赖表中数据属性的解决方案.我宁愿有一个不依赖于当前数据的更通用的解决方案,但是暂时这是我所拥有的最好的解决方案.

I found a solution that relies on properties of the data in the table. I would rather have a more general solution that doesn't depend on the current data, but for the time being that's the best I have.

原始查询的问题:

SELECT P, Y, Z FROM SomeTable WHERE FirstX <= ? AND LastX >= ? LIMIT 10;

表示当第一个条件FirstX <= ?满足大部分行时,执行可能需要扫描FirstXLastXP索引中的大部分条目.

is that the execution may require scanning a large percentage of the entries in the FirstX,LastX,P index when the first condition FirstX <= ? is satisfied by a large percentage of the rows.

我为减少执行时间所做的工作是观察到LastX-FirstX相对较小.

What I did to reduce the execution time is observe that LastX-FirstX is relatively small.

我运行了查询:

SELECT MAX(LastX-FirstX) FROM SomeTable;

得到了4200000.

这表示表中所有行的FirstX >= LastX – 4200000.

所以要满足LastX >= ?,我们还必须满足FirstX >= ? – 4200000.

So in order to satisfy LastX >= ?, we must also satisfy FirstX >= ? – 4200000.

因此我们可以向查询中添加条件,如下所示:

So we can add a condition to the query as follows:

SELECT P, Y, Z FROM SomeTable WHERE FirstX <= ? AND FirstX >= ? - 4200000 AND LastX >= ? LIMIT 10;

在我测试的示例中,处理的索引条目的数量从2104820减少到18,运行时间从 0.563秒减少到 0.0003秒.

In the example I tested in the question, the number of index entries processed was reduced from 2104820 to 18 and the running time was reduced from 0.563 seconds to 0.0003 seconds.

我用相同的120000X测试了新查询.输出与旧查询相同.时间从 10小时减少到 5.5分钟,速度快了 100倍.

I tested the new query with the same 120000 values of X. The output was identical to the old query. The time went down from over 10 hours to 5.5 minutes, which is over 100 times faster.

这篇关于SQL查询搜索满足Column1&lt; = X&lt; = Column2的行非常慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆