多列索引性能 [英] Multi-column index performance
问题描述
我有以下查询:
SELECT *
from stop_times
WHERE (departure_time BETWEEN '02:41' AND '05:41'
OR departure_time BETWEEN '26:41' AND '29:41')
AND stop_times.stop_id IN(51511,51509,51508,51510,6,53851,51522,51533)
在~800ms内返回134行。如果我拆分它:
that returns 134 rows in ~800ms. If I split it:
SELECT *
from stop_times
WHERE (departure_time BETWEEN '02:41' AND '05:41'
OR departure_time BETWEEN '26:41' AND '29:41')
在〜10ms内返回~110k行和
returns ~110k rows in ~10ms and
SELECT *
from stop_times
WHERE stop_times.stop_id IN(51511,51509,51508,51510,6,53851,51522,51533)
返回在~100ms内~5k行。
returns ~5k rows in ~100ms.
我尝试使用多列索引(departure_time和stop_id)以及2个单独的索引,但在任何一种情况下都是第一个查询似乎不能少于~800ms。我的stop_times表有大约3.5M行。有什么我可以丢失的,这会大大加快第一次查询吗?
I tried using both a multi-column index (departure_time and stop_id) as well as 2 separate indexes, but in either case the first query can't seem to take less than ~800ms. My stop_times table has about 3.5M rows. Is there anything I could be missing and that would significantly speed up that first query?
更新1:显示表创建:
CREATE TABLE `stop_times` (
`trip_id` varchar(20) DEFAULT NULL,
`departure_time` time DEFAULT NULL,
`stop_id` varchar(20) DEFAULT NULL,
KEY `index_stop_times_on_trip_id` (`trip_id`),
KEY `index_stop_times_on_departure_time_and_stop_id` (`departure_time`,`stop_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
stop_id和trip_id是varchars而不是整数是我无法控制的......
stop_id and trip_id being varchars instead of integers is beyond my control unfortunately...
更新2: EXPLAIN
departure_time,stop_id
多列索引:
select_type: SIMPLE
type: range
rows: 239084
EXPLAIN
for stop_id,departure_time
多列索引:
EXPLAIN
for stop_id, departure_time
multi-column index:
select_type: SIMPLE
type: range
rows: 141
更新3: EXPLAIN
for IN(51511,51509,51508,51510,6,53851,51522,51533)
select_type: SIMPLE
type: ALL
rows: 3556973 (lol)
EXPLAIN
for IN(51511,51509,51508,51510, 6,53851,51522,51533)
select_type: SIMPLE
type: range
rows: 141
推荐答案
您是否创建了一个索引 stop_id,departure_time
?因为 departure_time,stop_id
绝对不会做任何事情。
Did you create an index stop_id, departure_time
? Because departure_time, stop_id
will do absolutely nothing.
这是真正的硬件 - 它有处理索引的所有可能的坏事:(
This is a really hard one - it has every possible bad thing for dealing with indexes :(
你有一个范围,一个OR和一个非连续的IN - 它不会比这更糟糕。
You have a range, an OR and a non contiguous IN - it doesn't get worse than that.
尝试 stop_id,departure_time
如果它没有帮助那么你就没有什么可以做不到切换到PostgreSQL。
Try stop_id, departure_time
and if it doesn't help then there is nothing much you can do short of switching to PostgreSQL.
你也可以尝试重写查询:
You can also try rewriting the query as:
SELECT *
from stop_times
WHERE ( stop_times.stop_id IN(51511,51509,51508,51510,6,53851,51522,51533)
AND departure_time BETWEEN '02:41' AND '05:41'
)
OR ( stop_times.stop_id IN(51511,51509,51508,51510,6,53851,51522,51533)
AND departure_time BETWEEN '26:41' AND '29:41'
)
或:
SELECT *
from stop_times
WHERE ( stop_times.stop_id IN(51511,51509,51508,51510,6,53851,51522,51533)
AND departure_time BETWEEN '02:41' AND '05:41'
)
UNION ALL
SELECT *
from stop_times
WHERE ( stop_times.stop_id IN(51511,51509,51508,51510,6,53851,51522,51533)
AND departure_time BETWEEN '26:41' AND '29:41'
)
这篇关于多列索引性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!