Slow MySQL query when using LIKE operation on large table

Question

I have a fairly large table (~ 6 GB) and I have performance problems on this query:

          SELECT f.*,
          TIME_FORMAT(f.scheme, '%H:%i') as scheme,
          TIME_FORMAT(f.actual, '%H:%i') as actual,
          DATE_FORMAT(f.flight_date, '%d-%m-%Y') as flight_date_formatted,
          a.iata
          FROM flights_database f
          LEFT JOIN airports a ON f.airport = a.airportNameClean
          WHERE f.flight_date BETWEEN DATE_SUB(CURDATE(), INTERVAL 30 DAY)
          AND DATE_ADD(CURDATE(), INTERVAL 2 DAY)
          AND (f.flight_number LIKE 'New York%' OR f.airport LIKE 'New York%' OR f.airline LIKE 'New York%')
          ORDER by f.flight_date DESC, f.flight_scheme DESC
          LIMIT 50

I've used EXPLAIN and identified these underlying problems:

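  • Using multiple LIKEs combined with OR makes the query scan a range of records (Using where), which seems to slow it down.
  • f.flight_scheme DESC: when this is added to the ORDER BY, filesort is used. When it is removed, filesort is not used.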

I have an index on (flight_date, flight_number, airport, airline, scheme), and EXPLAIN reports that it is used. But this query can still take ~30 seconds, which of course is too much.

What would probably help is using some kind of subquery to replace the OR part. But how can I determine which type of search (e.g. which column) I actually need to search on after running the subquery?

Ideas and tips appreciated.

Recommended answer

I believe your current index isn't optimal for the query, mainly because of the OR expression. You should create three indexes:

(flight_number, flight_date, scheme)

(airport, flight_date, scheme)

(airline, flight_date, scheme)
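
As a rough sketch, the three indexes above could be created as follows. The index names (idx_number_date_scheme and so on) are made up for illustration, and the last column is assumed to be the scheme column used in the question's query. Adding all three in one ALTER TABLE is a choice of convenience, not a requirement; on a table of about 6 GB the build will likely take a while either way.

```sql
-- Hypothetical index names; table and column names are taken from the question.
ALTER TABLE flights_database
    ADD INDEX idx_number_date_scheme  (flight_number, flight_date, scheme),
    ADD INDEX idx_airport_date_scheme (airport, flight_date, scheme),
    ADD INDEX idx_airline_date_scheme (airline, flight_date, scheme);
```

Each index leads with the column that one of the LIKE 'New York%' predicates filters on, so a prefix match on that column can be resolved as a range within the index.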

Then change the query to use the three indexes. You could also play with it a bit and maybe prune each subquery by adding an ORDER BY and a LIMIT of 50 as well.

select flight.*,
    TIME_FORMAT(flight.scheme, '%H:%i') as scheme,
    TIME_FORMAT(flight.actual, '%H:%i') as actual,
    DATE_FORMAT(flight.flight_date, '%d-%m-%Y') as flight_date_formatted,
    a.iata
from (
    select *
    from (
        -- Each branch is parenthesized so that its ORDER BY / LIMIT
        -- applies to that branch only (MySQL requires this in a UNION).
        (select f.Id,
            f.flight_date,
            f.scheme
        from flights_database f
        where f.flight_date between DATE_SUB(CURDATE(), INTERVAL 30 DAY)
                and DATE_ADD(CURDATE(), INTERVAL 2 DAY)
            and f.flight_number like 'New York%'
        order by f.flight_date desc,
            f.scheme desc limit 50)

        union

        (select f.Id,
            f.flight_date,
            f.scheme
        from flights_database f
        where f.flight_date between DATE_SUB(CURDATE(), INTERVAL 30 DAY)
                and DATE_ADD(CURDATE(), INTERVAL 2 DAY)
            and f.airline like 'New York%'
        order by f.flight_date desc,
            f.scheme desc limit 50)

        union

        (select f.Id,
            f.flight_date,
            f.scheme
        from flights_database f
        where f.flight_date between DATE_SUB(CURDATE(), INTERVAL 30 DAY)
                and DATE_ADD(CURDATE(), INTERVAL 2 DAY)
            and f.airport like 'New York%'
        order by f.flight_date desc,
            f.scheme desc limit 50)
        ) f1
    -- UNION removes duplicate Ids; keep the overall top 50.
    order by f1.flight_date desc,
        f1.scheme desc limit 50
    ) f2
inner join flights_database flight on f2.Id = flight.Id
left join airports a on flight.airport = a.airportNameClean
-- The join does not preserve the subquery order, so sort the final rows again.
order by flight.flight_date desc,
    flight.scheme desc;

Currently your OR condition will expand to: [flight_date, flight_number], [flight_date, airline], [flight_date, airport]

So when the optimizer looks at your index, it will match [flight_date, flight_number] to your current index [flight_date, flight_number, airport, airline, scheme] (notice how they start off the same), but when it encounters [flight_date, airline] there isn't an index to match this expression, so the optimizer would determine that it needs an index scan or a table scan. Then it would encounter [flight_date, airport], and again determine that this requires an index scan or a table scan.

With the three new indexes and the new query, it would match the three indexes to the three criteria and determine that each requires an index seek (hopefully). We include 'scheme' in each index to save the row lookup by id for all the rows matching the criteria.
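
One way to check whether this works out in practice is to run EXPLAIN on a single branch of the union, for example the airline one. This is only a sketch: it assumes the three indexes exist, that the sort column is named scheme, and that Id is the table's primary key (with InnoDB, secondary indexes also carry the primary key, which is what would let the index cover the selected columns).

```sql
-- Inspect the plan for one branch; repeat for flight_number and airport.
EXPLAIN
SELECT f.Id,
       f.flight_date,
       f.scheme
FROM flights_database f
WHERE f.flight_date BETWEEN DATE_SUB(CURDATE(), INTERVAL 30 DAY)
                        AND DATE_ADD(CURDATE(), INTERVAL 2 DAY)
  AND f.airline LIKE 'New York%'
ORDER BY f.flight_date DESC,
         f.scheme DESC
LIMIT 50;
```

In the output, the key column shows which index the optimizer picked, and "Using index" in the Extra column indicates that the branch was answered from the index alone, without reading the table rows. "Using filesort" may still appear for the ORDER BY; whether that matters depends on how many rows each branch matches.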
