无法优化MySQL查询 [英] Trouble optimizing MySQL query

查看:145
本文介绍了无法优化MySQL查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在处理以下查询,但不确定如何继续进一步优化:

I'm working on the following query but not sure how to proceed with further optimizations:

SELECT u.id AS userId, firstName, profilePhotoId, preferredActivityId, preferredSubActivityId, availabilityType,
       3959 * ACOS(COS(radians(requestingUserLat)) * COS(radians(u.latitude)) * COS(radians(u.longitude) - radians(requestingUserLon)) + SIN(radians(requestingUserLat)) * SIN(radians(u.latitude))) AS distanceInMiles
  FROM users u
 WHERE u.id IN (
        SELECT uu.id
          FROM users uu
         WHERE uu.latitude      between lat1    and lat2 -- MySQL 5.7 supports Point data type, but it is not indexed in innoDB. We store latitude and longitude as DOUBLE for now
           AND uu.longitude     between lon1    and lon2
           AND uu.dateOfBirth   between maxAge  and minAge -- dates are in millis, therefore maxAge will have a smaller value than minAge and so it needs to go first
     )
   AND IF(gender       is null, TRUE, u.gender = gender)
   AND IF(activityType is null, TRUE, u.preferredActivityType = activityType)
   AND u.accountState = 'A'
   AND u.id != userId
HAVING distanceInMiles < searchRadius ORDER BY distanceInMiles LIMIT pagingStart, pagingLength;


CREATE INDEX `findMatches` ON `users` (`latitude` ASC, `longitude` ASC, `dateOfBirth` ASC) USING BTREE;


这里的想法是使用上面指定的覆盖索引,使用内部查询根据用户位置和年龄标识符合条件的行。在具有几百万行的表中,将它们缩小到几千,而不需要全表扫描。然后针对更细粒度的条件(例如性别,可用性等)测试结果行。 - 这次对缩减结果集的完全扫描是不可避免的。

The idea here is to have an inner query that identifies qualifying rows based on user location and age, using the covering index specified above. In a table with a few million rows, that narrows them down to a few thousand without requiring a full table scan. Then the resulting rows are tested against more fine-grain conditions, such as gender, availability, etc. - this time a full scan of the reduced resultset is inevitable.

这按预期差不多运行, EXPLAIN 显示内部查询确实利用覆盖索引的完整密钥长度(3列),然后外部查询看起来通过PK返回的行。

This runs "almost" as expected, EXPLAIN reveals that the inner query does indeed make use of the full key length (3 columns) of the covering index, and the outer query then looks up the rows returned, by PK.

< br>
问题:

当搜索范围在几百英里之内时性能令人满意,但当我到达千里之后,它开始降级因为指定边界内的用户数量增加。如果搜索范围保持不变,但用户数量增加了几个订单,问题也会变得明显。以下是我到目前为止所发现的问题:


The problem:
The performance is satisfactory while the search range is within a few hundred miles, but when I get to a thousand miles, it starts degrading because the number of users within the specified boundary increases. The problem would also become apparent if the search range remained the same, but the number of users increased by a few orders. Here are the problems that I have identified so far:


  1. MySQL目前不支持 LIMIT 在内部查询中,因此内部查询将返回所有符合条件的 userIDs (即数千),即使外部查询将它们限制为只有十几个左右。

  2. 启用 optimizer_trace 并查看幕后工作表明只有列纬度用作范围。我不确定为什么会这样,特别是因为 EXPLAIN 表明使用了完整的索引键长度。

  1. MySQL does not currently support LIMIT in inner queries, therefore the inner query will return ALL qualifying userIDs (i.e., thousands), even if the outer query will then limit them to just a dozen or so.
  2. Enabling optimizer_trace and looking at the work behind the scenes suggests that only column latitude of my covering index is used as a range. I'm not sure why that is, especially since EXPLAIN suggests that the full index key length is used.



问题:

我如何解决上述(1)和(2)?在有人建议使用lat和long的空间数据类型之前,请注意最新的InnoDB引擎(MySQL v5.7)不支持空间索引,只支持空间数据类型

推荐答案

我认为其他答案已经涵盖了这一点。在索引中使用查询中的数据与使用索引中的数据来寻找正确的解决方案之间存在差异。后者是指数的最有效使用。前者很有帮助,但效率只是不读数据页。

I think this has already been covered in other answers. There is a difference between using the data in the index for a query, and using the data in the index to seek to the right solution. The latter is the most efficient use of an index. The former is helpful, but the efficiency is simply in not reading in the data pages.

我认为你可以通过使用 exists <来改善你的查询/ code>而不是中的。这应该允许在外层进行过滤以提高查询的性能:

I think you can improve your query by using exists instead of in. This should allow filtering at the outer level to improve performance of the query:

SELECT u.id AS userId, firstName, profilePhotoId, preferredActivityId, preferredSubActivityId, availabilityType,
       3959 * ACOS(COS(radians(requestingUserLat)) * COS(radians(u.latitude)) * COS(radians(u.longitude) - radians(requestingUserLon)) + SIN(radians(requestingUserLat)) * SIN(radians(u.latitude))) AS distanceInMiles
FROM users u
WHERE EXISTS (SELECT 1
              FROM users uu
              WHERE uu.latitude      between lat1    and lat2  AND
                    uu.longitude     between lon1    and lon2 AND
                    uu.dateOfBirth   between maxAge  and minAge  AND
                    uu.id = u.id
             ) AND
     IF(gender       is null, TRUE, u.gender = gender) AND
     IF(activityType is null, TRUE, u.preferredActivityType = activityType) AND
     u.accountState = 'A' AND
     u.id <> userId
HAVING distanceInMiles < searchRadius
ORDER BY distanceInMiles LIMIT pagingStart, pagingLength;

注意,表达式 IF(性别为空,TRUE,u .gender =性别)是非常不敏感的,因为它总是评估为真。如果您有一个名为 gender 的变量,则不会在此表达式中使用它。 gender 将根据MySQL范围规则进行解释,并且是表中的一列。您应该始终使用类似 var _ p _ 之类的前缀来区分参数与表中的列。

As a note, the expression IF(gender is null, TRUE, u.gender = gender) is rather non-sensical because it always evaluates to true. If you have a variable called gender, it will not be used in this expression. The gender will be interpreted according to MySQL scoping rules and be a column in a table. You should always use a prefix like var_ or p_ or something to distinguish parameters from columns in the table.

编辑:

我应该提到索引需要包含 id 作为使用的第一列

I should have mentioned that the index needs to include id as the first column to use with exists.

这篇关于无法优化MySQL查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆