优化mysql查询以使用空间索引选择多边形中的所有点 [英] Optimizing mysql query to select all points with in polygon using spatial indexes

查看:302
本文介绍了优化mysql查询以使用空间索引选择多边形中的所有点的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

首先,我承认我在空间功能方面的经验非常少.我在MySQL中有一张表格,里面有20个字段和23549187个包含地理数据的记录.字段之一是点",它是点数据类型,上面有空间索引.我有一个查询,该查询会选择多边形内所有看起来像这样的点,

Firstly, I admit that my experience with spatial functions is very minimal. I have a table in MySQL with 20 fields and 23549187 records that contain geographical data. One of the fields is 'point' which is of point data type and has spatial index on it. I have a query that selects all points within a polygon which looks like this,

select * from `table_name` where ST_CONTAINS(ST_GEOMFROMTEXT('POLYGON((151.186 -23.497,151.207 -23.505,151.178 -23.496,151.174 -23.49800000000001,151.176 -23.496,151.179 -23.49500000000002,151.186 -23.497))'), `point`)

由于多边形较小,因此效果很好.但是,如果多边形变大,执行时间就会变得很慢,并且最慢的查询直到现在运行了15分钟.添加索引确实有助于将索引降低到15分钟,否则将需要将近一个小时的时间.我有什么可以做的进一步改进. 该查询将由作为守护程序运行的PHP脚本运行,我担心这种缓慢的查询是否会使MySQL服务器宕机.

This works well as the polygon is small. However, if the polygon gets massive, the execution times gets really slow and the slowest query until now ran for 15 mins. Adding the index had really helped to bring it down to 15 mins which otherwise would have taken close to an hour. Is there anything I can do here for further improvement. This query will be run by a PHP script that runs as a daemon and I am worried if this slow queries will bring the MySQL server down.

欢迎提出所有改善建议.谢谢.

All suggestions to make it better are welcome. Thanks.

show create table;

CREATE TABLE `table_name` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `lat` float(12,6) DEFAULT NULL,
  `long` float(12,6) DEFAULT NULL,
  `point` point NOT NULL,
  PRIMARY KEY (`id`),
  KEY `lat` (`lat`,`long`),
  SPATIAL KEY `sp_index` (`point`)
) ENGINE=MyISAM AUTO_INCREMENT=47222773 DEFAULT CHARSET=utf8mb4

这里我不应该透露更多字段,但是过滤器胜出了

There are few more fields that I am not supposed to disclose it here however the filter won

解释慢速查询的sql输出:

Explain sql output for the slow query:

+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | table_name | ALL  | NULL          | NULL | NULL    | NULL | 23549187 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+

解释用于较小多边形查询的sql输出,

Explain sql output for query with smaller polygons,

+----+-------------+------------+-------+---------------+----------+---------+------+------+-------------+
| id | select_type | table      | type  | possible_keys | key      | key_len | ref  | rows | Extra       |
+----+-------------+------------+-------+---------------+----------+---------+------+------+-------------+
|  1 | SIMPLE      | table_name | range | sp_index      | sp_index | 34      | NULL |    1 | Using where |
+----+-------------+------------+-------+---------------+----------+---------+------+------+-------------+

看起来最大的多边形不使用索引.

Looks like the biggest polygon does not use the index.

推荐答案

MySQL使用 R树用于索引空间数据.像 B树索引,这对于以总数的一小部分为目标的查询是最佳的.随着边界多边形变大,可能匹配的数目也增加,并且在某些时候,优化器认为切换到全表扫描更为有效.这似乎是这里的场景,我看到三个选项:

MySQL uses R-Trees for indexing spatial data. Like B-Tree indexes, these are optimal for queries targeting a small subset of the total number. As your bounding polygon gets larger the number of possible matches increases and, at some point, the optimizer decides it is more efficient to switch to a full table scan. That appears to be the scenario here, and I see three options:

首先,尝试将LIMIT添加到查询中.通常,如果优化器认为在全表扫描中发生较少的I/O寻道,则MySQL会忽略该索引.但是,至少使用B树索引,MySQL会短路该逻辑,并在存在LIMIT时始终执行B树潜水.我假设R-Tree发生了类似的短路.

First, try adding a LIMIT to your query. Normally, MySQL ignores the index if the optimizer concludes fewer I/O seeks would occur in a full table scan. But, with B-Tree indexes at least, MySQL will short-circuit that logic and always perform the B-Tree dive when LIMIT is present. I hypothesize R-Tree have a similar short-circuiting.

第二个,和第一个类似,请尝试强制MySQL使用索引.这指示MySQL表扫描比优化程序决定的昂贵.请理解,优化器仅具有启发式功能,并不真正知道昂贵"的东西超出了其内部统计数据的结论.我们人类有直觉,有时-有时-了解得更多.

Second, and similar in spirit to the first, try forcing MySQL to use the index. This instructs MySQL that the table scan is more expensive than the optimizer decides. Understand that the optimizer only has heuristics and doesn't really know how "expensive" things are beyond what its internal statistics conclude. We humans have intuition, which sometimes - sometimes - knows better.

select * force index (`sp_index`) from `table_name` where ST_CONTAINS(ST_GEOMFROMTEXT('POLYGON((151.186 -23.497,151.207 -23.505,151.178 -23.496,151.174 -23.49800000000001,151.176 -23.496,151.179 -23.49500000000002,151.186 -23.497))'), `point`)

最后,如果这些方法不起作用,那么您需要做的是将边界多边形分解为较小的多边形.例如,如果您的边界多边形是每边500 km的正方形,则将其分成每边250 km的4个正方形,或每边125 km的16个正方形,依此类推.然后将所有这些一起UNION.该索引将用于每个索引,并且累积结果可能会更快. (请注意,将它们一起UNION很重要:MySQL不能在空间查询中应用多个范围扫描.)

Finally, if those don't work, then what you need to do is break up your bounding polygon into smaller polygons. For example, if your bounding polygon is a square 500km per side, break it up into 4 squares 250km on each side, or 16 squares 125km per side, etc. Then UNION all of these together. The index will be used on each one, and the cumulative result may be faster. (Note it's important to UNION them together: MySQL cannot apply multiple range scans on a spatial query.)

这篇关于优化mysql查询以使用空间索引选择多边形中的所有点的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆