Which approach is faster for getting all POIs from MySQL/MariaDB with PHP/Laravel


Question

Correct me if I'm wrong.

There are three approaches to finding the nearest homes that users have created on my website:

  1. Create a table with two columns (latitude, longitude), both of type FLOAT, and query it like this:

$latitude = 50;
$longitude = 60;

SELECT * FROM my_table
    WHERE (latitude  <= $latitude+10  AND latitude  >= $latitude-10)
      AND (longitude <= $longitude+10 AND longitude >= $longitude-10)

Here, for example, 10 means 1 km.

In this approach we can also use the haversine formula.

  2. Merge those columns (latitude, longitude) into a single column named point, of type POINT, and again search the rows one by one.

  3. Group multiple points (the coordinates of the homes users have created) into a category for one section of a country, e.g. a city. When a query comes in with $latitude and $longitude to find the nearest homes, I check which category the coordinate falls into, so that I search only the section it belongs to rather than all rows.

My guess is that approach 1 is slow because the conditions are evaluated for every row of the table, and slower still if I use the haversine formula.

If I use ST_Distance it again seems slow, because it also involves a lot of calculation.

But approach 3 seems faster, since it checks only the section the user's point falls in instead of all rows. I know how to set a point for each home, but I don't know how to group multiple home positions into a section, perhaps in another table.

BTW, in new versions of MySQL and MariaDB, spatial indexes are supported in InnoDB.

My questions:

  1. Is approach 1 really slow, or do the ST_* functions work the same way, checking all rows one by one with the formulas mentioned above? Which one is faster?

  2. Does approach 2 do something other than evaluate simple conditions to make it faster? I mean, does it change anything to use the POINT type instead of FLOAT, and ST_* functions instead of doing the math myself? I want to know whether the algorithm is different.

  3. If approach 3 is the fastest of the three, how can I categorize points so that I don't have to search all rows in a table?

  4. How can I use spatial indexes to make it as fast as possible?

  5. If other approaches exist that I didn't mention, could you please tell me how I can get the nearest homes from coordinates alone in MySQL/MariaDB with PHP/Laravel?

Thanks, all.

Recommended answer

Which formula you use for the distance doesn't matter much. What matters far more is the number of rows that have to be read, processed and sorted. In the best case you can use an index for a condition in the WHERE clause to limit the number of processed rows. You can try to categorize your locations, but whether that works well depends on the nature of your data. You would also need to work out which "category" to search. A more general solution is to use a SPATIAL INDEX and the ST_Within() function.
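The categorization idea can be illustrated outside the database. The following is only a sketch, not something taken from the tests in this answer: it assumes a fixed grid of 0.1-degree cells (the `CELL_DEG` value and all function names are hypothetical) and collects candidates from the query's cell plus its eight neighbours. If those cells are empty, the true nearest home lies further out and would be missed, which is one reason the spatial index is the more general solution.

```python
import math
from collections import defaultdict

CELL_DEG = 0.1  # hypothetical cell size in degrees; tune for your data


def cell_of(lat, lon, cell=CELL_DEG):
    """Return the integer grid cell (row, col) that contains a coordinate."""
    return (math.floor(lat / cell), math.floor(lon / cell))


def build_grid(homes, cell=CELL_DEG):
    """Group (id, lat, lon) tuples by the grid cell they fall into."""
    grid = defaultdict(list)
    for home in homes:
        _, lat, lon = home
        grid[cell_of(lat, lon, cell)].append(home)
    return grid


def candidates(grid, lat, lon, cell=CELL_DEG):
    """Collect homes from the query's cell and its 8 neighbouring cells."""
    row, col = cell_of(lat, lon, cell)
    found = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            found.extend(grid.get((row + d_row, col + d_col), []))
    return found


# Made-up sample data: two homes near London, one in Paris.
homes = [(1, 51.51, -0.01), (2, 51.58, 0.12), (3, 48.85, 2.35)]
grid = build_grid(homes)
near_london = candidates(grid, 51.5, 0.0)
```

In a database the cell would typically be stored as an indexed column, so that only the rows of nine cells are read before the exact distance is computed.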

Now let's run some tests.

In my DB (MySQL 5.7.18) I have the following table:

CREATE TABLE `cities` (
    `cityId` MEDIUMINT(9) UNSIGNED NOT NULL AUTO_INCREMENT,
    `country` CHAR(2) NOT NULL COLLATE 'utf8mb4_unicode_ci',
    `city` VARCHAR(100) NOT NULL COLLATE 'utf8mb4_unicode_ci',
    `accentCity` VARCHAR(100) NOT NULL COLLATE 'utf8mb4_unicode_ci',
    `region` CHAR(2) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
    `population` INT(10) UNSIGNED NULL DEFAULT NULL,
    `latitude` DECIMAL(10,7) NOT NULL,
    `longitude` DECIMAL(10,7) NOT NULL,
    `geoPoint` POINT NOT NULL,
    PRIMARY KEY (`cityId`),
    SPATIAL INDEX `geoPoint` (`geoPoint`)
) COLLATE='utf8mb4_unicode_ci' ENGINE=InnoDB

The data comes from the Free World Cities Database and contains 3,173,958 (3.1M) rows.

Note that geoPoint is redundant and equal to POINT(longitude, latitude).

Consider a user located somewhere in London

set @lon = 0.0;
set @lat = 51.5;

and you want to find the nearest location from the cities table.

A "trivial" query would be

select c.cityId, c.accentCity, st_distance_sphere(c.geoPoint, point(@lon, @lat)) as dist
from cities c
order by dist
limit 1

The result is

988204 Blackwall 1085.8212159861014

Execution time: ~ 4.970 sec

If you use the less complex function ST_Distance(), you get the same result with an execution time of ~ 4.580 sec, which is not much of a difference.
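To make the difference between the two functions concrete, here is a rough Python sketch of what each one computes, as I understand it: ST_Distance_Sphere returns a great-circle distance in metres (the haversine formula below is one common way to compute it, and 6,370,986 m is the default radius given in the MySQL documentation), while ST_Distance on plain points in MySQL 5.7 returns a planar distance in coordinate units, i.e. degrees. The function names are my own.

```python
import math

EARTH_RADIUS_M = 6_370_986  # default sphere radius documented for ST_Distance_Sphere


def distance_sphere(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance in metres, computed with the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def distance_planar(lon1, lat1, lon2, lat2):
    """Euclidean distance in coordinate units (degrees for lon/lat input)."""
    return math.hypot(lon2 - lon1, lat2 - lat1)
```

Both are cheap per row. Note that the planar variant treats a degree of longitude as equal to a degree of latitude, so the two can rank nearby candidates differently away from the equator, even though they agreed in this test.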

Note that you don't need to store a geo point in the table. You can just as well use point(c.longitude, c.latitude) instead of c.geoPoint. To my surprise it is even faster (~3.6 sec for ST_Distance and ~4.0 sec for ST_Distance_Sphere). It might be faster still if I didn't have a geoPoint column at all. But that doesn't matter much, since you don't want the user to wait that long for a response if you can do better.

Now let's look at how we can use the SPATIAL INDEX with ST_Within().

You need to define a polygon that will contain the nearest location. A simple way is to use ST_Buffer(), which generates a polygon with 32 points that is nearly a circle*.

set @point = point(@lon, @lat);
set @radius = 0.1;
set @polygon = ST_Buffer(@point, @radius);

select c.cityId, c.accentCity, st_distance_sphere(c.geoPoint, point(@lon, @lat)) as dist
from cities c
where st_within(c.geoPoint, @polygon)
order by dist
limit 1

The result is the same. The execution time is ~ 0.000 sec (that's what my client, HeidiSQL, reports).

* Note that @radius is given in degrees, so on the ground the polygon is more like an ellipse than a circle. In my tests I always got the same result as with the simple, slow solution. I would, however, investigate more edge cases before using it in production code.
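To get a feel for how stretched that "circle" is, the degrees can be converted to metres. This is a back-of-the-envelope sketch assuming a spherical Earth with the same radius that ST_Distance_Sphere uses by default; the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_370_986  # assumed sphere radius, same default as ST_Distance_Sphere


def degree_extent_m(radius_deg, lat):
    """Return the (north-south, east-west) extent in metres of radius_deg at latitude lat."""
    per_degree = math.radians(1) * EARTH_RADIUS_M  # metres per degree of latitude
    north_south = radius_deg * per_degree
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = radius_deg * per_degree * math.cos(math.radians(lat))
    return north_south, east_west


ns, ew = degree_extent_m(0.1, 51.5)
print(f"north-south: {ns:.0f} m, east-west: {ew:.0f} m")
```

At London's latitude a 0.1 degree buffer therefore reaches roughly 11 km to the north and south but only about 7 km to the east and west, so a location just outside the polygon to the east can be nearer than one inside it to the north.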

Now you need to find the optimal radius for your application and data. If it's too small, you might get no results or miss the nearest point. If it's too big, you might have to process too many rows.

Here are some numbers for the given test case:

  • @radius = 0.001: no result
  • @radius = 0.01: exactly one location (kind of lucky) - execution time ~ 0.000 sec
  • @radius = 0.1: 55 locations - execution time ~ 0.000 sec
  • @radius = 1.0: 2183 locations - execution time ~ 0.030 sec
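One way to avoid hard-coding a single radius is to start small and widen the search until it yields a trustworthy hit. The sketch below is an assumption of mine, not something measured above: it runs over an in-memory list of (lon, lat) tuples standing in for the ST_Within() query, uses a square window and planar distance in degrees, and accepts a hit only when it lies within the current radius, because only then can no point outside the window be closer.

```python
import math


def nearest_with_growing_radius(points, lon, lat, radii=(0.001, 0.01, 0.1, 1.0)):
    """Try each radius in turn; return (point, radius) or (None, None)."""

    def dist(p):
        # Planar distance in degrees between p and the query coordinate.
        return math.hypot(p[0] - lon, p[1] - lat)

    for radius in radii:
        # Stand-in for the indexed ST_Within() filter: a square window.
        hits = [p for p in points
                if abs(p[0] - lon) <= radius and abs(p[1] - lat) <= radius]
        if not hits:
            continue
        best = min(hits, key=dist)
        # A hit in a corner of the square may be farther away than a point
        # just outside one of its edges, so accept only hits within radius.
        if dist(best) <= radius:
            return best, radius
    return None, None


# Made-up sample points as (lon, lat).
points = [(0.05, 51.5), (0.5, 51.9)]
best, used_radius = nearest_with_growing_radius(points, 0.0, 51.5)
```

With the real table, each step would be one indexed query, so the few failed attempts at small radii cost very little compared with a full scan.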
