PHP/MySQL 中的地理搜索(距离)(性能) [英] Geo-Search (Distance) in PHP/MySQL (Performance)

查看:17
本文介绍了PHP/MySQL 中的地理搜索(距离)(性能)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 MySQL 表 (MyISAM),其中包含我选择的大约 200k 个纬度/经度对条目,这些条目基于与另一个纬度/经度对的对距离(大圆公式).(例如,在 50.281852、2.504883 周围 10 公里半径内的所有条目)

I have a MySQL-table (MyISAM) containing about 200k entries of lat/long pairs that I select from, based on the pairs distance (great circle formula) from another lat/long pair. (e.g. all entries that are within a 10km radius around 50.281852, 2.504883)

我的问题是这个查询需要大约 0.28 秒.只为那 20 万个条目运行(每天都在增加).虽然 0,28 秒.正常情况下没问题,此查询运行非常频繁,因为它支持我的网络应用程序的主要功能,而且通常它是更大查询的一部分.

My problem is that this query takes about 0,28 sec. to run just for those 200k entries (which continue to get more every day). While 0,28 sec. would be fine normally, this query runs very often as it powers the main feature of my web-app, and often times it's part of a larger query.

有什么办法可以加快速度吗?显然 MySQL 每次都必须遍历所有 200k 个条目,并对每个条目执行大圆公式.我在 Stack Overflow 上阅读了有关 geohashing、R-Trees 等的一些内容,但我认为这不是我想要的方式.部分是因为我从来不是数学的忠实粉丝,但主要是因为我认为这个问题已经在图书馆/扩展/等中被比我更聪明的人解决了.已经过广泛测试并且正在定期更新.

Is there any way to speed this up? Obviously MySQL has to run through all 200k entries every time and perform the great circle formula for every entry. I read something about geohashing, R-Trees and the like here on Stack Overflow but I don't think that's the way I want to go. Partly because I've never been a big fan of maths, but mostly because I think that this problem has already been solved by someone smarter than me in a library/extension/etc. that has been tested extensively and is being updated regularly.

MySQL 似乎有一个空间扩展,但那个不提供距离函数.我应该查看另一个数据库来放入这个坐标对吗?PostgreSQL 似乎有一个相当成熟的 Spatial 扩展.你知道吗?或者 PostgreSQL 会不会过于简单地使用大圆公式来获取某个区域内的所有条目?

MySQL seems to have a spatial extension but that one doesn't provide a distance function. Should I be looking at another database to put this coordinate-pairs in? PostgreSQL seems to have a fairly mature Spatial extension. Do you know anything about it? Or would PostgreSQL too simply just use the great circle formula to get all entries within a certain region?

是否有专门的独立产品或 mysql 扩展已经可以满足我的需求?

Is there maybe a specialized stand-alone product or mysql-extension that already does what I'm looking for?

或者是否有我可以用来进行计算的 PHP 库?使用 APC,我可以轻松地将经纬度对放入内存(这 20 万个条目大约需要 5MB),然后在 PHP 中运行查询.然而,这种方法的问题是,我会有一个像 SELECT .. FROM .. WHERE id in (id1, id2, ..) 这样的 MySQL 查询,所有结果可能高达几千.MySQL 处理此类查询的能力如何?然后(因为这是一项处理数字的任务)在 PHP 中执行此操作是否足够快?

Or is there maybe A PHP library I could use to do the calculations? Using APC I could easily fit the lat-long pairs into memory (those 200k entries take about 5MB) and then run the query inside of PHP. The problem with this approach however is that then I'd have a MySQL query like SELECT .. FROM .. WHERE id in (id1, id2, ..) for all the results which can be up to a few thousand. How well does MySQL handle Queries like these? And then (since this is a number-crunching task) would doing this in PHP be fast enough?

还有什么我应该/不应该做的想法吗?

Any other Ideas what I should/shouldn't do?

为了完整起见,这里是示例查询,去掉了任何不相关的部分(正如我所说,通常这是我加入多个表的更大查询的一部分):

For completeness, here is the sample query, stripped of any irrelevant parts (as I said, usually this is part of a bigger query where I join multiple tables):

SELECT id,
       6371 * acos( sin( radians( 52.4042924 ) ) * sin( radians( lat ) ) + cos( radians( 50.281852 ) ) * cos( radians( lat ) ) * cos( radians( 2.504883 ) - radians( lon ) ) ) AS dst
FROM geoloc
HAVING dst <10
ORDER BY dst ASC

推荐答案

计算一个边界框以在 SQL 查询的 WHERE 子句中选择行的子集,这样您就可以只执行昂贵的距离计算行的子集,而不是针对表中的整个 200k 记录.该方法在此关于可移动类型的文章(使用 PHP代码示例).然后,您可以在针对该子集的查询中包含 Haversine 计算以计算实际距离,并在该点考虑 HAVING 子句.

Calculate a bounding box to select a subset of the rows in the WHERE clause of your SQL query, so that you're only executing the expensive distance calculation on that subset of rows rather than against the entire 200k records in your table. The method is described in this article on Movable Type (with PHP code examples). Then you can include the Haversine calculation in your query against that subset to calculate the actual distances, and factor in the HAVING clause at that point.

边界框有助于提高性能,因为这意味着您只需对一小部分数据进行昂贵的距离计算.这实际上与 Patrick 建议的方法相同,但 Movable Type 链接对该方法进行了大量解释,以及可用于构建边界框和 SQL 查询的 PHP 代码.

It's the bounding box that helps your performance, because it means you're only doing the expensive distance calculation on a small subset of your data. This is effectively the same method that Patrick has suggested, but the Movable Type link has extensive explanations of the method, as well as PHP code that you can use to build the bounding box and your SQL query.

编辑

如果您认为半正弦不够准确,那么还有 Vincenty 公式.

If you don't think haversine is accurate enough, then there's also the Vincenty formula.

//  Vincenty formula to calculate great circle distance between 2 locations expressed as Lat/Long in KM

function VincentyDistance($lat1,$lat2,$lon1,$lon2){
    $a = 6378137 - 21 * sin($lat1);
    $b = 6356752.3142;
    $f = 1/298.257223563;

    $p1_lat = $lat1/57.29577951;
    $p2_lat = $lat2/57.29577951;
    $p1_lon = $lon1/57.29577951;
    $p2_lon = $lon2/57.29577951;

    $L = $p2_lon - $p1_lon;

    $U1 = atan((1-$f) * tan($p1_lat));
    $U2 = atan((1-$f) * tan($p2_lat));

    $sinU1 = sin($U1);
    $cosU1 = cos($U1);
    $sinU2 = sin($U2);
    $cosU2 = cos($U2);

    $lambda = $L;
    $lambdaP = 2*M_PI;
    $iterLimit = 20;

    while(abs($lambda-$lambdaP) > 1e-12 && $iterLimit>0) {
        $sinLambda = sin($lambda);
        $cosLambda = cos($lambda);
        $sinSigma = sqrt(($cosU2*$sinLambda) * ($cosU2*$sinLambda) + ($cosU1*$sinU2-$sinU1*$cosU2*$cosLambda) * ($cosU1*$sinU2-$sinU1*$cosU2*$cosLambda));

        //if ($sinSigma==0){return 0;}  // co-incident points
        $cosSigma = $sinU1*$sinU2 + $cosU1*$cosU2*$cosLambda;
        $sigma = atan2($sinSigma, $cosSigma);
        $alpha = asin($cosU1 * $cosU2 * $sinLambda / $sinSigma);
        $cosSqAlpha = cos($alpha) * cos($alpha);
        $cos2SigmaM = $cosSigma - 2*$sinU1*$sinU2/$cosSqAlpha;
        $C = $f/16*$cosSqAlpha*(4+$f*(4-3*$cosSqAlpha));
        $lambdaP = $lambda;
        $lambda = $L + (1-$C) * $f * sin($alpha) * ($sigma + $C*$sinSigma*($cos2SigmaM+$C*$cosSigma*(-1+2*$cos2SigmaM*$cos2SigmaM)));
    }

    $uSq = $cosSqAlpha*($a*$a-$b*$b)/($b*$b);
    $A = 1 + $uSq/16384*(4096+$uSq*(-768+$uSq*(320-175*$uSq)));
    $B = $uSq/1024 * (256+$uSq*(-128+$uSq*(74-47*$uSq)));

    $deltaSigma = $B*$sinSigma*($cos2SigmaM+$B/4*($cosSigma*(-1+2*$cos2SigmaM*$cos2SigmaM)- $B/6*$cos2SigmaM*(-3+4*$sinSigma*$sinSigma)*(-3+4*$cos2SigmaM*$cos2SigmaM)));

    $s = $b*$A*($sigma-$deltaSigma);
    return $s/1000;
}


echo VincentyDistance($lat1,$lat2,$lon1,$lon2);

这篇关于PHP/MySQL 中的地理搜索(距离)(性能)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆