搜索5个最接近邮政编码的位置-我应该怎么走? [英] Searching the 5 closest locations to a zip code - what way should I go?

查看:82
本文介绍了搜索5个最接近邮政编码的位置-我应该怎么走?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想要的东西:

  1. 用户输入邮政编码或城市名
  2. 我在数据库中搜索最近的5个位置
  3. 显示该位置附近距离用户最近的5个位置

我到目前为止所拥有的:

让我们说一个具有以下内容的地方表:

(约 16000 行)

CREATE TABLE `locations` (
 `locationID` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(150) NOT NULL,
 `firstname` varchar(100) DEFAULT NULL,
 `lastname` varchar(100) DEFAULT NULL,
 `street` varchar(100) NOT NULL,
 `city` varchar(100) NOT NULL,
 `state` varchar(100) NOT NULL,
 `zipcode` varchar(10) NOT NULL,
 `phone` varchar(20) NOT NULL,
 `web` varchar(255) DEFAULT NULL,
 `machine` enum('Unbekannt','Foo','Bar') DEFAULT 'Unbekannt',
 `surface` enum('Unbekannt','Foo','Bar','') DEFAULT 'Unbekannt',
 PRIMARY KEY (`locationID`)
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8

  1. ID
  2. 名称
  3. 邮政编码
  4. 城市

现在,我在世界上所有城镇中都拥有第二张桌子:

(约 340万行)

CREATE TABLE `geoData` (
 `geoID` int(11) NOT NULL AUTO_INCREMENT,
 `countryCode` char(2) NOT NULL,
 `zipCode` varchar(20) NOT NULL,
 `name` varchar(180) NOT NULL,
 `state` varchar(100) NOT NULL,
 `stateCode` varchar(20) NOT NULL,
 `county` varchar(100) NOT NULL,
 `countyCode` varchar(20) NOT NULL,
 `community` varchar(100) NOT NULL,
 `communityCode` varchar(20) NOT NULL,
 `lat` mediumint(6) NOT NULL,
 `lon` mediumint(6) NOT NULL,
 PRIMARY KEY (`lon`,`lat`,`geoID`) USING BTREE,
 KEY `geoID` (`geoID`)
) ENGINE=InnoDB AUTO_INCREMENT=16482 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (lat)
(PARTITION p0 VALUES LESS THAN (-880000) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (-860000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (-840000) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (-820000) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (-800000) ENGINE = InnoDB,
PARTITION p5 VALUES LESS THAN (-780000) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (-760000) ENGINE = InnoDB,
PARTITION p7 VALUES LESS THAN (-740000) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (-720000) ENGINE = InnoDB,
PARTITION p9 VALUES LESS THAN (-700000) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (-680000) ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN (-660000) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN (-640000) ENGINE = InnoDB,
PARTITION p13 VALUES LESS THAN (-620000) ENGINE = InnoDB,
PARTITION p14 VALUES LESS THAN (-600000) ENGINE = InnoDB,
PARTITION p15 VALUES LESS THAN (-580000) ENGINE = InnoDB,
PARTITION p16 VALUES LESS THAN (-560000) ENGINE = InnoDB,
PARTITION p17 VALUES LESS THAN (-540000) ENGINE = InnoDB,
PARTITION p18 VALUES LESS THAN (-520000) ENGINE = InnoDB,
PARTITION p19 VALUES LESS THAN (-500000) ENGINE = InnoDB,
PARTITION p20 VALUES LESS THAN (-480000) ENGINE = InnoDB,
PARTITION p21 VALUES LESS THAN (-460000) ENGINE = InnoDB,
PARTITION p22 VALUES LESS THAN (-440000) ENGINE = InnoDB,
PARTITION p23 VALUES LESS THAN (-420000) ENGINE = InnoDB,
PARTITION p24 VALUES LESS THAN (-400000) ENGINE = InnoDB,
PARTITION p25 VALUES LESS THAN (-380000) ENGINE = InnoDB,
PARTITION p26 VALUES LESS THAN (-360000) ENGINE = InnoDB,
PARTITION p27 VALUES LESS THAN (-340000) ENGINE = InnoDB,
PARTITION p28 VALUES LESS THAN (-320000) ENGINE = InnoDB,
PARTITION p29 VALUES LESS THAN (-300000) ENGINE = InnoDB,
PARTITION p30 VALUES LESS THAN (-280000) ENGINE = InnoDB,
PARTITION p31 VALUES LESS THAN (-260000) ENGINE = InnoDB,
PARTITION p32 VALUES LESS THAN (-240000) ENGINE = InnoDB,
PARTITION p33 VALUES LESS THAN (-220000) ENGINE = InnoDB,
PARTITION p34 VALUES LESS THAN (-200000) ENGINE = InnoDB,
PARTITION p35 VALUES LESS THAN (-180000) ENGINE = InnoDB,
PARTITION p36 VALUES LESS THAN (-160000) ENGINE = InnoDB,
PARTITION p37 VALUES LESS THAN (-140000) ENGINE = InnoDB,
PARTITION p38 VALUES LESS THAN (-120000) ENGINE = InnoDB,
PARTITION p39 VALUES LESS THAN (-100000) ENGINE = InnoDB,
PARTITION p40 VALUES LESS THAN (-80000) ENGINE = InnoDB,
PARTITION p41 VALUES LESS THAN (-60000) ENGINE = InnoDB,
PARTITION p42 VALUES LESS THAN (-40000) ENGINE = InnoDB,
PARTITION p43 VALUES LESS THAN (-20000) ENGINE = InnoDB,
PARTITION p44 VALUES LESS THAN (0) ENGINE = InnoDB,
PARTITION p45 VALUES LESS THAN (20000) ENGINE = InnoDB,
PARTITION p46 VALUES LESS THAN (40000) ENGINE = InnoDB,
PARTITION p47 VALUES LESS THAN (60000) ENGINE = InnoDB,
PARTITION p48 VALUES LESS THAN (80000) ENGINE = InnoDB,
PARTITION p49 VALUES LESS THAN (100000) ENGINE = InnoDB,
PARTITION p50 VALUES LESS THAN (120000) ENGINE = InnoDB,
PARTITION p51 VALUES LESS THAN (140000) ENGINE = InnoDB,
PARTITION p52 VALUES LESS THAN (160000) ENGINE = InnoDB,
PARTITION p53 VALUES LESS THAN (180000) ENGINE = InnoDB,
PARTITION p54 VALUES LESS THAN (200000) ENGINE = InnoDB,
PARTITION p55 VALUES LESS THAN (220000) ENGINE = InnoDB,
PARTITION p56 VALUES LESS THAN (240000) ENGINE = InnoDB,
PARTITION p57 VALUES LESS THAN (260000) ENGINE = InnoDB,
PARTITION p58 VALUES LESS THAN (280000) ENGINE = InnoDB,
PARTITION p59 VALUES LESS THAN (300000) ENGINE = InnoDB,
PARTITION p60 VALUES LESS THAN (320000) ENGINE = InnoDB,
PARTITION p61 VALUES LESS THAN (340000) ENGINE = InnoDB,
PARTITION p62 VALUES LESS THAN (360000) ENGINE = InnoDB,
PARTITION p63 VALUES LESS THAN (380000) ENGINE = InnoDB,
PARTITION p64 VALUES LESS THAN (400000) ENGINE = InnoDB,
PARTITION p65 VALUES LESS THAN (420000) ENGINE = InnoDB,
PARTITION p66 VALUES LESS THAN (440000) ENGINE = InnoDB,
PARTITION p67 VALUES LESS THAN (460000) ENGINE = InnoDB,
PARTITION p68 VALUES LESS THAN (480000) ENGINE = InnoDB,
PARTITION p69 VALUES LESS THAN (500000) ENGINE = InnoDB,
PARTITION p70 VALUES LESS THAN (520000) ENGINE = InnoDB,
PARTITION p71 VALUES LESS THAN (540000) ENGINE = InnoDB,
PARTITION p72 VALUES LESS THAN (560000) ENGINE = InnoDB,
PARTITION p73 VALUES LESS THAN (580000) ENGINE = InnoDB,
PARTITION p74 VALUES LESS THAN (600000) ENGINE = InnoDB,
PARTITION p75 VALUES LESS THAN (620000) ENGINE = InnoDB,
PARTITION p76 VALUES LESS THAN (640000) ENGINE = InnoDB,
PARTITION p77 VALUES LESS THAN (660000) ENGINE = InnoDB,
PARTITION p78 VALUES LESS THAN (680000) ENGINE = InnoDB,
PARTITION p79 VALUES LESS THAN (700000) ENGINE = InnoDB,
PARTITION p80 VALUES LESS THAN (720000) ENGINE = InnoDB,
PARTITION p81 VALUES LESS THAN (740000) ENGINE = InnoDB,
PARTITION p82 VALUES LESS THAN (760000) ENGINE = InnoDB,
PARTITION p83 VALUES LESS THAN (780000) ENGINE = InnoDB,
PARTITION p84 VALUES LESS THAN (800000) ENGINE = InnoDB,
PARTITION p85 VALUES LESS THAN (820000) ENGINE = InnoDB,
PARTITION p86 VALUES LESS THAN (840000) ENGINE = InnoDB,
PARTITION p87 VALUES LESS THAN (860000) ENGINE = InnoDB,
PARTITION p88 VALUES LESS THAN (880000) ENGINE = InnoDB,
PARTITION p89 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

  1. ID
  2. 城市
  3. 邮政编码
  4. 纬度
  5. 经度

基于此文章以及关于此事的其他一些读物有一个存储过程,可为我提供 n 个点(纬度/经度)附近最近的城镇的位置/邮政编码.

我的存储过程:

    BEGIN
    DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000;

    SET @my_lat := _my_lat,
        @my_lon := _my_lon,
        @deg2dist := 0.0111325,  
        @start_deg := _start_dist / @deg2dist,  
        @max_deg := _max_dist / @deg2dist,
        @cutoff := @max_deg / SQRT(2),  
        @dlat := @start_deg,  
        @lon2lat := COS(_deg2rad * @my_lat),
        @iterations := 0;        

    SET @sql = CONCAT(
        "SELECT COUNT(*) INTO @near_ct
            FROM geoData
            WHERE lat    BETWEEN @my_lat - @dlat
                             AND @my_lat + @dlat   
              AND lon    BETWEEN @my_lon - @dlon
                             AND @my_lon + @dlon");
    PREPARE _sql FROM @sql;
    MainLoop: LOOP
        SET @iterations := @iterations + 1;
        SET @dlon := ABS(@dlat / @lon2lat);  
        SET @dlon := IF(ABS(@my_lat) + @dlat >= 900000, 3600001, @dlon);  
        EXECUTE _sql;
        IF ( @near_ct >= _limit OR         
             @dlat >= @cutoff ) THEN       
            LEAVE MainLoop;
        END IF;
        SET @dlat := LEAST(2 * @dlat, @cutoff);   
    END LOOP MainLoop;
    DEALLOCATE PREPARE _sql;

    SET @dlat := IF( @dlat >= @max_deg OR @dlon >= 1800000,
                @max_deg,
                GCDist(ABS(@my_lat), @my_lon,
                       ABS(@my_lat) - @dlat, @my_lon - @dlon) );
    SET @dlon := IFNULL(ASIN(SIN(_deg2rad * @dlat) /
                             COS(_deg2rad * @my_lat))
                            / _deg2rad 
                        , 3600001);    


    IF (ABS(@my_lon) + @dlon < 1800000 OR    
        ABS(@my_lat) + @dlat <  900000) THEN 
        SET @sql = CONCAT(
            "SELECT *,
                    @deg2dist * GCDist(@my_lat, @my_lon, lat, lon) AS dist
                FROM geoData
                WHERE lat BETWEEN @my_lat - @dlat
                              AND @my_lat + @dlat   
                  AND lon BETWEEN @my_lon - @dlon
                              AND @my_lon + @dlon   
                HAVING dist <= ", _max_dist, "
                ORDER BY dist
                LIMIT ", _limit
                        );
    ELSE
        SET @west_lon := IF(@my_lon < 0, @my_lon, @my_lon - 3600000);
        SET @east_lon := @west_lon + 3600000;
        SET @sql = CONCAT(
            "( SELECT *,
                    @deg2dist * GCDist(@my_lat, @west_lon, lat, lon) AS dist
                FROM geoData
                WHERE lat BETWEEN @my_lat - @dlat
                              AND @my_lat + @dlat 
                  AND lon BETWEEN @west_lon - @dlon
                              AND @west_lon + @dlon   
                HAVING dist <= ", _max_dist, " )
            UNION ALL
            ( SELECT *,
                    @deg2dist * GCDist(@my_lat, @east_lon, lat, lon) AS dist
                FROM geoData
                WHERE lat BETWEEN @my_lat - @dlat
                              AND @my_lat + @dlat   
                  AND lon BETWEEN @east_lon - @dlon
                              AND @east_lon + @dlon   
                HAVING dist <= ", _max_dist, " )
            ORDER BY dist
            LIMIT ", _limit
                        );
    END IF;

    PREPARE _sql FROM @sql;
    EXECUTE _sql;
    DEALLOCATE PREPARE _sql;
END

我的问题:

我想输入邮政编码或城镇名称,然后从那里开始搜索.所以我的想法是我要求提供这些信息,并从世界上的所有城镇/邮政编码中查找我的表格.之后,如果仅找到一个结果,或者如果有多个结果,我会要求用户选择正确的选择,那么我就会得到经/纬度信息.

在那之后,我开始搜索靠近我当前位置的最近城镇.假设我要列出50个城镇的清单.然后,我将向上查找并查看包含位置的表是否与其中的5个结果相匹配.

转念一想,这听起来像是个坏主意...

方法1:

我阅读了存储过程,sql和Monster查询,并尝试获取以下内容:

传入一个邮政编码/城市名,我会查询一下,从巨大的表中获取经纬度(可能是mysql中的函数),并给出给定的值,我会寻找最近的城镇并立即加入并在位置表中找到我最近的5个位置.

问题:

  • 如何避免同一城市/邮政编码的多个名称匹配?
  • 为了获得5个最接近的位置,是否可以通过简单的联接来做到这一点?

方法2:

获取我的位置的所有经/纬度值,然后在此表上运行该过程.只需使用巨大的表格即可检索我的当前位置?

这样,我将需要收集我位置的所有经/纬度信息.但这可能是最好的方法.

但是拥有所有城市/邮政编码的庞大数据库只是为了获取位置,这似乎有点过头了.我希望也许还有其他选择...

方法3

说实话,我想要的此功能似乎写了一百万遍.那么,为什么我要重新发明轮子呢?但是我不知道如何找到合适的文章或书籍来实现我的目标.

你们中的任何人是否有针对此类最佳实践的想法?

解决方案

第一个评论...

我已经在这里和其他论坛上看到了数十个(而不是数百万个)实施方案;你的比大多数人都好.

根据一个数据源(我刚好下载了),世界上大约有320万个城市.

为了提高性能,您需要避免检查所有3M行.随着边界框的增加,您已经有了一个良好的开端.请注意,您应该拥有

INDEX(lat, lon),
INDEX(lon, lat)

优化程序将在这些查询之间进行选择,并且第一个查询(使用COUNT(*))将其视为覆盖".它会是遍布全球的条纹或楔形;对3M行有明显的改进.最差的纬度(+34度)中有96K个城市. (1度= 69英里/111公里.)十分之一的度数是34.4,其中有1万个城市是最差的.

(是的,我喜欢这种数据难题.)

而且,我看到您处理了日期线和极点.我认为作为特殊情况,您无法改进它们.

(我只看了公式和常量.)

Geohash和Z顺序索引帮助.但是它们的麻烦之处在于,您需要检查目标周围的多达4个区域-就像没有意识到整数199999和200000确实彼此非常接近,尽管它们的第一个数字是不同的.

用户输入邮政编码或城市名"-这是对两个简单表之一的点查询. (除了可以有小玩意儿,圣荷西"和圣安东尼奥"各有320个以上.该列表中的第一个非西班牙名字是维多利亚",只有144个城市.)

第二,我的实现... (与您的实现有一些相似之处).

http://mysql.rjweb.org/doc.php/latlng

通过使用PARTITIONing将边界框减小到大约正方形而不是条纹或楔形,可以提高性能.如果您正在寻找最接近的5,我的算法很少会碰到几十行,并且这些行将被聚集"在少量的块中,从而使磁盘命中的数量非常低.

设计中的关键是将所有必要的列都放在一个表中.找到最接近的5点后,您可以转到其他桌子上以获取辅助物品(电话号码等).

对于邮政编码,请先将其转换为经/纬度,然后再开始搜索最近的5个邮政编码.

算法内部的联接很可能会破坏性能.

What I want:

  1. The user passes in a zip code or city name
  2. I search my database for the 5 closest locations
  3. Display the 5 closest locations near that position to the user

What I have so far:

Let's say a table of places with the following content:

(about 16000 rows)

CREATE TABLE `locations` (
 `locationID` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(150) NOT NULL,
 `firstname` varchar(100) DEFAULT NULL,
 `lastname` varchar(100) DEFAULT NULL,
 `street` varchar(100) NOT NULL,
 `city` varchar(100) NOT NULL,
 `state` varchar(100) NOT NULL,
 `zipcode` varchar(10) NOT NULL,
 `phone` varchar(20) NOT NULL,
 `web` varchar(255) DEFAULT NULL,
 `machine` enum('Unbekannt','Foo','Bar') DEFAULT 'Unbekannt',
 `surface` enum('Unbekannt','Foo','Bar','') DEFAULT 'Unbekannt',
 PRIMARY KEY (`locationID`)
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8

  1. ID
  2. name
  3. zip code
  4. city

Now I've got a second table with all towns of the world:

(about 3.4 million rows)

CREATE TABLE `geoData` (
 `geoID` int(11) NOT NULL AUTO_INCREMENT,
 `countryCode` char(2) NOT NULL,
 `zipCode` varchar(20) NOT NULL,
 `name` varchar(180) NOT NULL,
 `state` varchar(100) NOT NULL,
 `stateCode` varchar(20) NOT NULL,
 `county` varchar(100) NOT NULL,
 `countyCode` varchar(20) NOT NULL,
 `community` varchar(100) NOT NULL,
 `communityCode` varchar(20) NOT NULL,
 `lat` mediumint(6) NOT NULL,
 `lon` mediumint(6) NOT NULL,
 PRIMARY KEY (`lon`,`lat`,`geoID`) USING BTREE,
 KEY `geoID` (`geoID`)
) ENGINE=InnoDB AUTO_INCREMENT=16482 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (lat)
(PARTITION p0 VALUES LESS THAN (-880000) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (-860000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (-840000) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (-820000) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (-800000) ENGINE = InnoDB,
PARTITION p5 VALUES LESS THAN (-780000) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (-760000) ENGINE = InnoDB,
PARTITION p7 VALUES LESS THAN (-740000) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (-720000) ENGINE = InnoDB,
PARTITION p9 VALUES LESS THAN (-700000) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (-680000) ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN (-660000) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN (-640000) ENGINE = InnoDB,
PARTITION p13 VALUES LESS THAN (-620000) ENGINE = InnoDB,
PARTITION p14 VALUES LESS THAN (-600000) ENGINE = InnoDB,
PARTITION p15 VALUES LESS THAN (-580000) ENGINE = InnoDB,
PARTITION p16 VALUES LESS THAN (-560000) ENGINE = InnoDB,
PARTITION p17 VALUES LESS THAN (-540000) ENGINE = InnoDB,
PARTITION p18 VALUES LESS THAN (-520000) ENGINE = InnoDB,
PARTITION p19 VALUES LESS THAN (-500000) ENGINE = InnoDB,
PARTITION p20 VALUES LESS THAN (-480000) ENGINE = InnoDB,
PARTITION p21 VALUES LESS THAN (-460000) ENGINE = InnoDB,
PARTITION p22 VALUES LESS THAN (-440000) ENGINE = InnoDB,
PARTITION p23 VALUES LESS THAN (-420000) ENGINE = InnoDB,
PARTITION p24 VALUES LESS THAN (-400000) ENGINE = InnoDB,
PARTITION p25 VALUES LESS THAN (-380000) ENGINE = InnoDB,
PARTITION p26 VALUES LESS THAN (-360000) ENGINE = InnoDB,
PARTITION p27 VALUES LESS THAN (-340000) ENGINE = InnoDB,
PARTITION p28 VALUES LESS THAN (-320000) ENGINE = InnoDB,
PARTITION p29 VALUES LESS THAN (-300000) ENGINE = InnoDB,
PARTITION p30 VALUES LESS THAN (-280000) ENGINE = InnoDB,
PARTITION p31 VALUES LESS THAN (-260000) ENGINE = InnoDB,
PARTITION p32 VALUES LESS THAN (-240000) ENGINE = InnoDB,
PARTITION p33 VALUES LESS THAN (-220000) ENGINE = InnoDB,
PARTITION p34 VALUES LESS THAN (-200000) ENGINE = InnoDB,
PARTITION p35 VALUES LESS THAN (-180000) ENGINE = InnoDB,
PARTITION p36 VALUES LESS THAN (-160000) ENGINE = InnoDB,
PARTITION p37 VALUES LESS THAN (-140000) ENGINE = InnoDB,
PARTITION p38 VALUES LESS THAN (-120000) ENGINE = InnoDB,
PARTITION p39 VALUES LESS THAN (-100000) ENGINE = InnoDB,
PARTITION p40 VALUES LESS THAN (-80000) ENGINE = InnoDB,
PARTITION p41 VALUES LESS THAN (-60000) ENGINE = InnoDB,
PARTITION p42 VALUES LESS THAN (-40000) ENGINE = InnoDB,
PARTITION p43 VALUES LESS THAN (-20000) ENGINE = InnoDB,
PARTITION p44 VALUES LESS THAN (0) ENGINE = InnoDB,
PARTITION p45 VALUES LESS THAN (20000) ENGINE = InnoDB,
PARTITION p46 VALUES LESS THAN (40000) ENGINE = InnoDB,
PARTITION p47 VALUES LESS THAN (60000) ENGINE = InnoDB,
PARTITION p48 VALUES LESS THAN (80000) ENGINE = InnoDB,
PARTITION p49 VALUES LESS THAN (100000) ENGINE = InnoDB,
PARTITION p50 VALUES LESS THAN (120000) ENGINE = InnoDB,
PARTITION p51 VALUES LESS THAN (140000) ENGINE = InnoDB,
PARTITION p52 VALUES LESS THAN (160000) ENGINE = InnoDB,
PARTITION p53 VALUES LESS THAN (180000) ENGINE = InnoDB,
PARTITION p54 VALUES LESS THAN (200000) ENGINE = InnoDB,
PARTITION p55 VALUES LESS THAN (220000) ENGINE = InnoDB,
PARTITION p56 VALUES LESS THAN (240000) ENGINE = InnoDB,
PARTITION p57 VALUES LESS THAN (260000) ENGINE = InnoDB,
PARTITION p58 VALUES LESS THAN (280000) ENGINE = InnoDB,
PARTITION p59 VALUES LESS THAN (300000) ENGINE = InnoDB,
PARTITION p60 VALUES LESS THAN (320000) ENGINE = InnoDB,
PARTITION p61 VALUES LESS THAN (340000) ENGINE = InnoDB,
PARTITION p62 VALUES LESS THAN (360000) ENGINE = InnoDB,
PARTITION p63 VALUES LESS THAN (380000) ENGINE = InnoDB,
PARTITION p64 VALUES LESS THAN (400000) ENGINE = InnoDB,
PARTITION p65 VALUES LESS THAN (420000) ENGINE = InnoDB,
PARTITION p66 VALUES LESS THAN (440000) ENGINE = InnoDB,
PARTITION p67 VALUES LESS THAN (460000) ENGINE = InnoDB,
PARTITION p68 VALUES LESS THAN (480000) ENGINE = InnoDB,
PARTITION p69 VALUES LESS THAN (500000) ENGINE = InnoDB,
PARTITION p70 VALUES LESS THAN (520000) ENGINE = InnoDB,
PARTITION p71 VALUES LESS THAN (540000) ENGINE = InnoDB,
PARTITION p72 VALUES LESS THAN (560000) ENGINE = InnoDB,
PARTITION p73 VALUES LESS THAN (580000) ENGINE = InnoDB,
PARTITION p74 VALUES LESS THAN (600000) ENGINE = InnoDB,
PARTITION p75 VALUES LESS THAN (620000) ENGINE = InnoDB,
PARTITION p76 VALUES LESS THAN (640000) ENGINE = InnoDB,
PARTITION p77 VALUES LESS THAN (660000) ENGINE = InnoDB,
PARTITION p78 VALUES LESS THAN (680000) ENGINE = InnoDB,
PARTITION p79 VALUES LESS THAN (700000) ENGINE = InnoDB,
PARTITION p80 VALUES LESS THAN (720000) ENGINE = InnoDB,
PARTITION p81 VALUES LESS THAN (740000) ENGINE = InnoDB,
PARTITION p82 VALUES LESS THAN (760000) ENGINE = InnoDB,
PARTITION p83 VALUES LESS THAN (780000) ENGINE = InnoDB,
PARTITION p84 VALUES LESS THAN (800000) ENGINE = InnoDB,
PARTITION p85 VALUES LESS THAN (820000) ENGINE = InnoDB,
PARTITION p86 VALUES LESS THAN (840000) ENGINE = InnoDB,
PARTITION p87 VALUES LESS THAN (860000) ENGINE = InnoDB,
PARTITION p88 VALUES LESS THAN (880000) ENGINE = InnoDB,
PARTITION p89 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

  1. ID
  2. city
  3. zip code
  4. latitude
  5. longitude

based on this article and some other reading up on that matter I have a stored procedure which is giving me n locations/zip codes of the closest towns near a point (latitude/longitude).

My stored procedure:

    BEGIN
    DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000;

    SET @my_lat := _my_lat,
        @my_lon := _my_lon,
        @deg2dist := 0.0111325,  
        @start_deg := _start_dist / @deg2dist,  
        @max_deg := _max_dist / @deg2dist,
        @cutoff := @max_deg / SQRT(2),  
        @dlat := @start_deg,  
        @lon2lat := COS(_deg2rad * @my_lat),
        @iterations := 0;        

    SET @sql = CONCAT(
        "SELECT COUNT(*) INTO @near_ct
            FROM geoData
            WHERE lat    BETWEEN @my_lat - @dlat
                             AND @my_lat + @dlat   
              AND lon    BETWEEN @my_lon - @dlon
                             AND @my_lon + @dlon");
    PREPARE _sql FROM @sql;
    MainLoop: LOOP
        SET @iterations := @iterations + 1;
        SET @dlon := ABS(@dlat / @lon2lat);  
        SET @dlon := IF(ABS(@my_lat) + @dlat >= 900000, 3600001, @dlon);  
        EXECUTE _sql;
        IF ( @near_ct >= _limit OR         
             @dlat >= @cutoff ) THEN       
            LEAVE MainLoop;
        END IF;
        SET @dlat := LEAST(2 * @dlat, @cutoff);   
    END LOOP MainLoop;
    DEALLOCATE PREPARE _sql;

    SET @dlat := IF( @dlat >= @max_deg OR @dlon >= 1800000,
                @max_deg,
                GCDist(ABS(@my_lat), @my_lon,
                       ABS(@my_lat) - @dlat, @my_lon - @dlon) );
    SET @dlon := IFNULL(ASIN(SIN(_deg2rad * @dlat) /
                             COS(_deg2rad * @my_lat))
                            / _deg2rad 
                        , 3600001);    


    IF (ABS(@my_lon) + @dlon < 1800000 OR    
        ABS(@my_lat) + @dlat <  900000) THEN 
        SET @sql = CONCAT(
            "SELECT *,
                    @deg2dist * GCDist(@my_lat, @my_lon, lat, lon) AS dist
                FROM geoData
                WHERE lat BETWEEN @my_lat - @dlat
                              AND @my_lat + @dlat   
                  AND lon BETWEEN @my_lon - @dlon
                              AND @my_lon + @dlon   
                HAVING dist <= ", _max_dist, "
                ORDER BY dist
                LIMIT ", _limit
                        );
    ELSE
        SET @west_lon := IF(@my_lon < 0, @my_lon, @my_lon - 3600000);
        SET @east_lon := @west_lon + 3600000;
        SET @sql = CONCAT(
            "( SELECT *,
                    @deg2dist * GCDist(@my_lat, @west_lon, lat, lon) AS dist
                FROM geoData
                WHERE lat BETWEEN @my_lat - @dlat
                              AND @my_lat + @dlat 
                  AND lon BETWEEN @west_lon - @dlon
                              AND @west_lon + @dlon   
                HAVING dist <= ", _max_dist, " )
            UNION ALL
            ( SELECT *,
                    @deg2dist * GCDist(@my_lat, @east_lon, lat, lon) AS dist
                FROM geoData
                WHERE lat BETWEEN @my_lat - @dlat
                              AND @my_lat + @dlat   
                  AND lon BETWEEN @east_lon - @dlon
                              AND @east_lon + @dlon   
                HAVING dist <= ", _max_dist, " )
            ORDER BY dist
            LIMIT ", _limit
                        );
    END IF;

    PREPARE _sql FROM @sql;
    EXECUTE _sql;
    DEALLOCATE PREPARE _sql;
END

My Problem:

I would like to pass in a zip code or name of a town and start my search from there. So my thought was I request this information and look up my table of all towns/zip codes from the world. After that, I have the information of lat/lon if only one result was found or I would ask the user to select the right choice in that case of having multiple results.

After that, I start searching for nearest towns close to my current position. Let's say I want a list of 50 towns/cities. And with that, I'd go and look up and see if the table containing the locations matches 5 results in there.

On second thought, this sounds like a bad idea...

Approach 1:

I read up on stored procedures, sql and monster queries and try to get the following:

Passing in a zip code/city name I would look that up, take my lat/lon from the huge table (possible as the function in mysql) and with that given I'd look for the nearest towns and join right then and there the locations table and get my 5 closest locations.

Questions:

  • How would I avoid having several matches for the same name of a city/zip code?
  • Does it sound possible to do so with a simple join in order to get the 5 closest locations?

Approach 2:

Get all the lat/lon values of my locations and then run the procedure on this table instead. And just use the huge table in order to retrieve my current position?

With that, I would need to gather all the lat/lon of my locations though. But it might be the best way.

But having the huge database of all cities/zip codes just to get the locations seems like a bit of an overkill. I would hope there is an alternative then maybe... somehow...

Approach 3

To be honest, this function I want seems like written a million times before. So why should I bother reinventing the wheel? But I have no clue how to find the right articles or books in order to accomplish my goal.

Has anyone else of you an idea for the best practice for something like that?

解决方案

First some comments...

I've seen dozens (not millions) of implementation here and on other forums; yours is better than most.

According to one data source (which I happen to have downloaded) there are about 3.2 million cities in the world.

For performance, you need to avoid checking all 3M rows. You have made a good start with the growing bounding box. Note that you should have

INDEX(lat, lon),
INDEX(lon, lat)

The Optimizer will choose between those and the first query (with the COUNT(*)) will see that as 'covering'. It will be a stripe around the globe or a wedge; a definite improvement over 3M rows. The worst latitude (+34 degrees) has 96K cities in it. (1 degree = 69 miles / 111 km.) For a tenth of a degree, 34.4 is the worst, with 10K cities.

(Yes, I enjoy this kind of data puzzle.)

And, I see that you handle the dateline and poles. I don't think you can improve on having them as a special case.

(I have only glanced at the formulas and constants.)

Geohash and Z-order indexing help. But they have a hiccup in that you need to check up to 4 areas around the target -- It's like not realizing that the integers 199999 and 200000 are really close to each other, in spite of the first digit of each is different.

"User passes in zip code or city name" -- that's a point query into one of two simple tables. (Except that there can be dups -- over 320 each of "san jose" and "san antonio". Pretty far down the list is the first non-Spanish name: "victoria", with only 144 cities.)

Second, my implementation... (It has some similarities to yours.)

http://mysql.rjweb.org/doc.php/latlng

This improves on performance by using PARTITIONing to keeping the bounding box down to roughly a square, instead of a stripe or wedge. If you are looking for the 5 nearest, my algorithm will rarely touch more than a few dozen rows, and those rows will be 'clustered' in a small number of blocks, thereby keeping the number of disk hits very low.

A critical thing in my design is to have all the necessary columns in the one table. Once you have found the nearest 5, you can go off to other tables to get ancillary things (phone number, etc).

As for zip codes, turn them into lat/lon before starting the search for the 5 nearest.

A join inside the algorithm is very likely to destroy the performance.

这篇关于搜索5个最接近邮政编码的位置-我应该怎么走?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆