SQL Query For Total Points Within Radius of a Location

Question

I have a database table of all zipcodes in the US that includes city, state, latitude and longitude for each zipcode. I also have a database table of points, each of which has a latitude and longitude associated with it. I'd like to be able to use a single MySQL query to get a list of all unique city/state combinations from the zipcodes table, together with the total number of points within a given radius of each city/state. I can get the unique city/state list using the following query:

select city,state,latitude,longitude
from zipcodes 
group by city,state order by state,city;

I can get the number of points within a 100 mile radius of a specific city with latitude '$lat' and longitude '$lon' using the following query:

select count(*) 
from points 
where (3959 * acos(cos(radians($lat)) * cos(radians(latitude)) * cos(radians(longitude) - radians($lon)) + sin(radians($lat)) * sin(radians(latitude)))) < 100;
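For reference, the WHERE clause above is the spherical law of cosines with an Earth radius of 3959 miles. Below is a minimal Python sketch of the same expression. The function name `great_circle_miles` is hypothetical. The clamp to [-1, 1] is an addition that keeps `math.acos` from raising when rounding pushes the cosine slightly past 1 for identical points.

```python
import math

EARTH_RADIUS_MI = 3959  # same constant as in the SQL query

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, mirroring the SQL expression term by term."""
    c = (math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.cos(math.radians(lon2) - math.radians(lon1))
         + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2)))
    # Rounding can push c just outside [-1, 1], where math.acos raises ValueError.
    c = max(-1.0, min(1.0, c))
    return EARTH_RADIUS_MI * math.acos(c)

# One degree of latitude is roughly 69 miles.
print(round(great_circle_miles(40.0, -100.0, 41.0, -100.0), 1))  # prints 69.1
```

Every row pays for six trigonometric calls plus an `acos`, which is why evaluating this for every city/point pair is expensive.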

What I haven't been able to do is figure out how to combine these queries in a way that doesn't kill my database. Here is one of my sad attempts:

select city,state,latitude,longitude,
    (select count(*) from points
     where status="A" AND 
          (3959 * acos(cos(radians(zipcodes.latitude)) * cos(radians(latitude)) * cos(radians(longitude) - radians(zipcodes.longitude)) + sin(radians(zipcodes.latitude)) * sin(radians(latitude)))) < 100) as 'points' 
from zipcodes 
group by city,state order by state,city;

The tables currently have the following indexes:

Zipcodes - `zip` (zip)
Zipcodes - `location` (state,city)
Points - `status_length_location` (status,length,longitude,latitude)

When I run EXPLAIN on the previous query, this is the output:

+----+--------------------+----------+------+------------------------+------------------------+---------+-------+-------+---------------------------------+
| id | select_type        | table    | type | possible_keys          | key                    | key_len | ref   | rows  | Extra                           |
+----+--------------------+----------+------+------------------------+------------------------+---------+-------+-------+---------------------------------+
|  1 | PRIMARY            | zipcodes | ALL  | NULL                   | NULL                   | NULL    | NULL  | 43187 | Using temporary; Using filesort | 
|  2 | DEPENDENT SUBQUERY | points   | ref  | status_length_location | status_length_location | 2       | const | 16473 | Using where; Using index        | 
+----+--------------------+----------+------+------------------------+------------------------+---------+-------+-------+---------------------------------+

I know I could loop through all the zipcodes and calculate the number of matching points within a given radius, but the points table will be growing all the time and I'd rather not have stale point totals in the zipcodes database. I'm hoping a MySQL guru out there can show me the error of my ways. Thanks in advance for your help!

Solution

MySQL guru or not, the problem is that unless you find a way of filtering out most candidate rows beforehand, the distance has to be calculated between every point and every city.
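To put a number on it, the EXPLAIN output in the question estimates 43,187 zipcode rows. It also estimates 16,473 point rows per execution of the dependent subquery, so the distance expression is evaluated roughly their product times. Those row counts are optimizer estimates, not exact counts. Here is the arithmetic as a quick Python check:

```python
# Row estimates taken from the EXPLAIN output in the question.
zipcode_rows = 43_187  # PRIMARY: full scan of zipcodes
point_rows = 16_473    # DEPENDENT SUBQUERY: rows per execution via the status index

# The dependent subquery runs once per zipcode row, and each run
# evaluates the distance expression for every candidate point.
distance_evaluations = zipcode_rows * point_rows
print(f"{distance_evaluations:,}")  # 711,419,451
```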

There are two general approaches that may help the situation:

  • make the distance formula simpler
  • filter out unlikely candidates for the 100-mile radius around a given city

Before going into these two avenues of improvement, you should decide on the level of precision needed for this 100-mile distance. You should also indicate which geographic area the database covers (is it just the continental USA, etc.).

The reason for this is that the Great Circle formula, while more precise numerically, is very computationally expensive. Another avenue of performance improvement would be to store "grid coordinates" of sorts in addition to (or instead of) the lat/long coordinates.

Edit:
A few ideas about a simpler (but less precise) formula:
Since we're dealing with relatively small distances (and I'm guessing latitudes between 30 and 48 degrees North), we can use the Euclidean distance, or better yet the square of the Euclidean distance, rather than the more complicated spherical trigonometry formulas.
Depending on the level of precision expected, it may even be acceptable to use one single parameter for the linear distance of a full degree of longitude, taking a representative value over the area considered (say, circa 46 statute miles). The formula would then become:

  LatDegInMi = 69.0
  LongDegInMi = 46.0
  DistSquared = ((Lat1 - Lat2) * LatDegInMi) ^2 + ((Long1 - Long2) * LongDegInMi) ^2
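Here is the same formula as a small Python function. `^2` above means "squared". Note that in MySQL, as in Python, `^` is bitwise XOR, so a real query has to square with multiplication or `POW()`. Comparing against 100² instead of 100 avoids a square root per row. The constants are the ones suggested above, and `within_radius` is a hypothetical helper name.

```python
LAT_DEG_IN_MI = 69.0   # miles per degree of latitude (nearly constant)
LONG_DEG_IN_MI = 46.0  # miles per degree of longitude, one value for the whole area

def dist_squared(lat1, lon1, lat2, lon2):
    """Squared flat-earth (equirectangular) distance, in square miles."""
    dy = (lat1 - lat2) * LAT_DEG_IN_MI
    dx = (lon1 - lon2) * LONG_DEG_IN_MI
    return dy * dy + dx * dx

def within_radius(lat1, lon1, lat2, lon2, radius_mi=100.0):
    # Compare squared values so no square root is needed per candidate row.
    return dist_squared(lat1, lon1, lat2, lon2) < radius_mi * radius_mi

print(dist_squared(40.0, -100.0, 41.0, -100.0))   # 4761.0, i.e. 69 miles squared
print(within_radius(40.0, -100.0, 41.0, -101.0))  # True (about 83 miles)
```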

On the idea of columns with grid info, used to limit the number of rows considered for the distance calculation:
Each "point" in the system, be it a city or another point (delivery locations, store locations... whatever), is assigned two integer coordinates which define the square of, say, 25 miles * 25 miles in which the point lies. The coordinates of any point within 100 miles of the reference point (a given city) will be at most +/- 4 in the x direction and +/- 4 in the y direction. We can then write a query similar to the following:

-- Assumes both tables have integer GridX / GridY columns (25-mile cells).
-- MySQL's ^ is bitwise XOR, so the squares are written with POW().
SET @LatDegInMi = 69.0, @LongDegInMi = 46.0;

SELECT Z.city, Z.state, Z.latitude, Z.longitude, COUNT(*) AS points
FROM zipcodes Z
JOIN points P
  ON  P.GridX IN (Z.GridX - 4, Z.GridX - 3, Z.GridX - 2, Z.GridX - 1, Z.GridX,
                  Z.GridX + 1, Z.GridX + 2, Z.GridX + 3, Z.GridX + 4)
  AND P.GridY IN (Z.GridY - 4, Z.GridY - 3, Z.GridY - 2, Z.GridY - 1, Z.GridY,
                  Z.GridY + 1, Z.GridY + 2, Z.GridY + 3, Z.GridY + 4)
WHERE P.status = 'A'
  AND POW((Z.latitude - P.latitude) * @LatDegInMi, 2)
    + POW((Z.longitude - P.longitude) * @LongDegInMi, 2) < POW(100, 2)
GROUP BY Z.city, Z.state, Z.latitude, Z.longitude;
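The answer does not say how GridX and GridY are computed. One possible scheme, sketched here in Python as an assumption, is to project each coordinate onto the same flat miles grid the simplified formula uses and take the floor of miles / 25. With floor division, a point less than 100 miles (4 cells) away along an axis lands at most 4 cell indices away, which is where the +/- 4 window comes from. The name `grid_cell` is hypothetical, and the 25-mile `CELL_MI` follows the text.

```python
import math

LAT_DEG_IN_MI = 69.0
LONG_DEG_IN_MI = 46.0
CELL_MI = 25.0  # side of one grid square, in miles

def grid_cell(lat, lon):
    """Hypothetical (GridX, GridY): floor of the flat-projected position in 25-mile units."""
    grid_x = math.floor(lon * LONG_DEG_IN_MI / CELL_MI)
    grid_y = math.floor(lat * LAT_DEG_IN_MI / CELL_MI)
    return grid_x, grid_y

# A city and a point about 99 miles due east of it in the flat projection.
city = (40.0, -100.0)
point = (40.0, -100.0 + 99.0 / LONG_DEG_IN_MI)
(cx, cy), (px, py) = grid_cell(*city), grid_cell(*point)
print(px - cx, py - cy)  # 3 0 (inside the +/- 4 window)
```

Because both tables would need the same cell definition, the values are best computed once when a row is written (for example in the application or a trigger) rather than in the query.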

Note that LongDegInMi could either be hardcoded (the same for all locations within the continental USA) or come from the corresponding record in the zipcodes table. Similarly, LatDegInMi could be hardcoded. There is little need to make it vary since, unlike the longitude factor, it is relatively constant.
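If LongDegInMi is stored per record instead of hardcoded, one common approximation is to scale the latitude degree by the cosine of the record's latitude. This assumes a spherical Earth, and the helper name below is hypothetical. The sketch also shows where a figure like 46 comes from. It corresponds to roughly 48 degrees North, the northern end of the range guessed above, so further south a hardcoded 46 understates east-west distances and lets in some extra points.

```python
import math

LAT_DEG_IN_MI = 69.0

def long_deg_in_mi(lat):
    """Approximate miles per degree of longitude at a given latitude (spherical Earth)."""
    return LAT_DEG_IN_MI * math.cos(math.radians(lat))

for lat in (30, 40, 48):
    print(lat, round(long_deg_in_mi(lat), 1))
```

This gives roughly 59.8, 52.9 and 46.2 miles for those three latitudes.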

The reason this is faster is that, for most records in the Cartesian product between the zipcodes table and the points table, we do not calculate the distance at all. We eliminate them on the basis of an index value (GridX and GridY).
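The effect can be illustrated outside the database. This Python sketch uses synthetic data and hypothetical names. It buckets points by (GridX, GridY) in a dict, which stands in for the index. For each city it only examines the 9 x 9 block of cells around the city before applying the distance test. It returns the same counts as the brute-force loop while testing far fewer pairs.

```python
import math
import random
from collections import defaultdict

LAT_DEG_IN_MI, LONG_DEG_IN_MI = 69.0, 46.0
CELL_MI, RADIUS_MI = 25.0, 100.0

def cell(lat, lon):
    """Hypothetical (GridX, GridY) of a location, in 25-mile cells."""
    return (math.floor(lon * LONG_DEG_IN_MI / CELL_MI),
            math.floor(lat * LAT_DEG_IN_MI / CELL_MI))

def close(a, b):
    """Flat-earth test: is b within RADIUS_MI of a? (a, b are (lat, lon) tuples)"""
    dy = (a[0] - b[0]) * LAT_DEG_IN_MI
    dx = (a[1] - b[1]) * LONG_DEG_IN_MI
    return dy * dy + dx * dx < RADIUS_MI * RADIUS_MI

def brute_force(cities, points):
    """Distance test on every (city, point) pair, like the dependent subquery."""
    counts = [sum(close(c, p) for p in points) for c in cities]
    return counts, len(cities) * len(points)

def grid_join(cities, points):
    """Distance test only for points whose cell is within +/- 4 of the city's cell."""
    buckets = defaultdict(list)  # stands in for an index on (GridX, GridY)
    for p in points:
        buckets[cell(*p)].append(p)
    counts, checked = [], 0
    for c in cities:
        gx, gy = cell(*c)
        n = 0
        for x in range(gx - 4, gx + 5):      # GridX - 4 .. GridX + 4
            for y in range(gy - 4, gy + 5):  # GridY - 4 .. GridY + 4
                for p in buckets.get((x, y), ()):
                    checked += 1
                    n += close(c, p)
        counts.append(n)
    return counts, checked

def random_location(rng):
    # Synthetic data roughly spanning the continental USA.
    return (rng.uniform(30.0, 48.0), rng.uniform(-124.0, -67.0))

rng = random.Random(42)
cities = [random_location(rng) for _ in range(200)]
points = [random_location(rng) for _ in range(5000)]

bf_counts, bf_checked = brute_force(cities, points)
g_counts, g_checked = grid_join(cities, points)
print(bf_counts == g_counts, bf_checked, g_checked)
```

On this data the brute-force loop tests all 1,000,000 pairs, while the grid version tests only the points in nearby cells.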

This brings us to the question of which SQL indexes to create. We will probably want:
- GridX + GridY + Status (on the points table)
- GridY + GridX + Status (possibly)
- City + State + Latitude + Longitude + GridX + GridY (on the zipcodes table)

An alternative to the grids is to "bound" the latitude and longitude values we'll consider, based on the latitude and longitude of a given city. In other words, the JOIN condition becomes a range rather than an IN:

-- Assumes: SET @LatDegInMi = 69.0, @LongDegInMi = 46.0;
JOIN points P
  ON    P.latitude  > (Z.latitude  - (100 / @LatDegInMi))
    AND P.latitude  < (Z.latitude  + (100 / @LatDegInMi))
    AND P.longitude > (Z.longitude - (100 / @LongDegInMi))
    AND P.longitude < (Z.longitude + (100 / @LongDegInMi))
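The bounding-box variant can be sketched the same way in Python (names hypothetical). The box is only a cheap prefilter. Its corners are about 141 miles from the center, so the exact distance test in the WHERE clause is still needed after the range check.

```python
LAT_DEG_IN_MI = 69.0
LONG_DEG_IN_MI = 46.0
RADIUS_MI = 100.0

def bounding_box(lat, lon, radius_mi=RADIUS_MI):
    """(min_lat, max_lat, min_lon, max_lon) of the box enclosing the search circle."""
    dlat = radius_mi / LAT_DEG_IN_MI
    dlon = radius_mi / LONG_DEG_IN_MI
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

def in_box(box, lat, lon):
    min_lat, max_lat, min_lon, max_lon = box
    # Strict inequalities, matching the JOIN condition above.
    return min_lat < lat < max_lat and min_lon < lon < max_lon

def within_radius(lat1, lon1, lat2, lon2, radius_mi=RADIUS_MI):
    dy = (lat1 - lat2) * LAT_DEG_IN_MI
    dx = (lon1 - lon2) * LONG_DEG_IN_MI
    return dy * dy + dx * dx < radius_mi * radius_mi

city = (40.0, -100.0)
box = bounding_box(*city)
# A point near the box corner: it passes the range filter but is about 140 miles away.
corner = (40.0 + 99 / LAT_DEG_IN_MI, -100.0 + 99 / LONG_DEG_IN_MI)
print(in_box(box, *corner), within_radius(*city, *corner))  # True False
```

Unlike the grid columns, this needs no extra stored data. However, an index can serve only the first range column efficiently (e.g. latitude), with the other range applied as a filter.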
