计算邮政编码与用户之间的距离。 [英] Calculate distance between Zip Codes... AND users.

查看:157
本文介绍了计算邮政编码与用户之间的距离。的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这不是我紧急需要的问题,更不是一个挑战性的问题,所以不要一整天都花在那些家伙上。



我在2000年左右建立了一个约会网站(早已消失),其中一项挑战是计算用户之间的距离,以便我们可以在X英里半径内显示您的匹配项。为了大致说明问题,大致给出以下数据库模式:



USER TABLE
UserId
UserName
ZipCode



邮政编码表
邮政编码
纬度
经度



在已加入USER和ZIPCODE的情况下



您将采用什么方法来回答以下问题:在给定X英里内的邮政编码中还存在哪些其他用户?用户的邮政编码。



我们使用了将用作例。 PostGIS包括执行R树空间索引的功能,使您可以进行有效的空间查询。



一旦导入数据并构建了空间索引,就可以查询距离是一个类似的查询:

 选择邮政编码
从邮政编码
哪里
geom& & expand(transform(PointFromText('POINT(-116.768347 33.911404)',4269),32661),16093)

距离(
transform(PointFromText('POINT(-116.768347 33.911404)' ,4269),32661),
geom)< 16093

我将让您自己完成本教程的其余部分。





还有一些其他参考可以帮助您入门。




This is more of a challenge question than something I urgently need, so don't spend all day on it guys.

I built a dating site (long gone) back in 2000 or so, and one of the challenges was calculating the distance between users so we could present your "matches" within an X mile radius. To just state the problem, given the following database schema (roughly):

USER TABLE UserId UserName ZipCode

ZIPCODE TABLE ZipCode Latitude Longitude

With USER and ZIPCODE being joined on USER.ZipCode = ZIPCODE.ZipCode.

What approach would you take to answer the following question: What other users live in Zip Codes that are within X miles of a given user's Zip Code.

We used the 2000 census data, which has tables for zip codes and their approximate lattitude and longitude.

We also used the Haversine Formula to calculate distances between any two points on a sphere... pretty simple math really.

The question, at least for us, being the 19 year old college students we were, really became how to efficiently calculate and/store distances from all members to all other members. One approach (the one we used) would be to import all the data and calculate the distance FROM every zip code TO every other zip code. Then you'd store and index the results. Something like:

SELECT  User.UserId
FROM    ZipCode AS MyZipCode
        INNER JOIN ZipDistance ON MyZipCode.ZipCode = ZipDistance.MyZipCode
        INNER JOIN ZipCode AS TheirZipCode ON ZipDistance.OtherZipCode = TheirZipCode.ZipCode
        INNER JOIN User AS User ON TheirZipCode.ZipCode = User.ZipCode
WHERE   ( MyZipCode.ZipCode = 75044 )
        AND ( ZipDistance.Distance < 50 )

The problem, of course, is that the ZipDistance table is going to have a LOT of rows in it. It isn't completely unworkable, but it is really big. Also it requires complete pre-work on the whole data set, which is also not unmanageable, but not necessarily desireable.

Anyway, I was wondering what approach some of you gurus might take on something like this. Also, I think this is a common issue programmers have to tackle from time to time, especially if you consider problems that are just algorithmically similar. I'm interested in a thorough solution that includes at least HINTS on all the pieces to do this really quickly end efficiently. Thanks!

解决方案

Ok, for starters, you don't really need to use the Haversine formula here. For large distances where a less accurate formula produces a larger error, your users don't care if the match is plus or minus a few miles, and for closer distances, the error is very small. There are easier (to calculate) formulas listed on the Geographical Distance Wikipedia article.

Since zip codes are nothing like evenly spaced, any process that partitions them evenly is going to suffer mightily in areas where they are clustered tightly (east coast near DC being a good example). If you want a visual comparison, check out http://benfry.com/zipdecode and compare the zipcode prefix 89 with 07.

A far better way to deal with indexing this space is to use a data structure like a Quadtree or an R-tree. This structure allows you to do spatial and distance searches over data which is not evenly spaced.

Here's what an Quadtree looks like:

To search over it, you drill down through each larger cell using the index of smaller cells that are within it. Wikipedia explains it more thoroughly.

Of course, since this is a fairly common thing to do, someone else has already done the hard part for you. Since you haven't specified what database you're using, the PostgreSQL extension PostGIS will serve as an example. PostGIS includes the ability to do R-tree spatial indexes which allow you to do efficient spatial querying.

Once you've imported your data and built the spatial index, querying for distance is a query like:

SELECT zip
FROM zipcode
WHERE
geom && expand(transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661), 16093)
AND
distance(
   transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661),
   geom) < 16093

I'll let you work through the rest of the tutorial yourself.

Here are some other references to get you started.

这篇关于计算邮政编码与用户之间的距离。的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆