计算邮政编码...和用户之间的距离. [英] Calculate distance between Zip Codes... AND users.

查看:29
本文介绍了计算邮政编码...和用户之间的距离.的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这比我迫切需要的问题更具有挑战性,所以伙计们不要花一整天的时间.

我在 2000 年左右建立了一个约会网站(早已不复存在),其中一个挑战是计算用户之间的距离,以便我们可以在 X 英里半径内展示您的匹配".仅说明问题,给定以下数据库架构(大致):

用户表用户身份用户名邮政编码

邮政编码表邮政编码纬度经度

在 USER.ZipCode = ZIPCODE.ZipCode 上加入 USER 和 ZIPCODE.

您会采取什么方法来回答以下问题:在给定用户邮政编码 X 英里范围内的邮政编码中还有哪些其他用户.

我们使用了 将用作例子.PostGIS 包括执行 R 树空间索引的能力,使您可以进行高效的空间查询.

导入数据并构建空间索引后,查询距离的方式如下:

SELECT zip从邮政编码在哪里几何&&展开(转换(PointFromText('POINT(-116.768347 33.911404)',4269),32661),16093)和距离(变换(PointFromText('点(-116.768347 33.911404)',4269),32661),几何)<16093

我会让你自己完成教程的其余部分.

这里有一些其他参考资料可以帮助您入门.

This is more of a challenge question than something I urgently need, so don't spend all day on it guys.

I built a dating site (long gone) back in 2000 or so, and one of the challenges was calculating the distance between users so we could present your "matches" within an X mile radius. To just state the problem, given the following database schema (roughly):

USER TABLE UserId UserName ZipCode

ZIPCODE TABLE ZipCode Latitude Longitude

With USER and ZIPCODE being joined on USER.ZipCode = ZIPCODE.ZipCode.

What approach would you take to answer the following question: What other users live in Zip Codes that are within X miles of a given user's Zip Code.

We used the 2000 census data, which has tables for zip codes and their approximate lattitude and longitude.

We also used the Haversine Formula to calculate distances between any two points on a sphere... pretty simple math really.

The question, at least for us, being the 19 year old college students we were, really became how to efficiently calculate and/store distances from all members to all other members. One approach (the one we used) would be to import all the data and calculate the distance FROM every zip code TO every other zip code. Then you'd store and index the results. Something like:

SELECT  User.UserId
FROM    ZipCode AS MyZipCode
        INNER JOIN ZipDistance ON MyZipCode.ZipCode = ZipDistance.MyZipCode
        INNER JOIN ZipCode AS TheirZipCode ON ZipDistance.OtherZipCode = TheirZipCode.ZipCode
        INNER JOIN User AS User ON TheirZipCode.ZipCode = User.ZipCode
WHERE   ( MyZipCode.ZipCode = 75044 )
        AND ( ZipDistance.Distance < 50 )

The problem, of course, is that the ZipDistance table is going to have a LOT of rows in it. It isn't completely unworkable, but it is really big. Also it requires complete pre-work on the whole data set, which is also not unmanageable, but not necessarily desireable.

Anyway, I was wondering what approach some of you gurus might take on something like this. Also, I think this is a common issue programmers have to tackle from time to time, especially if you consider problems that are just algorithmically similar. I'm interested in a thorough solution that includes at least HINTS on all the pieces to do this really quickly end efficiently. Thanks!

解决方案

Ok, for starters, you don't really need to use the Haversine formula here. For large distances where a less accurate formula produces a larger error, your users don't care if the match is plus or minus a few miles, and for closer distances, the error is very small. There are easier (to calculate) formulas listed on the Geographical Distance Wikipedia article.

Since zip codes are nothing like evenly spaced, any process that partitions them evenly is going to suffer mightily in areas where they are clustered tightly (east coast near DC being a good example). If you want a visual comparison, check out http://benfry.com/zipdecode and compare the zipcode prefix 89 with 07.

A far better way to deal with indexing this space is to use a data structure like a Quadtree or an R-tree. This structure allows you to do spatial and distance searches over data which is not evenly spaced.

Here's what an Quadtree looks like:

To search over it, you drill down through each larger cell using the index of smaller cells that are within it. Wikipedia explains it more thoroughly.

Of course, since this is a fairly common thing to do, someone else has already done the hard part for you. Since you haven't specified what database you're using, the PostgreSQL extension PostGIS will serve as an example. PostGIS includes the ability to do R-tree spatial indexes which allow you to do efficient spatial querying.

Once you've imported your data and built the spatial index, querying for distance is a query like:

SELECT zip
FROM zipcode
WHERE
geom && expand(transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661), 16093)
AND
distance(
   transform(PointFromText('POINT(-116.768347 33.911404)', 4269),32661),
   geom) < 16093

I'll let you work through the rest of the tutorial yourself.

Here are some other references to get you started.

这篇关于计算邮政编码...和用户之间的距离.的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆