Python:加快地理比较 [英] Python: speeding up geographic comparison

查看:88
本文介绍了Python:加快地理比较的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我写了一些代码,其中包括一个嵌套循环,其中内部循环执行了约150万次.我在此循环中有一个函数正在尝试优化.我已经做了一些工作,并得到了一些结果,但是我需要一点输入来检查我在做的事情是否明智.

I've written some code that includes a nested loop where the inner loop is executed about 1.5 million times. I have a function in this loop that I'm trying to optimize. I've done some work, and got some results, but I need a little input to check if what I'm doing is sensible.

一些背景:

我有两个地理点(纬度,经度)集合,一个相对较小的集合,一个相对较大的集合.对于小型集合中的每个点,我需要找到大型集合中的最近点.

I have two collections of geographic points (latitude, longitude), one relatively small collection and one relatively huge collection. For every point in the small collection, I need to find the closest point in the large collection.

执行此操作的明显方法是使用Haversine公式.这样做的好处是距离绝对准确.

The obvious way to do this would be to use the haversine formula. The benefit here is that the distances are definitely accurate.

from math import radians, sin, cos, asin, sqrt

def haversine(point1, point2):
    """Gives the distance between two points on earth.
    """
    earth_radius_miles = 3956
    lat1, lon1 = (radians(coord) for coord in point1)
    lat2, lon2 = (radians(coord) for coord in point2)
    dlat, dlon = (lat2 - lat1, lon2 - lon1)
    a = sin(dlat/2.0)**2 + cos(lat1) * cos(lat2) * sin(dlon/2.0)**2
    great_circle_distance = 2 * asin(min(1,sqrt(a)))
    d = earth_radius_miles * great_circle_distance
    return d

但是,在我的机器上运行150万次大约需要9秒钟(根据timeit).由于准确的距离并不重要,而我只需要找到最接近的点,我决定尝试其他一些功能.

However, running this 1.5 million times takes about 9 seconds on my machine (according to timeit). Since having an accurate distance is unimportant, rather I only need to find the closest point, I decided to try some other functions.

毕达哥拉斯定理的简单实现使我的速度提高了约30%.考虑到我可以做得更好,我写了以下内容:

A simple implementation of the pythagorean theorem gives me a speedup of about 30%. Thinking that I can do better, I wrote the following:

def dumb(point1, point2):
    lat1, lon1 = point1
    lat2, lon2 = point2
    d = abs((lat2 - lat1) + (lon2 - lon1))

这使我提高了10倍.但是,现在我担心这将无法保留三角形不等式.

which gives me a factor of 10 improvement. However, now I'm worried that this will not preserve the triangle inequality.

所以,我的最后一个问题是两个方面:我想拥有一个运行速度与dumb一样快但仍然正确的函数. dumb可以工作吗?如果没有,关于如何改善我的haversine功能有什么建议吗?

So, my final question is two fold: I'd like to have a function that runs as fast as dumb but still be correct. Will dumb work? If not, any suggestions on how to improve my haversine function?

推荐答案

您可以考虑使用某种图形哈希,即快速找到最接近的点,然后对其进行计算.例如,您可以创建一个统一的网格,然后将(大集合的)点分布在该网格创建的容器中.

You can consider some kind of graphical hashing, i.e. find closest points fast and then calculate on them. For example, you can create a uniform grid, and distribute the points (of the large collection) to be in the bins created by the grid.

现在,从小型集合中获得一个点,您将需要处理数量少得多的点(即,仅在相关箱中的点)

Now, having a point from the small collection, you'll need to process much smaller amount of points (i.e. those in relevant bins only)

这篇关于Python:加快地理比较的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆