Python:加速地理比较 [英] Python: speeding up geographic comparison

查看:15
本文介绍了Python:加速地理比较的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我编写了一些包含嵌套循环的代码,其中内循环执行了大约 150 万次.我正在尝试优化此循环中的一个函数.我已经做了一些工作,并取得了一些成果,但我需要一些输入来检查我所做的是否明智.

I've written some code that includes a nested loop where the inner loop is executed about 1.5 million times. I have a function in this loop that I'm trying to optimize. I've done some work, and got some results, but I need a little input to check if what I'm doing is sensible.

一些背景:

我有两个地理点集合(纬度、经度),一个相对较小的集合和一个相对较大的集合.对于小集合中的每个点,我需要找到大集合中最近的点.

I have two collections of geographic points (latitude, longitude), one relatively small collection and one relatively huge collection. For every point in the small collection, I need to find the closest point in the large collection.

最明显的方法是使用半正弦公式.这样做的好处是距离绝对准确.

The obvious way to do this would be to use the haversine formula. The benefit here is that the distances are definitely accurate.

from math import radians, sin, cos, asin, sqrt

def haversine(point1, point2):
    """Gives the distance between two points on earth.
    """
    earth_radius_miles = 3956
    lat1, lon1 = (radians(coord) for coord in point1)
    lat2, lon2 = (radians(coord) for coord in point2)
    dlat, dlon = (lat2 - lat1, lon2 - lon1)
    a = sin(dlat/2.0)**2 + cos(lat1) * cos(lat2) * sin(dlon/2.0)**2
    great_circle_distance = 2 * asin(min(1,sqrt(a)))
    d = earth_radius_miles * great_circle_distance
    return d

但是,在我的机器上运行这 150 万次大约需要 9 秒(根据 timeit).由于准确的距离并不重要,而我只需要找到最近的点,我决定尝试一些其他功能.

However, running this 1.5 million times takes about 9 seconds on my machine (according to timeit). Since having an accurate distance is unimportant, rather I only need to find the closest point, I decided to try some other functions.

勾股定理的一个简单实现使我的速度提高了大约 30%.想到自己可以做得更好,我写了以下内容:

A simple implementation of the pythagorean theorem gives me a speedup of about 30%. Thinking that I can do better, I wrote the following:

def dumb(point1, point2):
    lat1, lon1 = point1
    lat2, lon2 = point2
    d = abs((lat2 - lat1) + (lon2 - lon1))

这给了我 10 倍的改进.但是,现在我担心这不会保留三角不等式.

which gives me a factor of 10 improvement. However, now I'm worried that this will not preserve the triangle inequality.

所以,我的最后一个问题有两个方面:我想要一个运行速度与 dumb 一样快但仍然正确的函数.dumb 会起作用吗?如果没有,有关如何改进半正弦函数的任何建议?

So, my final question is two fold: I'd like to have a function that runs as fast as dumb but still be correct. Will dumb work? If not, any suggestions on how to improve my haversine function?

推荐答案

您可以考虑某种图形散列,即快速找到最近的点,然后对其进行计算.例如,您可以创建一个统一的网格,并将(大集合的)点分布在网格创建的 bin 中.

You can consider some kind of graphical hashing, i.e. find closest points fast and then calculate on them. For example, you can create a uniform grid, and distribute the points (of the large collection) to be in the bins created by the grid.

现在,有了小集合中的一个点,您需要处理的点数量要少得多(即仅在相关 bin 中的点)

Now, having a point from the small collection, you'll need to process much smaller amount of points (i.e. those in relevant bins only)

这篇关于Python:加速地理比较的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆