Python - how to speed up calculation of distances between cities

Question

I have 55249 cities in my database. Every single one has latitude and longitude values. For every city I want to calculate the distances to every other city and store those that are no further than 30 km. Here is my algorithm:

# distance function
from math import sin, cos, sqrt, atan2, radians

def distance(obj1, obj2):
    lat1 = radians(obj1.latitude)
    lon1 = radians(obj1.longitude)
    lat2 = radians(obj2.latitude)
    lon2 = radians(obj2.longitude)
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (sin(dlat/2))**2 + cos(lat1) * cos(lat2) * (sin(dlon/2))**2
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    return round(6373.0 * c, 2)

def distances():
    cities = City.objects.all()  # I am using Django ORM
    for city in cities:
        closest = list()
        for tested_city in cities:
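            # use `d`, not `distance`: assigning to `distance` would make it a
            # local name, shadow the function and raise UnboundLocalError
            d = distance(city, tested_city)
            if d <= 30. and d != 0.: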
                closest.append(tested_city)
        city.closest_cities.add(*closest)  # again, Django thing
        city.save()  # Django

This works but takes an awful lot of time. It's going to take weeks to complete. Is there any way I could speed it up?

Recommended answer

You can't afford to compute the distance between every pair of cities. Instead, you need to put your cities in a space-partitioning data structure for which you can make fast nearest-neighbour queries. SciPy comes with a kd-tree implementation, scipy.spatial.KDTree, that is suitable for this application.
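
To put numbers on it: the work grows quadratically with the number of cities. A quick count for the 55,249 cities in the question (plain arithmetic, no timing assumptions):

```python
# Number of cities in the question's database
n = 55249

# The nested loop in the question calls distance() for every ordered
# pair, including each city against itself.
ordered_pairs = n * n

# Even visiting each unordered pair only once leaves about 1.5 billion.
unordered_pairs = n * (n - 1) // 2

print(f"{ordered_pairs:,} calls in the nested loop")  # 3,052,452,001
print(f"{unordered_pairs:,} distinct pairs")          # 1,526,198,376
```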

There are two difficulties here. First, scipy.spatial.KDTree uses Euclidean distance between points, but you want to use the great circle distance along the surface of the Earth. Second, longitude wraps around, so that nearest neighbours might have longitudes that differ by nearly 360°. Both problems can be solved if you take the following approach:
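
The wrap-around problem is easy to see with two points just either side of the 0°/360° meridian. This sketch treats the Earth as a sphere with an assumed mean radius of 6371 km (a simplification of the WGS84 ellipsoid) and compares the naive Euclidean distance in (latitude, longitude) space with the straight-line distance between the same points in Cartesian coordinates:

```python
from math import radians, sin, cos, sqrt

R = 6371.0  # assumed mean Earth radius in km (spherical simplification)

def to_xyz(lat, lon):
    """Spherical lat/lon in degrees -> Cartesian (x, y, z) in km."""
    lat, lon = radians(lat), radians(lon)
    return (R * cos(lat) * cos(lon), R * cos(lat) * sin(lon), R * sin(lat))

p = (51.0, 359.9)  # just west of the prime meridian
q = (51.0, 0.1)    # just east of it

# Treating (lat, lon) as plane coordinates: the points look ~360 apart
naive_deg = sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# Straight-line (chord) distance in 3-D: the points are actually close
chord_km = sqrt(sum((a - b) ** 2 for a, b in zip(to_xyz(*p), to_xyz(*q))))

print(round(naive_deg, 1), round(chord_km, 1))  # 359.8 14.0
```

In 3-D coordinates there is no seam at 360°, which is why converting to Cartesian coordinates before building the tree removes the wrap-around problem.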

  1. Convert your locations from geodetic coordinates (latitude, longitude) to ECEF (Earth-Centered, Earth-Fixed) coordinates (x, y, z).

  2. Put these ECEF coordinates into a scipy.spatial.KDTree.

  3. Convert your great circle distance (for example, 30 km) into a Euclidean distance.

  4. Call scipy.spatial.KDTree.query_ball_point to get the cities within range.
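
Step 3 rests on the relation between an arc length $s$ on a sphere of radius $R$ and the straight-line chord $c$ joining its endpoints (a spherical approximation; the Earth is really an ellipsoid):

$$
c = 2R\sin\left(\frac{s}{2R}\right)
$$

With an assumed mean radius $R = 6371$ km and $s = 30$ km, $s/2R \approx 0.002354$, so $c \approx 29.99997$ km: at city scale the chord and the arc are practically equal. The `euclidean_distance` function in the answer's code puts $A$ in front and $B$ inside the sine, giving $2A\sin(s/2B) \approx s \cdot A/B \approx 30.10$ km for $s = 30$ km, a slightly larger radius that errs on the side of including cities.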

Here's some example code to illustrate this approach. The function geodetic2ecef comes from PySatel by David Parunakian and is licensed under the GPL.

from math import radians, cos, sin, sqrt

# Constants defined by the World Geodetic System 1984 (WGS84)
A = 6378.137
B = 6356.7523142
ESQ = 6.69437999014 * 0.001

def geodetic2ecef(lat, lon, alt=0):
    """Convert geodetic coordinates to ECEF."""
    lat, lon = radians(lat), radians(lon)
    xi = sqrt(1 - ESQ * sin(lat) ** 2)  # A / xi is the prime vertical radius N
    x = (A / xi + alt) * cos(lat) * cos(lon)
    y = (A / xi + alt) * cos(lat) * sin(lon)
    z = (A / xi * (1 - ESQ) + alt) * sin(lat)
    return x, y, z

def euclidean_distance(distance):
    """Return the approximate Euclidean distance corresponding to the
    given great circle distance (in km).

    """
    return 2 * A * sin(distance / (2 * B))
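
Converting the cities one at a time is fine, but if the coordinates already sit in NumPy arrays the same WGS84 formulas can be applied to whole arrays at once. A sketch, assuming NumPy; `geodetic2ecef_array` is a hypothetical name, not part of PySatel. It uses the radius of curvature $N = A/\sqrt{1 - e^2\sin^2\varphi}$ with the squared sine:

```python
import numpy as np

# WGS84 constants (km), as in the scalar version
A = 6378.137
ESQ = 6.69437999014e-3

def geodetic2ecef_array(lat_deg, lon_deg, alt=0.0):
    """Vectorised geodetic -> ECEF conversion; returns an (n, 3) array in km."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    n = A / np.sqrt(1.0 - ESQ * np.sin(lat) ** 2)  # prime vertical radius
    x = (n + alt) * np.cos(lat) * np.cos(lon)
    y = (n + alt) * np.cos(lat) * np.sin(lon)
    z = (n * (1.0 - ESQ) + alt) * np.sin(lat)
    return np.column_stack((x, y, z))

# Equator/prime meridian and the North Pole
pts = geodetic2ecef_array([0.0, 90.0], [0.0, 0.0])
print(pts)
```

At the equator the point lands at $x = A$, and at the pole $z$ comes out as the semi-minor axis $B \approx 6356.752$ km, which is a quick sanity check on the constants.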

Let's make up fifty thousand random city locations and convert them to ECEF coordinates:

>>> from random import uniform
>>> cities = [(uniform(-90, 90), uniform(0, 360)) for _ in range(50000)]
>>> ecef_cities = [geodetic2ecef(lat, lon) for lat, lon in cities]

Put them into a scipy.spatial.KDTree:

>>> import numpy
>>> from scipy.spatial import KDTree
>>> tree = KDTree(numpy.array(ecef_cities))

Find all cities within about 100 km of London:

>>> london = geodetic2ecef(51, 0)
>>> tree.query_ball_point([london], r=euclidean_distance(100))
array([[37810, 15755, 16276]], dtype=object)

This array contains, for each point that you queried, a list of the cities within distance r. Each neighbour is given as its index in the original array that you passed to KDTree. So there are three cities within about 100 km of London, namely the cities with indexes 37810, 15755, and 16276 in the original list:

>>> from pprint import pprint
>>> pprint([cities[i] for i in [37810, 15755, 16276]])
[(51.7186871990946, 359.8043453670437),
 (50.82734317063884, 1.1422052710187103),
 (50.95466110717763, 0.8956257749604779)]

Notes:

  1. You can see from the example output that neighbours with longitudes that differ by about 360° are correctly discovered.

  2. The approach seems fast enough. Here we find neighbours within 30 km for the first thousand cities, taking about 5 seconds:

>>> from timeit import timeit
>>> timeit(lambda:tree.query_ball_point(ecef_cities[:1000], r=euclidean_distance(30)), number=1)
5.013611573027447

Extrapolating, we expect to find neighbours within 30 km for all 50,000 cities in about four minutes.
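
For the original problem every city's neighbour list is needed, not just a single query. A sketch, assuming NumPy and SciPy are installed: it converts all coordinates at once, builds a `cKDTree` (SciPy's C implementation; in older SciPy releases `KDTree` was pure Python and considerably slower), and uses `query_pairs`, which returns each pair of points within distance `r` exactly once. The helper names `to_ecef` and `neighbours_within` are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

# WGS84 constants (km), as in the answer's code
A = 6378.137
B = 6356.7523142
ESQ = 6.69437999014e-3

def to_ecef(lat_deg, lon_deg):
    """Vectorised geodetic -> ECEF conversion at altitude 0, in km."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    n = A / np.sqrt(1.0 - ESQ * np.sin(lat) ** 2)
    return np.column_stack((
        n * np.cos(lat) * np.cos(lon),
        n * np.cos(lat) * np.sin(lon),
        n * (1.0 - ESQ) * np.sin(lat),
    ))

def neighbours_within(lats, lons, km):
    """For each city, the sorted indexes of the other cities whose ECEF
    chord distance is within the radius derived from `km`."""
    points = to_ecef(lats, lons)
    tree = cKDTree(points)
    # Same conservative great-circle -> chord conversion as euclidean_distance
    r = 2 * A * np.sin(km / (2 * B))
    result = [[] for _ in range(len(points))]
    # query_pairs returns each close pair (i, j), with i < j, exactly once
    for i, j in tree.query_pairs(r):
        result[i].append(j)
        result[j].append(i)
    return [sorted(ids) for ids in result]

# Four points at 51°N; index 2 sits across the 0°/360° meridian
nb = neighbours_within([51.0, 51.0, 51.0, 51.0], [0.0, 0.2, 359.9, 1.0], 30.0)
print(nb)  # [[1, 2], [0, 2], [0, 1], []]
```

Each entry holds indexes into the input arrays; mapping them back to `City` objects for the `closest_cities.add` call in the question is left to the caller.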

  3. My euclidean_distance function overestimates the Euclidean distance corresponding to a given great circle distance (so as not to miss any cities). This might be good enough for some applications (after all, cities are not point objects), but if you need more accuracy than this, you could filter the resulting points using, say, one of the great circle distance functions from geopy.
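
If geopy is not available, the haversine formula on a spherical Earth is one possible stand-in for the filtering step. This is an assumption, not geopy's method: a spherical model is less accurate than an ellipsoidal geodesic (errors of roughly up to 0.5%). The helper `refine` is hypothetical; it takes the candidate indexes returned by the kd-tree for city `i` and keeps the genuine matches.

```python
from math import radians, sin, cos, asin, sqrt

# Assumed mean Earth radius in km (spherical approximation)
EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)  # wrap-around is harmless: sin^2 is periodic
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))  # clamp rounding error

def refine(cities, i, candidates, max_km=30.0):
    """Keep the kd-tree candidates (other than city i itself) whose
    great-circle distance from city i is at most max_km."""
    lat1, lon1 = cities[i]
    return [j for j in candidates
            if j != i and haversine_km(lat1, lon1, *cities[j]) <= max_km]

cities = [(51.0, 0.0), (51.0, 0.3), (51.0, 1.0)]
print(refine(cities, 0, [0, 1, 2]))  # [1]
```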
