在Python中向量化Haversine距离计算 [英] Vectorizing Haversine distance calculation in Python
问题描述
我正在尝试为由Latitude& amp;标识的一长串位置计算距离矩阵.使用 Haversine 公式,该公式采用两个坐标对的元组来产生距离:
I am trying to calculate a distance matrix for a long list of locations identified by Latitude & Longitude using the Haversine formula that takes two tuples of coordinate pairs to produce the distance:
def haversine(point1, point2, miles=False):
""" Calculate the great-circle distance bewteen two points on the Earth surface.
:input: two 2-tuples, containing the latitude and longitude of each point
in decimal degrees.
Example: haversine((45.7597, 4.8422), (48.8567, 2.3508))
:output: Returns the distance bewteen the two points.
The default unit is kilometers. Miles can be returned
if the ``miles`` parameter is set to True.
"""
我可以使用嵌套的for循环来计算所有点之间的距离,如下所示:
I can calculate the distance between all points using a nested for loop as follows:
data.head()
id coordinates
0 1 (16.3457688674, 6.30354512503)
1 2 (12.494749307, 28.6263955635)
2 3 (27.794615136, 60.0324947881)
3 4 (44.4269923769, 110.114216113)
4 5 (-69.8540884125, 87.9468778773)
使用一个简单的功能:
distance = {}
def haver_loop(df):
for i, point1 in df.iterrows():
distance[i] = []
for j, point2 in df.iterrows():
distance[i].append(haversine(point1.coordinates, point2.coordinates))
return pd.DataFrame.from_dict(distance, orient='index')
但是考虑到时间的复杂性,这需要花费相当长的时间,大约需要20秒才能获得500点,而且我的清单要长得多.这让我着眼于向量化,并且遇到了numpy.vectorize
((文档),但无法弄清楚如何在这种情况下应用它.
But this takes quite a while given the time complexity, running at around 20s for 500 points and I have a much longer list. This has me looking at vectorization, and I've come across numpy.vectorize
((docs), but can't figure out how to apply it in this context.
推荐答案
您可以将函数用作np.vectorize()
的参数,然后可以将其用作pandas.groupby.apply
的参数,如下所示:
You would provide your function as an argument to np.vectorize()
, and could then use it as an argument to pandas.groupby.apply
as illustrated below:
haver_vec = np.vectorize(haversine, otypes=[np.int16])
distance = df.groupby('id').apply(lambda x: pd.Series(haver_vec(df.coordinates, x.coordinates)))
例如,示例数据如下:
length = 500
df = pd.DataFrame({'id':np.arange(length), 'coordinates':tuple(zip(np.random.uniform(-90, 90, length), np.random.uniform(-180, 180, length)))})
比较500点:
def haver_vect(data):
distance = data.groupby('id').apply(lambda x: pd.Series(haver_vec(data.coordinates, x.coordinates)))
return distance
%timeit haver_loop(df): 1 loops, best of 3: 35.5 s per loop
%timeit haver_vect(df): 1 loops, best of 3: 593 ms per loop
这篇关于在Python中向量化Haversine距离计算的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!