Vectorised Haversine formula with a pandas dataframe

Question

I know that to find the distance between two latitude/longitude points I need to use the haversine function:

from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # Convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c  # 6367 km: approximate Earth radius
    return km

I have a DataFrame where one column is latitude and another is longitude. I want to find out how far these points are from a set point, -56.7213600, 37.2175900. How do I take the values from the DataFrame and pass them to the function?

Example DataFrame:

     SEAZ     LAT          LON
1    296.40,  58.7312210,  28.3774110  
2    274.72,  56.8148320,  31.2923240
3    192.25,  52.0649880,  35.8018640
4     34.34,  68.8188750,  67.1933670
5    271.05,  56.6699880,  31.6880620
6    131.88,  48.5546220,  49.7827730
7    350.71,  64.7742720,  31.3953780
8    214.44,  53.5192920,  33.8458560
9      1.46,  67.9433740,  38.4842520
10   273.55,  53.3437310,   4.4716664

Recommended answer

I can't confirm whether the calculations are correct, but the following worked:

In [11]:

from numpy import cos, sin, arcsin, sqrt
from math import radians

def haversine(row):
    lon1 = -56.7213600
    lat1 = 37.2175900
    lon2 = row['LON']
    lat2 = row['LAT']
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * arcsin(sqrt(a)) 
    km = 6367 * c
    return km

df['distance'] = df.apply(haversine, axis=1)
df
Out[11]:
         SEAZ        LAT        LON     distance
index                                           
1      296.40  58.731221  28.377411  6275.791920
2      274.72  56.814832  31.292324  6509.727368
3      192.25  52.064988  35.801864  6990.144378
4       34.34  68.818875  67.193367  7357.221846
5      271.05  56.669988  31.688062  6538.047542
6      131.88  48.554622  49.782773  8036.968198
7      350.71  64.774272  31.395378  6229.733699
8      214.44  53.519292  33.845856  6801.670843
9        1.46  67.943374  38.484252  6418.754323
10     273.55  53.343731   4.471666  4935.394528
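
Since the answer does not verify the numbers, one quick sanity check is to run the same formula on points whose great-circle distance is known in closed form. The sketch below is a scalar version using only the standard library. The name haversine_km and the radius_km parameter are illustrative, not part of the original answer, and it keeps the 6367 km radius used above (6371 km is the more commonly quoted mean radius).

```python
from math import asin, cos, pi, radians, sin, sqrt

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6367.0):
    """Great-circle distance in km between two points given in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return radius_km * 2 * asin(sqrt(a))

# A quarter of the equator should equal radius * pi / 2.
quarter = haversine_km(0.0, 0.0, 90.0, 0.0)
# A point is at distance zero from itself.
same = haversine_km(-56.72136, 37.21759, -56.72136, 37.21759)
print(quarter, 6367.0 * pi / 2, same)
```

If both checks agree, the formula itself is behaving as expected, and any remaining doubt is about the choice of radius or the order of the longitude and latitude arguments.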

The following code is actually slower on such a small dataframe, but I applied it to a 100,000-row df (it assumes math and numpy, as np, are already imported):

In [35]:

%%timeit
df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LON'])
df['dLON'] = df['LON_rad'] - math.radians(-56.7213600)
df['dLAT'] = df['LAT_rad'] - math.radians(37.2175900)
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(math.radians(37.2175900)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))

1 loops, best of 3: 17.2 ms per loop

By comparison, the apply version took 4.3 s, so this is nearly 250 times quicker, which is worth keeping in mind for the future.
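
To make the comparison concrete, here is a minimal sketch that computes the distances both ways and checks that they agree. The random coordinates, the constant names and haversine_row are assumptions made for illustration; the original 100,000-row DataFrame is not available. The only difference between the two versions is where the loop runs: apply calls a Python function once per row, while the column-wise version hands whole columns to NumPy.

```python
import numpy as np
import pandas as pd

REF_LON, REF_LAT = -56.7213600, 37.2175900  # set point from the question
R_KM = 6367  # same radius as the code above

def haversine_row(row):
    """Row-wise version: called once per row by DataFrame.apply."""
    lon1, lat1, lon2, lat2 = np.radians([REF_LON, REF_LAT, row["LON"], row["LAT"]])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return R_KM * 2 * np.arcsin(np.sqrt(a))

# Synthetic coordinates (hypothetical data, not the question's DataFrame).
rng = np.random.default_rng(0)
df = pd.DataFrame({"LAT": rng.uniform(-90, 90, 1000), "LON": rng.uniform(-180, 180, 1000)})

via_apply = df.apply(haversine_row, axis=1)

# Column-wise version: each NumPy call processes the whole column at once.
lat, lon = np.radians(df["LAT"]), np.radians(df["LON"])
lat0, lon0 = np.radians(REF_LAT), np.radians(REF_LON)
a = np.sin((lat - lat0) / 2) ** 2 + np.cos(lat0) * np.cos(lat) * np.sin((lon - lon0) / 2) ** 2
via_columns = R_KM * 2 * np.arcsin(np.sqrt(a))

print(np.allclose(via_apply, via_columns))
```

Wrapping each version in %timeit on a larger frame should show the same kind of gap as above, though the exact ratio will depend on the machine and the library versions.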

If we compress all of the above into a one-liner:

In [39]:

%timeit df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin((np.radians(df['LAT']) - math.radians(37.2175900))/2)**2 + math.cos(math.radians(37.2175900)) * np.cos(np.radians(df['LAT'])) * np.sin((np.radians(df['LON']) - math.radians(-56.7213600))/2)**2))
100 loops, best of 3: 12.6 ms per loop

We now observe a further speed-up: roughly 341 times quicker than the apply version.
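
The one-liner is fast but hard to read and hard-codes the set point. One possible way to keep the speed while restoring readability is to wrap the same arithmetic in a function that takes arrays. This is a sketch rather than part of the original answer: the name haversine_np and the radius_km parameter are made up for illustration, and the input arrays are the first three rows of the example DataFrame.

```python
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2, radius_km=6367.0):
    """Vectorised haversine distance in km; inputs in degrees, scalars or arrays."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return radius_km * 2 * np.arcsin(np.sqrt(a))

# First three rows of the example DataFrame, as plain arrays.
lat = np.array([58.7312210, 56.8148320, 52.0649880])
lon = np.array([28.3774110, 31.2923240, 35.8018640])

# The scalar set point broadcasts against the coordinate arrays.
distance = haversine_np(-56.7213600, 37.2175900, lon, lat)
print(distance)
```

With the DataFrame from the question, the equivalent call would be df['distance'] = haversine_np(-56.7213600, 37.2175900, df['LON'], df['LAT']), since the NumPy functions used here also accept pandas Series.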
