Pandas Latitude-Longitude to distance between successive rows
Problem description
I have the following in a Pandas DataFrame in Python 2.7:
Ser_Numb LAT LONG
1 74.166061 30.512811
2 72.249672 33.427724
3 67.499828 37.937264
4 84.253715 69.328767
5 72.104828 33.823462
6 63.989462 51.918173
7 80.209112 33.530778
8 68.954132 35.981256
9 83.378214 40.619652
10 68.778571 6.607066
I am looking to calculate the distance between successive rows in the dataframe. The output should look something like this:
Ser_Numb LAT LONG Distance
1 74.166061 30.512811 0
2 72.249672 33.427724 d_between_Ser_Numb2 and Ser_Numb1
3 67.499828 37.937264 d_between_Ser_Numb3 and Ser_Numb2
4 84.253715 69.328767 d_between_Ser_Numb4 and Ser_Numb3
5 72.104828 33.823462 d_between_Ser_Numb5 and Ser_Numb4
6 63.989462 51.918173 d_between_Ser_Numb6 and Ser_Numb5
7 80.209112 33.530778 .
8 68.954132 35.981256 .
9 83.378214 40.619652 .
10 68.778571 6.607066 .
Attempt
This post looks somewhat similar, but it calculates the distance between fixed points. I need the distance between successive points.
I tried to adapt it as follows:
df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LONG'])
df['dLON'] = df['LON_rad'] - np.radians(df['LON_rad'].shift(1))
df['dLAT'] = df['LAT_rad'] - np.radians(df['LAT_rad'].shift(1))
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))
However, I get the following error:
Traceback (most recent call last):
File "C:\Python27\test.py", line 115, in <module>
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 78, in wrapper
"{0}".format(str(converter)))
TypeError: cannot convert the series to <type 'float'>
[Finished in 2.3s with exit code 1]
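The traceback comes from math.cos, which accepts a single number only, so pandas is asked to turn the whole Series into one float and refuses. NumPy ufuncs such as np.cos work element-wise on a Series instead. A minimal sketch on a small illustrative Series (three latitudes taken from the table above):

```python
import math

import numpy as np
import pandas as pd

lat_rad = np.radians(pd.Series([74.166061, 72.249672, 67.499828]))

# np.cos is applied element by element and returns a Series of the same length
cos_lat = np.cos(lat_rad)
print(cos_lat)

# math.cos needs one float; a multi-element Series cannot be converted to one
try:
    math.cos(lat_rad)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```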
This error was fixed by MaxU's comment. With the fix, however, the output of the calculation makes no sense: the distances are nearly 8000 km:
Ser_Numb LAT LONG LAT_rad LON_rad dLON dLAT distance
0 1 74.166061 30.512811 1.294442 0.532549 NaN NaN NaN
1 2 72.249672 33.427724 1.260995 0.583424 0.574129 1.238402 8010.487211
2 3 67.499828 37.937264 1.178094 0.662130 0.651947 1.156086 7415.364469
3 4 84.253715 69.328767 1.470505 1.210015 1.198459 1.449943 9357.184623
4 5 72.104828 33.823462 1.258467 0.590331 0.569212 1.232802 7992.087820
5 6 63.989462 51.918173 1.116827 0.906143 0.895840 1.094862 7169.812123
6 7 80.209112 33.530778 1.399913 0.585222 0.569407 1.380421 8851.558260
7 8 68.954132 35.981256 1.203477 0.627991 0.617777 1.179044 7559.609520
8 9 83.378214 40.619652 1.455224 0.708947 0.697986 1.434220 9194.371978
9 10 68.778571 6.607066 1.200413 0.115315 0.102942 1.175014 NaN
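The table itself hints at the cause: dLON and dLAT are almost equal to LON_rad and LAT_rad (for row 1, 0.583424 - np.radians(0.532549) = 0.574129). The columns are already in radians but go through np.radians a second time, so the previous row's value is divided by about 57 instead of being subtracted as is. The cosine term also uses shift(-1), i.e. the next row, where the previous row is needed. A sketch of the attempt with those points changed (and np.cos instead of math.cos), keeping its 6367 km radius and column names, on the first three rows of the data:

```python
import numpy as np
import pandas as pd

# first three rows of the question's data
df = pd.DataFrame({
    'Ser_Numb': [1, 2, 3],
    'LAT': [74.166061, 72.249672, 67.499828],
    'LONG': [30.512811, 33.427724, 37.937264],
})

# convert degrees to radians exactly once
df['LAT_rad'] = np.radians(df['LAT'])
df['LON_rad'] = np.radians(df['LONG'])

# difference to the previous row; both columns are already in radians
df['dLON'] = df['LON_rad'] - df['LON_rad'].shift(1)
df['dLAT'] = df['LAT_rad'] - df['LAT_rad'].shift(1)

# haversine term: previous row's latitude (shift(1)) times current latitude
a = (np.sin(df['dLAT'] / 2) ** 2
     + np.cos(df['LAT_rad'].shift(1)) * np.cos(df['LAT_rad'])
     * np.sin(df['dLON'] / 2) ** 2)
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(a))
print(df[['Ser_Numb', 'distance']])
```

For the first pair this should come out close to the 232.55 km quoted below; row 0 stays NaN because it has no predecessor.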
According to:
- this online calculator: if I use Latitude1 = 74.166061, Longitude1 = 30.512811, Latitude2 = 72.249672, Longitude2 = 33.427724, then I get 233 km
- the haversine function found here, called as:
  print haversine(30.512811, 74.166061, 33.427724, 72.249672)
  then I get 232.55 km
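Both reference values can be reproduced with a plain scalar haversine. The sketch below uses a hypothetical helper, haversine_km, that takes latitude first (unlike the longitude-first call above). The 232.55 km figure corresponds to an earth radius of 6367 km; the common mean radius of 6371 km gives about 232.70 km. Both round to the calculator's 233 km.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius=6371.0):
    """Hypothetical scalar helper: great-circle distance in km, inputs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# Ser_Numb 1 -> Ser_Numb 2 (latitude first, then longitude)
d_6367 = haversine_km(74.166061, 30.512811, 72.249672, 33.427724, radius=6367.0)
d_6371 = haversine_km(74.166061, 30.512811, 72.249672, 33.427724)
print(round(d_6367, 2), round(d_6371, 2))
```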
The answer should be 233 km, but my approach is giving ~8000 km. I think there is something wrong with how I am trying to iterate between successive rows.
Question: Is there a way to do this in Pandas? Or do I need to loop through the dataframe one row at a time?
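A row-by-row loop does work. The sketch below walks over consecutive rows with iloc and calls a hypothetical scalar helper, haversine_km, which is not part of the question or the answer, on each pair. On large frames this is usually much slower than a vectorized version.

```python
import math

import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2, radius=6371.0):
    """Hypothetical scalar helper: great-circle distance in km, inputs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# first three rows of the question's data
df = pd.DataFrame({
    'Ser_Numb': [1, 2, 3],
    'LAT': [74.166061, 72.249672, 67.499828],
    'LONG': [30.512811, 33.427724, 37.937264],
})

# the first row has no predecessor, so it gets 0 as in the desired output
distances = [0.0]
for i in range(1, len(df)):
    prev, cur = df.iloc[i - 1], df.iloc[i]
    distances.append(haversine_km(prev['LAT'], prev['LONG'], cur['LAT'], cur['LONG']))
df['Distance'] = distances
print(df)
```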
Additional information:
To create the above DF, select it and copy to clipboard. Then:
import pandas as pd
df = pd.read_clipboard()
print df
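When no clipboard is available (for example in a script or on a headless machine), the same frame can be built from the pasted text. A sketch using io.StringIO with read_csv, treating runs of whitespace as the separator:

```python
import io

import pandas as pd

data = """Ser_Numb LAT LONG
1 74.166061 30.512811
2 72.249672 33.427724
3 67.499828 37.937264
4 84.253715 69.328767
5 72.104828 33.823462
6 63.989462 51.918173
7 80.209112 33.530778
8 68.954132 35.981256
9 83.378214 40.619652
10 68.778571 6.607066
"""

# r'\s+' splits on any run of spaces or tabs
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
print(df)
```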
Recommended answer
You can use this great solution (c) @derricw (don't forget to upvote it ;-):
import numpy as np

# vectorized haversine function
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    slightly modified version: of http://stackoverflow.com/a/29546836/2901002

    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians)

    All (lat, lon) coordinates must have numeric dtypes; plain arrays must be
    of equal length (pandas Series are aligned on their index).
    """
    if to_radians:
        # convert each input on its own: np.radians([...]) on Series of
        # different lengths would first try to build one ragged array
        lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])

    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))

# pair each row with the previous one; the Series are aligned on their
# index, so row 0 (no previous row) gets NaN
df['dist'] = \
    haversine(df.LAT.shift(), df.LONG.shift(),
              df.loc[1:, 'LAT'], df.loc[1:, 'LONG'])
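For reference, the function evaluates the haversine formula. With φ for latitude and λ for longitude (both in radians, i.e. after the optional np.radians step) and R for earth_radius, each pair of points gives:

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
  + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \arcsin\!\left(\sqrt{a}\right)
```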
Result:
In [566]: df
Out[566]:
Ser_Numb LAT LONG dist
0 1 74.166061 30.512811 NaN
1 2 72.249672 33.427724 232.549785
2 3 67.499828 37.937264 554.905446
3 4 84.253715 69.328767 1981.896491
4 5 72.104828 33.823462 1513.397997
5 6 63.989462 51.918173 1164.481327
6 7 80.209112 33.530778 1887.256899
7 8 68.954132 35.981256 1252.531365
8 9 83.378214 40.619652 1606.340727
9 10 68.778571 6.607066 1793.921854
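Note that the first dist value above (232.549785) matches an earth radius of 6367 km rather than the function's default earth_radius=6371; with 6371 km each value is about 0.06% larger (roughly 232.70 km for the first pair). A sketch that checks this on the first two rows, using a compact copy of the vectorized function with full-length shifted Series as inputs:

```python
import numpy as np
import pandas as pd


def haversine(lat1, lon1, lat2, lon2, earth_radius=6371):
    """Vectorized haversine; inputs in decimal degrees, result in km."""
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    a = np.sin((lat2 - lat1) / 2.0) ** 2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


# first two rows of the question's data
df = pd.DataFrame({'LAT': [74.166061, 72.249672], 'LONG': [30.512811, 33.427724]})

# previous row vs current row; row 0 has no previous row and becomes NaN
d_6371 = haversine(df.LAT.shift(), df.LONG.shift(), df.LAT, df.LONG)
d_6367 = haversine(df.LAT.shift(), df.LONG.shift(), df.LAT, df.LONG, earth_radius=6367)
print(d_6371.iloc[1], d_6367.iloc[1])
```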
UPDATE: this will help to understand the logic:
In [573]: pd.concat([df['LAT'].shift(), df.loc[1:, 'LAT']], axis=1, ignore_index=True)
Out[573]:
0 1
0 NaN NaN
1 74.166061 72.249672
2 72.249672 67.499828
3 67.499828 84.253715
4 84.253715 72.104828
5 72.104828 63.989462
6 63.989462 80.209112
7 80.209112 68.954132
8 68.954132 83.378214
9 83.378214 68.778571
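The same pairing can also be written with plain NumPy slicing, where lat[:-1] holds the previous points and lat[1:] the current ones. This is an alternative to shift, not part of the original answer. A sketch that prepends 0 for the first row to match the output format requested in the question:

```python
import numpy as np
import pandas as pd

# first three rows of the question's data
df = pd.DataFrame({
    'Ser_Numb': [1, 2, 3],
    'LAT': [74.166061, 72.249672, 67.499828],
    'LONG': [30.512811, 33.427724, 37.937264],
})

lat = np.radians(df['LAT'].to_numpy())
lon = np.radians(df['LONG'].to_numpy())

# previous points: lat[:-1], lon[:-1]; current points: lat[1:], lon[1:]
dlat = lat[1:] - lat[:-1]
dlon = lon[1:] - lon[:-1]
a = np.sin(dlat / 2) ** 2 + np.cos(lat[:-1]) * np.cos(lat[1:]) * np.sin(dlon / 2) ** 2
dist = 2 * 6371 * np.arcsin(np.sqrt(a))

# the first row has no predecessor; use 0 as in the desired output
df['Distance'] = np.concatenate([[0.0], dist])
print(df)
```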