Python function to calculate distance using haversine formula in pandas
Problem description
(IPython notebook) (Bus statistics)
summary.head()
I need to calculate distance_travelled between each pair of consecutive rows, where 1) row['sequence'] != 0, since there is no distance when the bus is at its initial stop, and 2) row['track_id'] == previous_row['track_id'].
I have the haversine formula defined:
from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r
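As a quick sanity check, the function can be called on two consecutive stops of track 1-1 from the sample data in this thread. This is only a minimal sketch: the function is repeated so the snippet runs on its own, and the variable name `d` is an illustrative assumption, not part of the original code.

```python
from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * asin(sqrt(a)) * 6371  # 6371 km is the Earth radius used above

# Stops with sequence 0 and 1 of track 1-1; note the (lng, lat) argument order.
d = haversine(29.060010, 41.041870, 29.059980, 41.040859)
print(round(d, 3))  # roughly 0.112 km
```

The argument order matters here: the function takes longitude first, then latitude, for each point.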
I am not exactly sure how to go about this. One idea is to use iterrows() and apply the haversine() function if the row's 'sequence' is not 0 and the row's 'track_id' is equal to the previous row's 'track_id'.
I figured there is no need to check whether the 'track_id' of a row and of the previous row is the same: the haversine() function is applied to two rows only, and sequence == 0 already marks the point where the track_id has changed, so that row's distance is 0. So, basically, apply the haversine() function to all rows whose 'sequence' != 0, i.e. haversine(previous_row.lng, previous_row.lat, current_row.lng, current_row.lat). I still need help with that, though.
I managed to achieve something similar with:
summary['distance_travelled'] = summary.apply(lambda row: haversine(row['lng'], row['lat'], previous_row['lng'], previous_row['lat']), axis=1)
where previous_row should actually refer to the previous row; for now it is only a placeholder, which does nothing.
Recommended answer
IIUC, you can try:
print(summary)
  track_id  sequence        lat        lng  distance_travelled
0      1-1         0  41.041870  29.060010                   0
4      1-1         1  41.040859  29.059980                   0
6      1-1         2  41.039242  29.059731                   0
#create new shifted columns
summary['latp'] = summary['lat'].shift(1)
summary['lngp'] = summary['lng'].shift(1)
print(summary)
  track_id  sequence        lat        lng  distance_travelled       latp      lngp
0      1-1         0  41.041870  29.060010                   0        NaN       NaN
4      1-1         1  41.040859  29.059980                   0  41.041870  29.06001
6      1-1         2  41.039242  29.059731                   0  41.040859  29.05998
summary['distance_travelled'] = summary.apply(lambda row: haversine(row['lng'], row['lat'], row['lngp'], row['latp']), axis=1)
#remove column lngp, latp
summary = summary.drop(['lngp','latp'], axis=1)
print(summary)
  track_id  sequence        lat        lng  distance_travelled
0      1-1         0  41.041870  29.060010                 NaN
4      1-1         1  41.040859  29.059980            0.112446
6      1-1         2  41.039242  29.059731            0.181011
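One caveat with a plain shift(1): when the frame holds several track_id values, the first row of each track is paired with the last row of the previous track. A possible way around this, sketched below, is to shift within each track using groupby and to compute the distances with a vectorized NumPy version of the formula instead of apply. This sketch rests on assumptions: the helper name `haversine_np` is hypothetical, the rows are assumed to be already sorted by sequence within each track, and the second track ("1-2") uses invented coordinates purely for illustration.

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2, r=6371.0):
    """Vectorized haversine distance in km; inputs are arrays/Series in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

# Track 1-1 is the sample from this thread; track 1-2 is made up for the example.
summary = pd.DataFrame({
    "track_id": ["1-1", "1-1", "1-1", "1-2", "1-2"],
    "sequence": [0, 1, 2, 0, 1],
    "lat": [41.041870, 41.040859, 41.039242, 41.050000, 41.051000],
    "lng": [29.060010, 29.059980, 29.059731, 29.070000, 29.070500],
})

# Shift within each track, so a track's first row never sees another track's last row.
prev = summary.groupby("track_id")[["lat", "lng"]].shift(1)

# The first stop of each track has no previous row (NaN), so its distance is set to 0.
summary["distance_travelled"] = haversine_np(
    summary["lng"], summary["lat"], prev["lng"], prev["lat"]
).fillna(0)

print(summary)
```

With this approach no temporary latp/lngp columns are added to the frame, and rows with sequence == 0 end up with a distance of 0 rather than NaN.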