Calculating distance and velocity between time-ordered coordinates
Question
I have a CSV containing locations (latitude, longitude) for a given user, denoted by the id field, at a given time (timestamp). I need to calculate the distance and the velocity between each point and the next point for each user. For example, for ID 1 I need to find the distance and velocity between point 1 and point 2, point 2 and point 3, point 3 and point 4, and so on. Since I am working with coordinates on the Earth, I understand the Haversine formula will be used for the distance calculations. However, I am unsure how to iterate through my file given that the events must be ordered by user and by time. With Python, how can I iterate through my file to sort the events by user and by time, and then calculate the distance and velocity between each consecutive pair?
Ideally, the output would be a second CSV looking something like:
ID#, start_time, start_location, end_time, end_location, distance, velocity
Sample data:
ID,timestamp,latitude,longitude
3,6/9/2017 22:20,38.7953326,77.0088833
1,5/5/2017 13:10,38.8890106,77.0500613
2,2/10/2017 16:23,40.7482494,73.9841913
1,5/5/2017 12:35,38.9206015,77.2223287
3,6/10/2017 10:00,42.3662109,71.0209426
1,5/5/2017 20:00,38.8974155,77.0368333
2,2/10/2017 7:30,38.8514261,77.0422981
3,6/9/2017 10:20,38.9173461,77.2225527
2,2/10/2017 19:51,40.7828687,73.9675438
3,6/10/2017 6:42,38.9542676,77.4496951
1,5/5/2017 16:35,38.8728748,77.0077629
2,2/10/2017 10:00,40.7769311,73.8761546
Recommended answer
It seems like you could use the magic of pandas.
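The answer does not show the loading step. Here is a minimal sketch, assuming the file is read with `pd.read_csv`. `SAMPLE_CSV` is a hypothetical name that holds the question's sample rows so the snippet runs without a file on disk:

```python
import io

import pandas as pd

# The sample rows from the question, embedded as a string so this sketch runs
# on its own. With a real file you would pass its path to pd.read_csv instead.
SAMPLE_CSV = """\
ID,timestamp,latitude,longitude
3,6/9/2017 22:20,38.7953326,77.0088833
1,5/5/2017 13:10,38.8890106,77.0500613
2,2/10/2017 16:23,40.7482494,73.9841913
1,5/5/2017 12:35,38.9206015,77.2223287
3,6/10/2017 10:00,42.3662109,71.0209426
1,5/5/2017 20:00,38.8974155,77.0368333
2,2/10/2017 7:30,38.8514261,77.0422981
3,6/9/2017 10:20,38.9173461,77.2225527
2,2/10/2017 19:51,40.7828687,73.9675438
3,6/10/2017 6:42,38.9542676,77.4496951
1,5/5/2017 16:35,38.8728748,77.0077629
2,2/10/2017 10:00,40.7769311,73.8761546
"""

# The timestamp column is still plain text at this point.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df)
```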
Loading your sample data into a pandas DataFrame (for example with pd.read_csv) gives the following:
ID timestamp latitude longitude
0 3 6/9/2017 22:20 38.795333 77.008883
1 1 5/5/2017 13:10 38.889011 77.050061
2 2 2/10/2017 16:23 40.748249 73.984191
3 1 5/5/2017 12:35 38.920602 77.222329
4 3 6/10/2017 10:00 42.366211 71.020943
5 1 5/5/2017 20:00 38.897416 77.036833
6 2 2/10/2017 7:30 38.851426 77.042298
7 3 6/9/2017 10:20 38.917346 77.222553
8 2 2/10/2017 19:51 40.782869 73.967544
9 3 6/10/2017 6:42 38.954268 77.449695
10 1 5/5/2017 16:35 38.872875 77.007763
11 2 2/10/2017 10:00 40.776931 73.876155
Convert the timestamp column
Pandas (and Python in general) has extensive libraries for date and time operations. But first, you will need to prepare your data by converting the timestamp column (a string) into a datetime object. I am assuming your data is in the format "MM/DD/YYYY HH:MM" (since you didn't specify).
import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'], format='%m/%d/%Y %H:%M')
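An explicit `format` matters here because a date such as `2/10/2017` could be read as either February 10 or October 2. The following quick check uses two timestamps copied from the sample data. It shows that `%m/%d/%Y %H:%M` reads them month-first and accepts the fields that are not zero-padded:

```python
import pandas as pd

# Two raw timestamps copied from the sample data.
raw = pd.Series(["6/9/2017 22:20", "2/10/2017 7:30"])

# Month first, then day, then a 24-hour time; single-digit fields are accepted.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %H:%M")

print(parsed.dt.month.tolist())  # → [6, 2]
print(parsed.dt.day.tolist())    # → [9, 10]
```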
Helper functions
You will have to define some functions to calculate the distance and velocity. The Haversine distance function is adapted from this answer.
from math import sin, cos, sqrt, atan2, radians

def getDistanceFromLatLonInKm(lat1, lon1, lat2, lon2):
    R = 6371  # Radius of the earth in km
    dLat = radians(lat2 - lat1)
    dLon = radians(lon2 - lon1)
    rLat1 = radians(lat1)
    rLat2 = radians(lat2)
    a = sin(dLat/2) * sin(dLat/2) + cos(rLat1) * cos(rLat2) * sin(dLon/2) * sin(dLon/2)
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    d = R * c  # Distance in km
    return d
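
def calc_velocity(dist_km, time_start, time_end):
    """Return velocity in km/s; return 0 if time_end <= time_start to avoid dividing by 0."""
    # total_seconds() includes whole days; .seconds would drop them for gaps of a day or more
    return dist_km / (time_end - time_start).total_seconds() if time_end > time_start else 0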
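The choice between `.total_seconds()` and `.seconds` matters. On a `timedelta`, `.seconds` is only the part of the gap left over after whole days are removed, so it undercounts any gap of a day or more. pandas `Timedelta` objects expose the same pair of attributes. In the sample data every gap is shorter than a day, so both give the same numbers there. The following illustration uses a hypothetical gap of just over one day:

```python
from datetime import datetime

# Hypothetical pair of fixes 1 day, 1 hour and 40 minutes apart.
start = datetime(2017, 6, 9, 10, 20)
end = datetime(2017, 6, 10, 12, 0)
gap = end - start

# .days and .seconds split the gap into whole days plus a 0..86399 s remainder.
print(gap.days, gap.seconds)   # → 1 6000
# .total_seconds() gives the full elapsed time as a float.
print(gap.total_seconds())     # → 92400.0
```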
Make some intermediate variables
We want to compute the Haversine function on each row, but we need some information from the first row of each group. Luckily, pandas makes this easy with sort_values(), groupby() and transform().
The following code makes 3 new columns, one each for the initial latitude, longitude, and time for each ID.
# First sort by ID and timestamp:
df = df.sort_values(by=['ID', 'timestamp'])
# Group the sorted dataframe by ID, and grab the initial value for lat, lon, and time.
df['lat0'] = df.groupby('ID')['latitude'].transform(lambda x: x.iat[0])
df['lon0'] = df.groupby('ID')['longitude'].transform(lambda x: x.iat[0])
df['t0'] = df.groupby('ID')['timestamp'].transform(lambda x: x.iat[0])
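`transform(lambda x: x.iat[0])` copies each ID's *first* row onto every row of that ID, so later distances are measured from the starting point. The question asks instead for the distance between each point and the one before it. The following minimal sketch implements that variant by swapping in `groupby(...).shift(1)`. The `*_prev` column names are illustrative, and the toy rows are copied from IDs 1 and 2 of the sample data:

```python
import pandas as pd

# Toy rows copied from IDs 1 and 2 of the sample data, deliberately out of order.
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 2, 1],
    "timestamp": pd.to_datetime([
        "5/5/2017 13:10", "2/10/2017 10:00", "5/5/2017 12:35",
        "5/5/2017 20:00", "2/10/2017 7:30", "5/5/2017 16:35",
    ], format="%m/%d/%Y %H:%M"),
    "latitude": [38.8890106, 40.7769311, 38.9206015, 38.8974155, 38.8514261, 38.8728748],
    "longitude": [77.0500613, 73.8761546, 77.2223287, 77.0368333, 77.0422981, 77.0077629],
})

# Sort so that, within each ID, rows run in time order.
df = df.sort_values(by=["ID", "timestamp"]).reset_index(drop=True)

# shift(1) inside each ID group moves the previous row's values onto the current row.
# The first row of every ID has no predecessor, so it gets NaN (or NaT for times).
grouped = df.groupby("ID")
df["lat_prev"] = grouped["latitude"].shift(1)
df["lon_prev"] = grouped["longitude"].shift(1)
df["t_prev"] = grouped["timestamp"].shift(1)

# Drop the first row of each ID: there is no earlier point to measure from.
pairs = df.dropna(subset=["lat_prev"])
print(pairs[["ID", "t_prev", "timestamp", "lat_prev", "latitude"]])
```

Passing `lat_prev`, `lon_prev` and `t_prev` to the helper functions in place of `lat0`, `lon0` and `t0` should then give a distance and velocity per segment. Each `pairs` row already holds one start/end pair, which corresponds to the question's `start_time`/`end_time` layout.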
Apply the functions
# create a new column for distance
df['dist_km'] = df.apply(
    lambda row: getDistanceFromLatLonInKm(
        lat1=row['latitude'],
        lon1=row['longitude'],
        lat2=row['lat0'],
        lon2=row['lon0']
    ),
    axis=1
)

# create a new column for velocity
df['velocity_kmps'] = df.apply(
    lambda row: calc_velocity(
        dist_km=row['dist_km'],
        time_start=row['t0'],
        time_end=row['timestamp']
    ),
    axis=1
)
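`apply(..., axis=1)` calls a Python function once per row. That is fine for a dozen rows but can be slow on large files. The following alternative, which is not part of the original answer, writes the same Haversine formula with NumPy so that it works on whole columns at once. `haversine_km` is a hypothetical name. The sketch uses `2 * arcsin(sqrt(a))`, which is equivalent to `2 * atan2(sqrt(a), sqrt(1 - a))`:

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; works on scalars, NumPy arrays or pandas Series."""
    R = 6371.0  # mean Earth radius in km, as in getDistanceFromLatLonInKm
    rlat1, rlon1, rlat2, rlon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    dlat = rlat2 - rlat1
    dlon = rlon2 - rlon1
    a = np.sin(dlat / 2) ** 2 + np.cos(rlat1) * np.cos(rlat2) * np.sin(dlon / 2) ** 2
    # 2*arcsin(sqrt(a)) equals 2*atan2(sqrt(a), sqrt(1 - a)) for 0 <= a <= 1.
    return 2 * R * np.arcsin(np.sqrt(a))


# Two rows for ID 1 from the sample data, each compared with ID 1's first fix.
lat = pd.Series([38.9206015, 38.8890106])
lon = pd.Series([77.2223287, 77.0500613])
print(haversine_km(lat, lon, 38.9206015, 77.2223287))
```

With the answer's columns, `df['dist_km'] = haversine_km(df['latitude'], df['longitude'], df['lat0'], df['lon0'])` should produce the same column as the row-wise `apply`.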
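Result
Because lat0, lon0 and t0 hold each ID's first row, dist_km below is the distance from that first point (not from the previous point), and velocity_kmps is that distance divided by the time elapsed since the first point, in km/s: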
>>> print(df[['ID', 'timestamp', 'latitude', 'longitude', 'dist_km', 'velocity_kmps']])
ID timestamp latitude longitude dist_km velocity_kmps
3 1 2017-05-05 12:35:00 38.920602 77.222329 0.000000 0.000000
1 1 2017-05-05 13:10:00 38.889011 77.050061 15.314742 0.007293
10 1 2017-05-05 16:35:00 38.872875 77.007763 19.312148 0.001341
5 1 2017-05-05 20:00:00 38.897416 77.036833 16.255868 0.000609
6 2 2017-02-10 07:30:00 38.851426 77.042298 0.000000 0.000000
11 2 2017-02-10 10:00:00 40.776931 73.876155 344.880549 0.038320
2 2 2017-02-10 16:23:00 40.748249 73.984191 335.727502 0.010498
8 2 2017-02-10 19:51:00 40.782869 73.967544 339.206320 0.007629
7 3 2017-06-09 10:20:00 38.917346 77.222553 0.000000 0.000000
0 3 2017-06-09 22:20:00 38.795333 77.008883 22.942974 0.000531
9 3 2017-06-10 06:42:00 38.954268 77.449695 20.070609 0.000274
4 3 2017-06-10 10:00:00 42.366211 71.020943 648.450485 0.007611
From here, I will leave it to you to figure out how to grab the last entry for each ID.
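The following sketch shows one way to do that. It assumes the frame is already sorted by `ID` and `timestamp` as above, and uses `groupby('ID').tail(1)` to keep the last row of each group. The small `result` frame is a hypothetical stand-in that copies four rows from the output table:

```python
import pandas as pd

# Hypothetical stand-in for the answer's sorted result, using rows from the table above.
result = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "timestamp": pd.to_datetime(["2017-05-05 12:35", "2017-05-05 20:00",
                                 "2017-02-10 07:30", "2017-02-10 19:51"]),
    "dist_km": [0.0, 16.255868, 0.0, 339.206320],
    "velocity_kmps": [0.0, 0.000609, 0.0, 0.007629],
})

# tail(1) returns the final row of every ID group, keeping the original columns and index.
last_per_id = result.groupby("ID").tail(1)
print(last_per_id)
```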