How to retrieve trips from historical data?
Problem description
I have the following table mytable in Hive:
id radar_id car_id datetime
1 A21 123 2017-03-08 17:31:19.0
2 A21 555 2017-03-08 17:32:00.0
3 A21 777 2017-03-08 17:33:00.0
4 B15 123 2017-03-08 17:35:22.0
5 B15 555 2017-03-08 17:34:05.0
5 B15 777 2017-03-08 20:50:12.0
6 A21 123 2017-03-09 11:00:00.0
7 C11 123 2017-03-09 11:10:00.0
8 A21 123 2017-03-09 11:12:00.0
9 A21 555 2017-03-09 11:12:10.0
10 B15 123 2017-03-09 11:14:00.0
11 C11 555 2017-03-09 11:20:00.0
I want to get the routes of cars passing through radars A21 and B15 within the same trip. For example, if the date is different for the same car_id, then it is not the same trip. Basically, I want to consider that the maximum time difference between radars A21 and B15 for the same vehicle should be 30 minutes. If it is bigger, then the trip is not the same, as for example for car_id 777.
My final goal is to count the average number of trips per day (non-unique, so if the same car passed 2 times by the same route, then it should be calculated 2 times).
The expected result is the following one:
radar_start radar_end avg_tripscount_per_day
A21 B15 1.5
On the date 2017-03-08 there are 2 trips between radars A21 and B15 (car 777 is not considered due to the 30-minute limit), while on the date 2017-03-09 there is only 1 trip. The average is (2+1)/2 = 1.5 trips per day.
How can I get this result? Basically, I do not know how to introduce the 30-minute limit in the query and how to group rides by radar_start and radar_end.
Thanks.
Update:
- The trip is registered at the date it started. If the car was triggered by radar A21 at 2017-03-08 23:55 and by radar B15 at 2017-03-09 00:15, then it should be considered as the same trip, registered for the date 2017-03-08.
- In the case of ids 6 and 8, the same car 123 passed by A21 two times, and then it turned to B15 (id 10). The last ride, with id 8, should be considered. So, 8-10. Thus, the closest previous to B15. The interpretation is that a car passed by A21 two times and the second time it turned to B15.
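-- Average number of A21 -> B15 trips per day,
-- taken over the days that have at least one such trip
select  count(*) / count(distinct to_date(datetime)) as trips_per_day
from   (select  radar_id
               ,datetime
               -- next radar and timestamp recorded for the same car
               ,lead(radar_id) over w as next_radar_id
               ,lead(datetime) over w as next_datetime
        from    mytable
        -- only the two radars of interest take part in the window
        where   radar_id in ('A21','B15')
        window  w as
                (
                    partition by car_id
                    order by     datetime
                )
       ) t
where   radar_id = 'A21'
    and next_radar_id = 'B15'
    -- 30-minute limit between the two radars
    and datetime + interval '30' minutes >= next_datetime
;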
+----------------+
| trips_per_day |
+----------------+
| 1.5 |
+----------------+
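To see why the query returns 1.5, its window logic can be replayed on the sample rows outside Hive. The following is only a minimal Python sketch of that logic, not part of the original answer: the function names find_trips and trips_per_day are hypothetical, and pairing each row with its successor via zip stands in for lead() over (partition by car_id order by datetime).

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (radar_id, car_id, datetime)
ROWS = [
    ("A21", 123, "2017-03-08 17:31:19"),
    ("A21", 555, "2017-03-08 17:32:00"),
    ("A21", 777, "2017-03-08 17:33:00"),
    ("B15", 123, "2017-03-08 17:35:22"),
    ("B15", 555, "2017-03-08 17:34:05"),
    ("B15", 777, "2017-03-08 20:50:12"),
    ("A21", 123, "2017-03-09 11:00:00"),
    ("C11", 123, "2017-03-09 11:10:00"),
    ("A21", 123, "2017-03-09 11:12:00"),
    ("A21", 555, "2017-03-09 11:12:10"),
    ("B15", 123, "2017-03-09 11:14:00"),
    ("C11", 555, "2017-03-09 11:20:00"),
]


def find_trips(rows, start="A21", end="B15", limit=timedelta(minutes=30)):
    """Return (car_id, start_time, end_time) for every start -> end trip."""
    # where radar_id in (start, end), then partition by car_id
    by_car = defaultdict(list)
    for radar_id, car_id, ts in rows:
        if radar_id in (start, end):
            when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
            by_car[car_id].append((when, radar_id))

    trips = []
    for car_id, events in by_car.items():
        events.sort()  # order by datetime
        # Pair each row with the next row of the same car, like lead()
        for (t0, r0), (t1, r1) in zip(events, events[1:]):
            if r0 == start and r1 == end and t0 + limit >= t1:
                trips.append((car_id, t0, t1))
    return trips


def trips_per_day(trips):
    """count(*) / count(distinct to_date(datetime)), using the start date."""
    days = {t0.date() for _, t0, _ in trips}
    return len(trips) / len(days)


trips = find_trips(ROWS)
print(trips_per_day(trips))  # 1.5
```

Because only A21 and B15 rows enter the window, an A21 row counts as a trip start only when the very next row for that car is B15. That is why id 6 is skipped and the pair 8-10 is kept, and why car 777 drops out on the 30-minute check.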
P.S.
If your Hive version does not support interval literals, the last condition can be replaced with:
and to_unix_timestamp(datetime) + 30*60 >= to_unix_timestamp(next_datetime)
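The epoch-seconds form of the condition can be checked in isolation. The sketch below is a hypothetical Python illustration, not Hive code: to_epoch and within_limit are made-up helper names, the comparison is inclusive as in the main query, and timestamps are read as UTC, which is an assumption (Hive converts using its own time zone setting).

```python
from datetime import datetime, timezone


def to_epoch(ts):
    """Seconds since 1970-01-01 for 'YYYY-MM-DD HH:MM:SS', read as UTC (assumption)."""
    parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def within_limit(start_ts, end_ts, limit_seconds=30 * 60):
    """True when end_ts is at most limit_seconds after start_ts."""
    return to_epoch(start_ts) + limit_seconds >= to_epoch(end_ts)


# Trip crossing midnight from the update: 20 minutes apart, so it qualifies
print(within_limit("2017-03-08 23:55:00", "2017-03-09 00:15:00"))  # True
# Car 777: more than three hours apart, so it is rejected
print(within_limit("2017-03-08 17:33:00", "2017-03-08 20:50:12"))  # False
```

Comparing plain seconds makes the midnight case from the update unproblematic: the check looks only at elapsed time, while the day a trip is registered on still comes from the start timestamp.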