How to retrieve trips from historical data?


Problem description

I have the following table mytable in Hive:

id    radar_id     car_id     datetime
1     A21          123        2017-03-08 17:31:19.0
2     A21          555        2017-03-08 17:32:00.0
3     A21          777        2017-03-08 17:33:00.0
4     B15          123        2017-03-08 17:35:22.0
5     B15          555        2017-03-08 17:34:05.0
5     B15          777        2017-03-08 20:50:12.0
6     A21          123        2017-03-09 11:00:00.0
7     C11          123        2017-03-09 11:10:00.0
8     A21          123        2017-03-09 11:12:00.0
9     A21          555        2017-03-09 11:12:10.0
10    B15          123        2017-03-09 11:14:00.0
11    C11          555        2017-03-09 11:20:00.0

I want to get the routes of cars passing through radars A21 and B15 within the same trip. For example, if the date is different for the same car_id, then it is not the same trip. Basically, I want to consider that the maximum time difference between radars A21 and B15 for the same vehicle should be 30 minutes. If it's bigger, then the trip is not the same, like for example for the car_id 777.

My final goal is to count the average number of trips per day (non-unique, so if the same car passed 2 times by the same route, then it should be calculated 2 times).

The expected result is the following one:

radar_start   radar_end       avg_tripscount_per_day
A21           B15             1.5

On the date 2017-03-08 there are 2 trips between radars A21 and B15 (car 777 is not counted because of the 30-minute limit), while on the date 2017-03-09 there is only 1 trip. The average is (2+1)/2 = 1.5 trips per day.

How can I get this result? Basically, I do not know how to introduce the 30-minute limit in the query and how to group rides by radar_start and radar_end.

Thanks.

Update:

  1. The trip is registered at the date it started.
  2. If the car was triggered by radar A21 at 2017-03-08 23:55 and by radar B15 at 2017-03-09 00:15, then it should be considered as the same trip registered for the date 2017-03-08.
  3. In the case of ids 6 and 8, the same car 123 passed by A21 two times and then turned to B15 (id 10). The last ride, with id 8, should be considered, so the trip is 8-10, i.e. the A21 record closest before B15. The interpretation is that the car passed by A21 two times and turned to B15 the second time.

Solution

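select  count(*) / count(distinct to_date(datetime))    as trips_per_day

from   (select  radar_id
               ,datetime
               -- next radar event of the same car, in time order
               ,lead(radar_id) over w  as next_radar_id
               ,lead(datetime) over w  as next_datetime

        from    mytable

        -- ignore other radars, so an A21 row is compared only with the next A21/B15 row
        where   radar_id in ('A21','B15')

        window  w as 
                (
                    partition by  car_id
                    order by      datetime
                )
        ) t

-- keep A21 -> B15 pairs at most 30 minutes apart; each trip is dated by its A21 row
where   radar_id        = 'A21'
    and next_radar_id   = 'B15'
    and datetime + interval '30' minutes >= next_datetime
;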


+----------------+
| trips_per_day  |
+----------------+
| 1.5            |
+----------------+

P.S.
If your version does not support intervals, the last condition could be replaced by:
and to_unix_timestamp(datetime) + 30*60 >= to_unix_timestamp(next_datetime)
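
To see why the query returns 1.5 on the sample data, the same logic can be traced outside Hive. The following is only a rough Python sketch, not part of the original answer: the function name trips_per_day, its parameters and the ROWS list are made up for illustration, and it assumes the timestamps look exactly like the ones in the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (radar_id, car_id, datetime).
ROWS = [
    ("A21", 123, "2017-03-08 17:31:19.0"),
    ("A21", 555, "2017-03-08 17:32:00.0"),
    ("A21", 777, "2017-03-08 17:33:00.0"),
    ("B15", 123, "2017-03-08 17:35:22.0"),
    ("B15", 555, "2017-03-08 17:34:05.0"),
    ("B15", 777, "2017-03-08 20:50:12.0"),
    ("A21", 123, "2017-03-09 11:00:00.0"),
    ("C11", 123, "2017-03-09 11:10:00.0"),
    ("A21", 123, "2017-03-09 11:12:00.0"),
    ("A21", 555, "2017-03-09 11:12:10.0"),
    ("B15", 123, "2017-03-09 11:14:00.0"),
    ("C11", 555, "2017-03-09 11:20:00.0"),
]


def trips_per_day(rows, start="A21", end="B15", max_gap=timedelta(minutes=30)):
    """Hypothetical helper mirroring the lead()-based query step by step."""
    # WHERE radar_id IN (start, end), then PARTITION BY car_id.
    by_car = defaultdict(list)
    for radar, car, ts in rows:
        if radar in (start, end):
            when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
            by_car[car].append((when, radar))

    trips_by_day = defaultdict(int)
    for events in by_car.values():
        events.sort()  # ORDER BY datetime
        # Pair every row with the following row of the same car, like lead().
        for (t0, r0), (t1, r1) in zip(events, events[1:]):
            if r0 == start and r1 == end and t0 + max_gap >= t1:
                trips_by_day[t0.date()] += 1  # the trip is dated by its start

    if not trips_by_day:
        return 0.0  # assumption: report 0 when no trip matched at all
    # count(*) / count(distinct to_date(datetime))
    return sum(trips_by_day.values()) / len(trips_by_day)


print(trips_per_day(ROWS))
```

Like the query, the sketch divides by the number of days that have at least one matching trip, so a day without any A21 to B15 trip does not lower the average.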
