SQL持续时间计算 [英] SQL duration time calculation

查看:153
本文介绍了SQL持续时间计算的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一张给定时间的历史公交车位置表,每秒记录一次.模式如下:

I have a table of historic bus positions at a given time, recorded once per second. The schema looks like this:

BusID        int         not null,
BreadcrumbID int         not null identity (1, 1),
BusStopID    int         null,
Timestamp    datetime    not null

我想根据历史旅行生成公交车站的时间表.如果总线的BusStopID对应于停靠点,则该巴士处于停靠";如果BusStopID为空,则该巴士不处于停靠处".

I want to generate a bus stop schedule based on historic trips. A bus is "at a stop" if its BusStopID corresponds to the stop, and is not "at a stop" if the BusStopID is null.

我需要生成公共汽车在每个站点的平均乘车时间.因此,基本上,我需要执行以下操作:

I need to generate the average times the bus is at each stop. So basically, I need to do the following:

  • 确定公交车站的时间-一个简单的where子句就可以解决问题
  • 确定公共汽车的平均停留时间.出于我的目的,我将离散的停止时间"定义为正负10分钟的窗口;如果公交车在一天的10:04-10:08停靠,另一天的10:06-10:08停靠,第三天的10:14-10:18停靠,则停靠的地点相同,但如果停靠了在10:45-10:48,这将是另一次停止事件.
  • 过滤掉噪音"-即停止只发生过几次但再也没有发生过的时间
  • Identify the times that a bus is at a stop -- a simple where clause does the trick
  • Identify the average times the bus is at a stop. For my purposes, I'm defining a discrete "stop time" as a window of plus/minus 10 minutes; if a bus stops one day from 10:04 - 10:08, another day at 10:06 - 10:08, and a third day at 10:14 - 10:18, those would be the same stop, but if it stops at 10:45 - 10:48, that would be a different stop occurrence.
  • Filter out "noise" -- i.e. stops times that only happened a few times but never again

我完全不知如何完成第二和第三个项目符号.请帮忙!

I'm completely at a loss as to how to accomplish the second and third bullet. Please help!

推荐答案

在很多情况下,我都做了类似的事情.本质上,基于复杂顺序内的分隔进行分组.针对此问题,我使用的方法的基础如下:

On a number of occasions I've done something similar. Essentially, grouping based on separations within a complex ordering. The basics of the approach I use, with regards to this problem, are as follows:

  1. 建立一个表,列出所有感兴趣的时间范围.
  2. 找到每个感兴趣的时间范围组的开始时间.
  3. 找到每个感兴趣的时间范围组的结束时间.
  4. 将开始时间和结束时间加入时间范围列表并进行分组.

或者,更详细地讲:(每个步骤都可能是一个大型CTE的一部分,但为了方便阅读,我将其分解为临时表...)

Or, in more detail: (each of these steps could be a part of one big CTE, but I've broken it down into temp tables for ease of reading...)

第1步:找到所有感兴趣的时间范围的列表(我使用了一种类似于@Brad链接的方法). 注意:正如@Manfred Sorg所指出的那样,这假设总线的数据中没有丢失的秒数".如果时间戳记出现中断,此代码会将单个范围解释为两个(或更多)不同的范围.

Step 1: Find the list of all time ranges of interest (I used a method similar to the one linked to by @Brad). NOTE: as @Manfred Sorg pointed out, this assumes there are no "missing seconds" in a bus's data. If there is a break in the timestamps, this code will interpret the single range as two (or more) distinct ranges.

;with stopSeconds as (
  select BusID, BusStopID, TimeStamp,
         [date] = cast(datediff(dd,0,TimeStamp) as datetime),
         [grp] = dateadd(ss, -row_number() over(partition by BusID order by TimeStamp), TimeStamp)
  from #test
  where BusStopID is not null
)
select BusID, BusStopID, date,
       [sTime] = dateadd(ss,datediff(ss,date,min(TimeStamp)), 0),
       [eTime] = dateadd(ss,datediff(ss,date,max(TimeStamp)), 0),
       [secondsOfStop] = datediff(ss, min(TimeStamp), max(Timestamp)),
       [sOrd] = row_number() over(partition by BusID, BusStopID order by datediff(ss,date,min(TimeStamp))),
       [eOrd] = row_number() over(partition by BusID, BusStopID order by datediff(ss,date,max(TimeStamp)))
into #ranges
from stopSeconds
group by BusID, BusStopID, date, grp

第2步:找到每个站点的最早时间

Step 2: Find the earliest time for each stop

select this.BusID, this.BusStopID, this.sTime minSTime,
       [stopOrder] = row_number() over(partition by this.BusID, this.BusStopID order by this.sTime)
into #starts
from #ranges this
  left join #ranges prev on this.BusID = prev.BusID
                        and this.BusStopID = prev.BusStopID
                        and this.sOrd = prev.sOrd+1
                        and this.sTime between dateadd(mi,-10,prev.sTime) and dateadd(mi,10,prev.sTime)
where prev.BusID is null

第3步:查找各站的最新时间

Step 3: Find the latest time for each stop

select this.BusID, this.BusStopID, this.eTime maxETime,
       [stopOrder] = row_number() over(partition by this.BusID, this.BusStopID order by this.eTime)
into #ends
from #ranges this
  left join #ranges next on this.BusID = next.BusID
                        and this.BusStopID = next.BusStopID
                        and this.eOrd = next.eOrd-1
                        and this.eTime between dateadd(mi,-10,next.eTime) and dateadd(mi,10,next.eTime)
where next.BusID is null

第4步:将所有内容结合在一起

Step 4: Join everything together

select r.BusID, r.BusStopID,
       [avgLengthOfStop] = avg(datediff(ss,r.sTime,r.eTime)),
       [earliestStop] = min(r.sTime),
       [latestDepart] = max(r.eTime)
from #starts s
  join #ends e on s.BusID=e.BusID
              and s.BusStopID=e.BusStopID
              and s.stopOrder=e.stopOrder
  join #ranges r on r.BusID=s.BusID
                and r.BusStopID=s.BusStopID
                and r.sTime between s.minSTime and e.maxETime
                and r.eTime between s.minSTime and e.maxETime
group by r.BusID, r.BusStopID, s.stopOrder
having count(distinct r.date) > 1 --filters out the "noise"

最后,要整理一下,

drop table #ends
drop table #starts
drop table #ranges

这篇关于SQL持续时间计算的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆