SQL持续时间计算 [英] SQL duration time calculation
问题描述
我有一张给定时间的历史公交车位置表,每秒记录一次.模式如下:
I have a table of historic bus positions at a given time, recorded once per second. The schema looks like this:
BusID int not null,
BreadcrumbID int not null identity (1, 1),
BusStopID int null,
Timestamp datetime not null
我想根据历史旅行生成公交车站的时间表.如果总线的BusStopID
对应于停靠点,则该巴士处于停靠";如果BusStopID
为空,则该巴士不处于停靠处".
I want to generate a bus stop schedule based on historic trips. A bus is "at a stop" if its BusStopID
corresponds to the stop, and is not "at a stop" if the BusStopID
is null.
我需要生成公共汽车在每个站点的平均乘车时间.因此,基本上,我需要执行以下操作:
I need to generate the average times the bus is at each stop. So basically, I need to do the following:
- 确定公交车站的时间-一个简单的
where
子句就可以解决问题 - 确定公共汽车的平均停留时间.出于我的目的,我将离散的停止时间"定义为正负10分钟的窗口;如果公交车在一天的10:04-10:08停靠,另一天的10:06-10:08停靠,第三天的10:14-10:18停靠,则停靠的地点相同,但如果停靠了在10:45-10:48,这将是另一次停止事件.
- 过滤掉噪音"-即停止只发生过几次但再也没有发生过的时间
- Identify the times that a bus is at a stop -- a simple
where
clause does the trick - Identify the average times the bus is at a stop. For my purposes, I'm defining a discrete "stop time" as a window of plus/minus 10 minutes; if a bus stops one day from 10:04 - 10:08, another day at 10:06 - 10:08, and a third day at 10:14 - 10:18, those would be the same stop, but if it stops at 10:45 - 10:48, that would be a different stop occurrence.
- Filter out "noise" -- i.e. stops times that only happened a few times but never again
我完全不知如何完成第二和第三个项目符号.请帮忙!
I'm completely at a loss as to how to accomplish the second and third bullet. Please help!
推荐答案
在很多情况下,我都做了类似的事情.本质上,基于复杂顺序内的分隔进行分组.针对此问题,我使用的方法的基础如下:
On a number of occasions I've done something similar. Essentially, grouping based on separations within a complex ordering. The basics of the approach I use, with regards to this problem, are as follows:
- 建立一个表,列出所有感兴趣的时间范围.
- 找到每个感兴趣的时间范围组的开始时间.
- 找到每个感兴趣的时间范围组的结束时间.
- 将开始时间和结束时间加入时间范围列表并进行分组.
或者,更详细地讲:(每个步骤都可能是一个大型CTE的一部分,但为了方便阅读,我将其分解为临时表...)
Or, in more detail: (each of these steps could be a part of one big CTE, but I've broken it down into temp tables for ease of reading...)
第1步:找到所有感兴趣的时间范围的列表(我使用了一种类似于@Brad链接的方法). 注意:正如@Manfred Sorg所指出的那样,这假设总线的数据中没有丢失的秒数".如果时间戳记出现中断,此代码会将单个范围解释为两个(或更多)不同的范围.
Step 1: Find the list of all time ranges of interest (I used a method similar to the one linked to by @Brad). NOTE: as @Manfred Sorg pointed out, this assumes there are no "missing seconds" in a bus's data. If there is a break in the timestamps, this code will interpret the single range as two (or more) distinct ranges.
;with stopSeconds as (
select BusID, BusStopID, TimeStamp,
[date] = cast(datediff(dd,0,TimeStamp) as datetime),
[grp] = dateadd(ss, -row_number() over(partition by BusID order by TimeStamp), TimeStamp)
from #test
where BusStopID is not null
)
select BusID, BusStopID, date,
[sTime] = dateadd(ss,datediff(ss,date,min(TimeStamp)), 0),
[eTime] = dateadd(ss,datediff(ss,date,max(TimeStamp)), 0),
[secondsOfStop] = datediff(ss, min(TimeStamp), max(Timestamp)),
[sOrd] = row_number() over(partition by BusID, BusStopID order by datediff(ss,date,min(TimeStamp))),
[eOrd] = row_number() over(partition by BusID, BusStopID order by datediff(ss,date,max(TimeStamp)))
into #ranges
from stopSeconds
group by BusID, BusStopID, date, grp
第2步:找到每个站点的最早时间
Step 2: Find the earliest time for each stop
select this.BusID, this.BusStopID, this.sTime minSTime,
[stopOrder] = row_number() over(partition by this.BusID, this.BusStopID order by this.sTime)
into #starts
from #ranges this
left join #ranges prev on this.BusID = prev.BusID
and this.BusStopID = prev.BusStopID
and this.sOrd = prev.sOrd+1
and this.sTime between dateadd(mi,-10,prev.sTime) and dateadd(mi,10,prev.sTime)
where prev.BusID is null
第3步:查找各站的最新时间
Step 3: Find the latest time for each stop
select this.BusID, this.BusStopID, this.eTime maxETime,
[stopOrder] = row_number() over(partition by this.BusID, this.BusStopID order by this.eTime)
into #ends
from #ranges this
left join #ranges next on this.BusID = next.BusID
and this.BusStopID = next.BusStopID
and this.eOrd = next.eOrd-1
and this.eTime between dateadd(mi,-10,next.eTime) and dateadd(mi,10,next.eTime)
where next.BusID is null
第4步:将所有内容结合在一起
Step 4: Join everything together
select r.BusID, r.BusStopID,
[avgLengthOfStop] = avg(datediff(ss,r.sTime,r.eTime)),
[earliestStop] = min(r.sTime),
[latestDepart] = max(r.eTime)
from #starts s
join #ends e on s.BusID=e.BusID
and s.BusStopID=e.BusStopID
and s.stopOrder=e.stopOrder
join #ranges r on r.BusID=s.BusID
and r.BusStopID=s.BusStopID
and r.sTime between s.minSTime and e.maxETime
and r.eTime between s.minSTime and e.maxETime
group by r.BusID, r.BusStopID, s.stopOrder
having count(distinct r.date) > 1 --filters out the "noise"
最后,要整理一下,
drop table #ends
drop table #starts
drop table #ranges
这篇关于SQL持续时间计算的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!