时间范围内的最大值(包含重复的日期) [英] Max within a timeframe with date duplicates

查看:177
本文介绍了时间范围内的最大值(包含重复的日期)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一张这样的桌子

DateTime start_time not null,
DateTime end_time not null,
Status_Id int not null,
Entry_Id int not null

我想在一个时间段内获取每种状态的计数,在该时间段内,对于给定的entry_id,只有最后一次启动才有效.

I want to get the count of each status within a time period, where only the last started is valid for a given entry_id.

我现在正在使用的是这个(带有动态日期):

What I am using now is this (with dynamic dates):

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from 
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP

问题在于,当某些entry_id具有多个相同start_date条目时,它的计数就错误了. (在这种情况下,我并不特别在意选择哪个状态,只选择了一个即可)

The problem is that it counts wrong when there are some entry_id that have multiple entries the same start_date. (I don't particularly care which status is chosen in this case, just that only 1 is chosen)

一些测试数据:

status_id   Entry_id    Start_date
496 45173   2010-09-29 18:04:33.000
490 45173   2010-09-29 18:48:20.100
495 45173   2010-09-29 19:25:29.300
489 45174   2010-09-29 18:43:01.500
493 45175   2010-09-29 18:48:00.500
493 45175   2010-09-29 21:16:02.700
489 45175   2010-09-30 17:52:12.100
493 45176   2010-09-29 17:55:21.300
492 45176   2010-09-29 18:20:52.200 <------ This is the one that gives the problems
493 45176   2010-09-29 18:20:52.200 <------ This is the one that gives the problems

结果应该是

495 1
489 2
492 1 (or 493 1)

推荐答案

基于OP的可爱评论的替代答案.

Alternative answer based on OPs lovely comments.

WITH
   [sequenced_data]
AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY entry_id ORDER BY start_time DESC, status_id DESC) AS [sequence_id]
  FROM
    tbl
  WHERE
    start_time < '21:00' AND end_time > '19:00'
)
SELECT status_id, COUNT(*)
FROM [sequenced_data]
WHERE sequence_id = 1
GROUP BY status_id

仅在没有单个字段可以唯一标识单个记录的情况下才需要ROW_NUMBER()函数.可以在数据中存在唯一标识列的地方编写替代查询.但是,SQL Server在优化如上所述的ROW_NUMBER()查询方面非常有效,并且(假设相关索引)应该有效.

The ROW_NUMBER() function is only needed where there isn't a single field that can uniquely identify individul records. Alternative queries can be written where there is a unique identity column in the data. SQL Server, however, is extremely effective at optimising ROW_NUMBER() queries such as above and it should (assuming relevant indexes) be effective.

编辑

有人刚刚向我建议人们不喜欢长代码,他们更喜欢紧凑代码.因此,CTE版本已被替换为内联版本(出于解释性原因,CTE确实只是帮助分解了查询,并且在需要时包含在编辑历史记录中)...

Someone just suggested to me that people don't like long code, they prefer compact code. So the CTE version has been replaced with an inline version (The CTEs really just helped breakdown the query for explanatory reasons, and is in the edit history if needed)...

编辑

ROW_NUMBER()不能构成WHERE子句的一部分.通过放回一个CTE更新查询.

ROW_NUMBER() can't form part of the WHERE clause, as spotted by OP. Query updated by putting one CTE back in.

这篇关于时间范围内的最大值(包含重复的日期)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆