如何在SQL Server中使用带有框架的窗口函数进行COUNT(DISTINCT)操作 [英] How to do a COUNT(DISTINCT) using window functions with a frame in SQL Server

查看:483
本文介绍了如何在SQL Server中使用带有框架的窗口函数进行COUNT(DISTINCT)操作的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Pi带这个可爱的问题:
分区函数COUNT()OVER可能使用DISTINCT

Piggybacking this lovely question: Partition Function COUNT() OVER possible using DISTINCT

我希望计算不同值的移动计数。
类似于以下内容:

I wish to calculate a moving count of distinct value. Something along the lines of:

Count(distinct machine_id)over(按模型顺序按前6行和当前行之间的_timestamp行划分)

很显然,SQL Server不支持该语法。不幸的是,我不太了解(没有内部化会更准确)密集_排名漫游的工作方式:

Obviously, SQL Server does not support the syntax. Unfortunately, I don't understand well enough (didn't not internalize would be more accurate) how that dense_rank walk-around works:

dense_rank() over (partition by model order by machine_id) 
+ dense_rank() over (partition by model order by machine_id) 
- 1

,因此我无法对其进行调整以满足我对移动窗口的需求。
如果我通过machine_id进行订购,是否也可以通过_timestamp进行订购并在之间使用行$

and therefore I am not able tweak it to meet my need for a moving window. If I order by machine_id, would it be enough to order by _timestamp as well and use rows between?

推荐答案

dense_rank()给出当前记录的密集排名。首先使用 ASC 排序顺序运行该记录时,您将从第一个元素获取当前记录的密集排名(唯一值排名)。当您以 DESC 的顺序运行时,您将从最近的记录中获得当前记录的密集排名。然后删除1,因为当前记录的密集排名被计算两次。这样就给出了整个分区中的唯一值总数(并在每一行重复)。

dense_rank() gives the dense ranking of the the current record. When you run that with ASC sort order first, you get the current record's dense rank (unique value rank) from the first element. When you run with DESC order, then you get the current record's dense rank from the last record. Then you remove 1 because the dense ranking of the current record is counted twice. This gives the total unique values in the whole partition (and repeated for every row).

因为, dense_rank 不支持框架,您不能直接使用此解决方案。您需要通过其他方式生成框架。一种方法可能是 JOIN 使用正确的唯一ID 比较相同的表。然后,您可以在组合版本上使用 dense_rank

Since, dense_rank does not support frames, you can't use this solution directly. You need to generate the frame by other means. One way could be JOINing the same table with proper unique id comparisons. Then, you can use dense_rank on the combined version.

请查看以下解决方案建议。假设您的表中有一个唯一的记录键( record_id )。如果您没有唯一键,请在第一个CTE之前添加另一个CTE,并为每个记录生成一个唯一键(使用 new_id()函数,或使用 concat(),中间用定界符表示 NULLs

Please check out the following solution proposal. The assumption there is you have a unique record key (record_id) available in your table. If you don't have a unique key, add another CTE before the first CTE and generate a unique key for each record (using new_id() function OR combining multiple columns using concat() with delimiter in between to account for NULLs)

; WITH cte AS (
SELECT 
  record_id
  , record_id_6_record_earlier = LEAD(machine_id, 6, NULL) OVER (PARTITION BY model ORDER BY _timestamp)
  , .... other columns
FROM mainTable
)
, cte2 AS (
SELECT 
  c.*
  , DistinctCntWithin6PriorRec = dense_rank() OVER (PARTITION BY c.model, c.record_id ORDER BY t._timestamp)
    + dense_rank() OVER (PARTITION BY c.model, c.record_id ORDER BY t._timestamp DESC)
    - 1
  , RN = ROW_NUMBER() OVER (PARTITION BY c.record_id ORDER BY t._timestamp )
FROM cte c
     LEFT JOIN mainTable t ON t.record_id BETWEEN c.record_id_6_record_earlier  and c.record_id
)
SELECT *
FROM cte2
WHERE RN = 1

此解决方案有2个局限性:

There are 2 LIMITATIONS of this solution:


  1. 如果框架的记录少于6条,则 LAG()函数将为 NULL ,因此此解决方案将不起作用。这可以用不同的方式处理:我可以想到的一种快速方法是生成6条LEAD列(先记录1条记录,先记录2条记录,等等),然后更改 BETWEEN 子句,例如 BETAEENCOALESCE(c.record_id_6_record_earlier,c.record_id_5_record_earlier,....,c.record_id_1_record_earlier,c.record_id)和c.record_id

  1. If the frame has less than 6 records, then the LAG() function will be NULL and thus this solution will not work. This can be handled in different ways: One quick way I can think of is to generate 6 LEAD columns (1 record prior, 2 records prior, etc.) and then change the BETWEEN clause to something like this BETWEEN COALESCE(c.record_id_6_record_earlier, c.record_id_5_record_earlier, ...., c.record_id_1_record_earlier, c.record_id) and c.record_id

COUNT()不计算 NULL 。但是 DENSE_RANK 可以。如果它也适用于您的数据,那么您也需要考虑这一点

COUNT() does not count NULL. But DENSE_RANK does. You need account for that too if it applies to your data

这篇关于如何在SQL Server中使用带有框架的窗口函数进行COUNT(DISTINCT)操作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆