计算不同记录的窗口函数 [英] Window functions to count distinct records
问题描述
下面的查询基于一个复杂的视图,并且该视图按我的意愿工作(我不打算包含该视图,因为我认为它对解决手头的问题没有帮助).我无法理解的是 drugCountsinFamilies
列.我需要它来显示每个药物系列的 distinct drugName
的数量.您可以从第一个屏幕截图中看到有三个不同的 H3A 行.H3A 的 drugCountsInFamilies
应该是 3(有三种不同的 H3A 药物.)
The query below is based on a complicated view and the view works as I want it to (I'm not going to include the view because I don't think it will help with the question at hand). What I can't get right is the drugCountsinFamilies
column. I need it to show me the number of distinct drugName
s for each drug family. You can see from the first screencap that there are three different H3A rows. The drugCountsInFamilies
for H3A should be 3 (there are three different H3A drugs. )
您可以从第二个屏幕截图中看到,第一个屏幕截图中的 drugCountsInFamilies
正在捕获列出药物名称的行数.
You can see from the second screen cap that what's happening is the drugCountsInFamilies
in the first screen cap is catching the number of rows that the drug name is listed on.
下面是我的问题,对不正确的部分进行评论
Below is my question, with comments on the part that is incorrect
select distinct
rx.patid
,d2.fillDate
,d2.scriptEndDate
,rx.drugName
,rx.drugClass
--the line directly below is the one that I can't figure out why it's wrong
,COUNT(rx.drugClass) over(partition by rx.patid,rx.drugclass,rx.drugname) as drugCountsInFamilies
from
(
select
ROW_NUMBER() over(partition by d.patid order by d.patid,d.uniquedrugsintimeframe desc) as rn
,d.patid
,d.fillDate
,d.scriptEndDate
,d.uniqueDrugsInTimeFrame
from DrugsPerTimeFrame as d
)d2
inner join rx on rx.patid = d2.patid
inner join DrugTable as dt on dt.drugClass=rx.drugClass
where d2.rn=1 and rx.fillDate between d2.fillDate and d2.scriptEndDate
and dt.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
order by rx.patid
如果我尝试在 count(rx.drugClass)
子句中添加一个不同的内容,则 SSMS 会生气.可以用窗口函数来完成吗?
SSMS gets mad if I try to add a distinct to the count(rx.drugClass)
clause. Can it be done using window functions?
推荐答案
将 count(distinct)
作为 Windows 函数执行需要一个技巧.实际上有几个级别的技巧.
Doing a count(distinct)
as a windows function requires a trick. Several levels of tricks, actually.
因为您的请求实际上非常简单——该值始终为 1,因为 rx.drugClass 在分区子句中——我将做一个假设.假设您想计算每个患者的独特药物类别的数量.
Because your request is actually truly simple -- the value is always 1 because rx.drugClass is in the partitioning clause -- I will make an assumption. Let's say you want to count the number of unique drug classes per patid.
如果是这样,请执行由 patid 和 drugClass 分区的 row_number()
.当此值为 1 时,在 patid 中,则开始一个新的药物类.创建一个标志,在这种情况下为 1,在所有其他情况下为 0.
If so, do a row_number()
partitioned by patid and drugClass. When this is 1, within a patid, , then a new drugClass is starting. Create a flag that is 1 in this case and 0 in all other cases.
然后,您可以简单地使用分区子句执行 sum
以获得不同值的数量.
Then, you can simply do a sum
with a partitioning clause to get the number of distinct values.
查询(格式化后,以便我可以阅读),看起来像:
The query (after formatting it so I can read it), looks like:
select rx.patid, d2.fillDate, d2.scriptEndDate, rx.drugName, rx.drugClass,
SUM(IsFirstRowInGroup) over (partition by rx.patid) as NumDrugCount
from (select distinct rx.patid, d2.fillDate, d2.scriptEndDate, rx.drugName, rx.drugClass,
(case when 1 = ROW_NUMBER() over (partition by rx.drugClass, rx.patid order by (select NULL))
then 1 else 0
end) as IsFirstRowInGroup
from (select ROW_NUMBER() over(partition by d.patid order by d.patid,d.uniquedrugsintimeframe desc) as rn,
d.patid, d.fillDate, d.scriptEndDate, d.uniqueDrugsInTimeFrame
from DrugsPerTimeFrame as d
) d2 inner join
rx
on rx.patid = d2.patid inner join
DrugTable dt
on dt.drugClass = rx.drugClass
where d2.rn=1 and rx.fillDate between d2.fillDate and d2.scriptEndDate and
dt.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
) t
order by patid
这篇关于计算不同记录的窗口函数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!