如何从每个组中选择TOP 5%? [英] How do I select TOP 5 PERCENT from each group?

查看:131
本文介绍了如何从每个组中选择TOP 5%?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个这样的示例表:

I have a sample table like this:

CREATE TABLE #TEMP(Category VARCHAR(100), Name VARCHAR(100))

INSERT INTO #TEMP VALUES('A', 'John')
INSERT INTO #TEMP VALUES('A', 'John')
INSERT INTO #TEMP VALUES('A', 'John')
INSERT INTO #TEMP VALUES('A', 'John')
INSERT INTO #TEMP VALUES('A', 'John')
INSERT INTO #TEMP VALUES('A', 'John')
INSERT INTO #TEMP VALUES('A', 'Adam')
INSERT INTO #TEMP VALUES('A', 'Adam')
INSERT INTO #TEMP VALUES('A', 'Adam')
INSERT INTO #TEMP VALUES('A', 'Adam')
INSERT INTO #TEMP VALUES('A', 'Lisa')
INSERT INTO #TEMP VALUES('A', 'Lisa')
INSERT INTO #TEMP VALUES('A', 'Bucky')
INSERT INTO #TEMP VALUES('B', 'Lily')
INSERT INTO #TEMP VALUES('B', 'Lily')
INSERT INTO #TEMP VALUES('B', 'Lily')
INSERT INTO #TEMP VALUES('B', 'Lily')
INSERT INTO #TEMP VALUES('B', 'Lily')
INSERT INTO #TEMP VALUES('B', 'Tom')
INSERT INTO #TEMP VALUES('B', 'Tom')
INSERT INTO #TEMP VALUES('B', 'Tom')
INSERT INTO #TEMP VALUES('B', 'Tom')
INSERT INTO #TEMP VALUES('B', 'Ross')
INSERT INTO #TEMP VALUES('B', 'Ross')
INSERT INTO #TEMP VALUES('B', 'Ross')

SELECT Category, Name, COUNT(Name) Total
FROM #TEMP
GROUP BY Category, Name
ORDER BY Category, Total DESC

DROP TABLE #TEMP

请给我以下内容:

A   John    6
A   Adam    4
A   Lisa    2
A   Bucky   1
B   Lily    5
B   Tom     4
B   Ross    3

现在,我如何从每个类别中选择TOP 5 PERCENT条记录,假设每个类别具有100条以上的记录(此处未在示例表中显示)?例如,在我的实际表中,应该从A中删除John记录,并从B中删除Lily记录(同样,我在这里没有显示完整的表),以获得:

Now, how do I select the TOP 5 PERCENT records from each category assuming each category has more than 100 records (did not show in sample table here)? For instance, in my actual table, it should remove the John record from A and Lily record from B as appropriate (again, I did not show the full table here) to get:

A   Adam    4
A   Lisa    2
A   Bucky   1
B   Tom     4
B   Ross    3

我一直在尝试使用CTE s和PARTITION BY子句,但似乎无法实现我想要的.它从总体结果中删除了前5个百分点,但在每个类别中都没有.有什么建议吗?

I have been trying to use CTEs and PARTITION BY clauses but cannot seem to achieve what I want. It removes the TOP 5 PERCENT from the overall result but not from each category. Any suggestions?

推荐答案

您可以将CTE(公用表表达式)与NTILE窗口函数配对使用-这会将您的数据切成所需的多个切片,例如在您的情况下,切成20片(每片5%).

You could use a CTE (Common Table Expression) paired with the NTILE windowing function - this will slice up your data into as many slices as you need, e.g. in your case, into 20 slices (each 5%).

;WITH SlicedData AS
(
   SELECT Category, Name, COUNT(Name) Total,
            NTILE(20) OVER(PARTITION BY Category ORDER BY COUNT(Name) DESC) AS  'NTile'
   FROM #TEMP
   GROUP BY Category, Name
)
SELECT *
FROM SlicedData
WHERE NTile > 1

这基本上是将数据按Category,Name分组,按其他顺序排序(不确定COUNT(Name)是否真的是您想要的东西),然后将其切成20个片段,每个片段代表您数据分区的5% . NTile = 1的切片是前5%的切片-在从CTE中进行选择时,只需忽略该切片即可.

This basically groups your data by Category,Name, orders by something else (not sure if COUNT(Name) is really the thing you want here), and then slices it up into 20 pieces, each representing 5% of your data partition. The slice with NTile = 1 is the top 5% slice - just ignore that when selecting from the CTE.

请参阅:

  • MSDN docs on NTILE
  • SQL Server 2005 ranking functions
  • SQL SERVER – 2005 – Sample Example of RANKING Functions – ROW_NUMBER, RANK, DENSE_RANK, NTILE

了解更多信息

这篇关于如何从每个组中选择TOP 5%?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆