Finding the Median Value from a Table, Grouped by Date (SQL Server)
Problem description
I have a complicated problem I am trying to solve. Please bear with me and feel free to ask any questions. I am quite new to SQL and am having difficulty with this.
I need to calculate the median of a group of values. The values are not stored in a table directly; they are derived from a table by counting hourly occurrences, grouped by date.
Here is the sample table the data is pulled from.
CREATE TABLE Table22(
Request_Number BIGINT NOT NULL
,Request_Received_Date DATETIME NOT NULL
);
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016311446,'8/9/16 9:56');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20163612157,'9/6/16 9:17');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016384250,'9/12/16 14:52');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20162920101,'4/19/16 8:11');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016418170,'10/6/16 12:28');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016392953,'9/6/16 12:39');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20164123416,'10/6/16 15:05');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016335972,'8/9/16 7:49');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20162622951,'9/6/16 9:57');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20163913504,'9/6/16 9:47');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20163211326,'9/6/16 12:38');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20163610132,'8/30/16 16:34');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20164119560,'10/6/16 15:53');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016334416,'8/10/16 11:06');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20164320028,'10/6/16 15:27');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (20163515193,'8/24/16 19:50');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016159834,'4/19/16 13:21');
INSERT INTO Table22(Request_Number,Request_Received_Date) VALUES (2016178443,'4/19/16 13:05');
The table has 2 columns: Request_Number and Request_Received_Date. Request_Number is not unique and is largely irrelevant here. I want to know how many requests are received on a particular date, broken down by hour within that date (24 hours). Every entry for a date counts as one occurrence (TicketCount). I can use COUNT(*) on the rows and group by the date and hour of Request_Received_Date.
I did just that and created a temporary table within my script:
-- Hourly ticket counts per date
CREATE TABLE #z (ForDate date, OnHour int, TicketCount int);
INSERT INTO #z (ForDate, OnHour, TicketCount)
SELECT CAST(Request_Received_Date AS DATE) AS ForDate,
       DATEPART(hh, Request_Received_Date) AS OnHour,
       COUNT(*) AS TicketCount /* Hourly ticket count column */
FROM Table22
GROUP BY CAST(Request_Received_Date AS DATE), DATEPART(hh, Request_Received_Date);
-- Ordering is applied when reading the rows back, not when inserting them
SELECT * FROM #z ORDER BY ForDate DESC, OnHour ASC;
Now I am having the hardest time finding the median value of the hourly count per day. I have tried many different formulas for calculating the median and was able to make most of them work. Many different examples of median calculation can be found here: https://sqlperformance.com/2012/08/t-sql-queries/median
I like the following script for finding the median because it is simple. But it finds a single median across all rows, for every Request_Received_Date together. I cannot find a way to apply a GROUP BY date clause to it.
DECLARE @Median DECIMAL (12,2);
SELECT @Median = (
    (SELECT MAX(TicketCount) FROM
        (SELECT TOP 50 PERCENT TicketCount FROM #z ORDER BY TicketCount) AS BottomHalf)
    +
    (SELECT MIN(TicketCount) FROM
        (SELECT TOP 50 PERCENT TicketCount FROM #z ORDER BY TicketCount DESC) AS TopHalf)
) / 2.0; -- divide by 2.0, not 2: TicketCount is int, so "/ 2" would truncate 2.5 to 2
SELECT @Median;
Any help would be greatly appreciated.
The expected result is something like this:
ForDate Median
10/6/2016 2
9/12/2016 1
9/6/2016 2.5
8/30/2016 1
8/24/2016 1
8/10/2016 1
8/9/2016 1
4/19/2016 1.5
Recommended answer
How about something like this? (This only applies if you are using SQL Server 2012 or later.)
SELECT DISTINCT ForDate, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY TicketCount) OVER (PARTITION BY ForDate) AS Median
FROM #z;
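For versions before SQL Server 2012, where PERCENTILE_CONT is not available, one possible fallback is a ROW_NUMBER-based median. The following is only a sketch, not part of the original answer: it assumes the #z table from the question, and the names Ranked, rn, and cnt are made up for illustration. It picks the middle row (or the two middle rows) per date and averages them.

```sql
-- Sketch: per-date median without PERCENTILE_CONT (assumes #z from the question).
WITH Ranked AS (
    SELECT ForDate,
           TicketCount,
           -- position of each hourly count within its date, smallest first
           ROW_NUMBER() OVER (PARTITION BY ForDate ORDER BY TicketCount) AS rn,
           -- number of hourly rows for that date
           COUNT(*) OVER (PARTITION BY ForDate) AS cnt
    FROM #z
)
SELECT ForDate,
       -- 1.0 * forces decimal arithmetic so 2 and 3 average to 2.5, not 2
       AVG(1.0 * TicketCount) AS Median
FROM Ranked
-- integer division: odd cnt selects one middle row, even cnt selects two
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY ForDate
ORDER BY ForDate DESC;
```

With the sample data, 9/6/2016 has two hourly counts (2 and 3), so both rows are selected and averaged, which should match the 2.5 in the expected result.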
In short, SQL Server has two functions for calculating the median, PERCENTILE_DISC and PERCENTILE_CONT; you can read about them here: https://msdn.microsoft.com/en-us/library/hh231327.aspx
You can compare the two in this case with the code here:
SELECT DISTINCT
ForDate
, PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY TicketCount) OVER (PARTITION BY ForDate) AS MedianDisc
, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY TicketCount) OVER (PARTITION BY ForDate) AS MedianCont
FROM
#z;
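To see how the two functions differ, here is a small standalone sketch (not from the original answer) that uses the two hourly counts for 9/6/2016 from the sample data. The inline table t and its column v are hypothetical names used only for this illustration.

```sql
-- Sketch: PERCENTILE_DISC vs PERCENTILE_CONT on an even number of values.
-- The values 2 and 3 are the hourly counts for 9/6/2016 in the sample data.
SELECT DISTINCT
      PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY v) OVER () AS MedianDisc  -- 2
    , PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v) OVER () AS MedianCont  -- 2.5
FROM (VALUES (2), (3)) AS t(v);
```

PERCENTILE_DISC always returns a value that actually exists in the set, while PERCENTILE_CONT interpolates between the two middle values. Since the expected result in the question shows 2.5 for 9/6/2016, PERCENTILE_CONT appears to be the one that matches it.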