ROW_NUMBER()OVER(PARTITION BY ...)使用麻烦 [英] Trouble using ROW_NUMBER() OVER (PARTITION BY ...)
问题描述
我正在使用SQL Server 2008 R2.我有一个名为EmployeeHistory的表,具有以下结构和示例数据:
I'm using SQL Server 2008 R2. I have table called EmployeeHistory with the following structure and sample data:
EmployeeID Date DepartmentID SupervisorID
10001 20130101 001 10009
10001 20130909 001 10019
10001 20131201 002 10018
10001 20140501 002 10017
10001 20141001 001 10015
10001 20141201 001 10014
请注意,随着时间的推移,员工10001一直在更换2个部门和几个主管.我要做的是在日期"字段中按顺序列出每个部门中该雇员的工作的开始和结束日期.因此,输出将如下所示:
Notice that the Employee 10001 has been changing 2 departments and several supervisors over time. What I am trying to do is to list the start and end dates of this employee's employment in each Department ordered by the Date field. So, the output will look like this:
EmployeeID DateStart DateEnd DepartmentID
10001 20130101 20131201 001
10001 20131201 20141001 002
10001 20141001 NULL 001
我打算使用以下查询对数据进行分区,但失败了.部门从001更改为002,然后又更改为001.显然,我无法按DepartmentID进行分区...我敢肯定,我忽略了明显的部分.有什么帮助吗?谢谢你.
I intended to use partitioning the data using the following query but it failed. The Department changes from 001 to 002 and then back to 001. Obviously I cannot partition by DepartmentID... I'm sure I'm overlooking the obvious. Any help? Thank you, in advance.
SELECT * ,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID
ORDER BY [Date]) RN FROM EmployeeHistory
推荐答案
涉及到一点点.最简单的方法是参考我为您创建的此SQL提琴,它会产生确切的结果.您可以通过多种方法出于性能或其他方面的考虑对其进行改进,但是希望至少应该比某些替代方案更清楚.
A bit involved. Easiest would be to refer to this SQL Fiddle I created for you that produces the exact result. There are ways you can improve it for performance or other considerations, but this should hopefully at least be clearer than some alternatives.
要点是,您首先要对数据进行规范的排名,然后使用该排名将数据分为几组,然后为每个组找到结束日期,然后消除所有中间行. ROW_NUMBER()和CROSS APPLY在可读性方面有很大帮助.
The gist is, you get a canonical ranking of your data first, then use that to segment the data into groups, then find an end date for each group, then eliminate any intermediate rows. ROW_NUMBER() and CROSS APPLY help a lot in doing it readably.
EDIT 2019:
EDIT 2019:
实际上,由于某种原因,SQL Fiddle确实被破坏了,但是在SQL Fiddle站点上似乎是一个问题.这是一个完整的版本,刚刚在SQL Server 2016上进行了测试:
The SQL Fiddle does in fact seem to be broken, for some reason, but it appears to be a problem on the SQL Fiddle site. Here's a complete version, tested just now on SQL Server 2016:
CREATE TABLE Source
(
EmployeeID int,
DateStarted date,
DepartmentID int
)
INSERT INTO Source
VALUES
(10001,'2013-01-01',001),
(10001,'2013-09-09',001),
(10001,'2013-12-01',002),
(10001,'2014-05-01',002),
(10001,'2014-10-01',001),
(10001,'2014-12-01',001)
SELECT *,
ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS EntryRank,
newid() as GroupKey,
CAST(NULL AS date) AS EndDate
INTO #RankedData
FROM Source
;
UPDATE #RankedData
SET GroupKey = beginDate.GroupKey
FROM #RankedData sup
CROSS APPLY
(
SELECT TOP 1 GroupKey
FROM #RankedData sub
WHERE sub.EmployeeID = sup.EmployeeID AND
sub.DepartmentID = sup.DepartmentID AND
NOT EXISTS
(
SELECT *
FROM #RankedData bot
WHERE bot.EmployeeID = sup.EmployeeID AND
bot.EntryRank BETWEEN sub.EntryRank AND sup.EntryRank AND
bot.DepartmentID <> sup.DepartmentID
)
ORDER BY DateStarted ASC
) beginDate (GroupKey);
UPDATE #RankedData
SET EndDate = nextGroup.DateStarted
FROM #RankedData sup
CROSS APPLY
(
SELECT TOP 1 DateStarted
FROM #RankedData sub
WHERE sub.EmployeeID = sup.EmployeeID AND
sub.DepartmentID <> sup.DepartmentID AND
sub.EntryRank > sup.EntryRank
ORDER BY EntryRank ASC
) nextGroup (DateStarted);
SELECT * FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY GroupKey ORDER BY EntryRank ASC) AS GroupRank FROM #RankedData
) FinalRanking
WHERE GroupRank = 1
ORDER BY EntryRank;
DROP TABLE #RankedData
DROP TABLE Source
这篇关于ROW_NUMBER()OVER(PARTITION BY ...)使用麻烦的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!