Get top 1 row of each group
Problem description
I have a table from which I want to get the latest entry for each group. Here's the table, DocumentStatusLogs:
|ID| DocumentID | Status | DateCreated |
| 2| 1 | S1 | 7/29/2011 |
| 3| 1 | S2 | 7/30/2011 |
| 6| 1 | S1 | 8/02/2011 |
| 1| 2 | S1 | 7/28/2011 |
| 4| 2 | S2 | 7/30/2011 |
| 5| 2 | S3 | 8/01/2011 |
| 6| 3 | S1 | 8/02/2011 |
The table will be grouped by DocumentID and sorted by DateCreated in descending order. For each DocumentID, I want to get the latest status.
My preferred output:
| DocumentID | Status | DateCreated |
| 1 | S1 | 8/02/2011 |
| 2 | S3 | 8/01/2011 |
| 3 | S1 | 8/02/2011 |
Is there any aggregate function to get only the top row from each group? See the pseudo-code below, which uses a hypothetical GetOnlyTheTop function:
SELECT
    DocumentID,
    GetOnlyTheTop(Status),
    GetOnlyTheTop(DateCreated)
FROM DocumentStatusLogs
GROUP BY DocumentID
ORDER BY DateCreated DESC
If such a function doesn't exist, is there any way I can achieve the output I want?
- Or, in the first place, could this be caused by an unnormalized database? Since what I'm looking for is just one row, should that status also be located in the parent table?
Please see the parent table for more information:
Current Documents table:
| DocumentID | Title | Content | DateCreated |
| 1 | TitleA | ... | ... |
| 2 | TitleB | ... | ... |
| 3 | TitleC | ... | ... |
Should the parent table be like this so that I can easily access its status?
| DocumentID | Title | Content | DateCreated | CurrentStatus |
| 1 | TitleA | ... | ... | S1 |
| 2 | TitleB | ... | ... | S3 |
| 3 | TitleC | ... | ... | S1 |
UPDATE: I just learned how to use APPLY, which makes it easier to address such problems.
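A sketch of the APPLY idea mentioned in the update: in T-SQL the usual shape is CROSS APPLY (SELECT TOP 1 ... WHERE DocumentID = d.DocumentID ORDER BY DateCreated DESC), which picks the top log row per document. SQLite has no APPLY, so this assumption-laden sketch (Python's built-in sqlite3 module standing in for SQL Server) emulates it with a correlated subquery. It also assumes a parent table named Documents (hypothetical name for "Current Documents") and ISO-8601 date strings so text order equals date order. Because the sample data repeats ID 6, the subquery returns SQLite's rowid rather than ID.

```python
import sqlite3

# Sample log rows; dates stored as ISO-8601 text (an assumption) so they sort correctly.
ROWS = [
    (2, 1, "S1", "2011-07-29"),
    (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"),
    (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"),
    (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]
DOCUMENTS = [(1, "TitleA"), (2, "TitleB"), (3, "TitleC")]


def latest_via_correlated_subquery(conn):
    """For each document, join exactly the one log row the subquery picks as newest."""
    query = """
        SELECT d.DocumentID, l.Status, l.DateCreated
        FROM Documents AS d
        JOIN DocumentStatusLogs AS l
          ON l.rowid = (
                SELECT l2.rowid
                FROM DocumentStatusLogs AS l2
                WHERE l2.DocumentID = d.DocumentID
                ORDER BY l2.DateCreated DESC
                LIMIT 1
             )
        ORDER BY d.DocumentID
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (DocumentID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE DocumentStatusLogs
        (ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT);
""")
conn.executemany("INSERT INTO Documents VALUES (?, ?)", DOCUMENTS)
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)
print(latest_via_correlated_subquery(conn))
```

Like CROSS APPLY, the inner JOIN drops documents that have no log rows at all; a LEFT JOIN would behave more like OUTER APPLY.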
Answer
;WITH cte AS
(
    -- Number each document's log rows, newest first
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
    FROM DocumentStatusLogs
)
-- Keep only the newest row per document
SELECT *
FROM cte
WHERE rn = 1
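To check the ROW_NUMBER query end to end, here is a minimal sketch that runs it against the sample data. It assumes SQLite 3.25 or later (which supports window functions) through Python's built-in sqlite3 module as a stand-in for SQL Server, and stores DateCreated as ISO-8601 text (2011-07-29 rather than 7/29/2011) so that sorting the strings sorts the dates. ID is deliberately not a key because the sample uses 6 twice, and the leading semicolon is dropped since in T-SQL it only terminates a preceding statement.

```python
import sqlite3

# Sample rows from DocumentStatusLogs; ISO-8601 dates are an assumption.
ROWS = [
    (2, 1, "S1", "2011-07-29"),
    (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"),
    (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"),
    (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]


def latest_per_document(conn):
    """Return (DocumentID, Status, DateCreated) for the newest row of each document."""
    query = """
        WITH cte AS (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY DocumentID
                                      ORDER BY DateCreated DESC) AS rn
            FROM DocumentStatusLogs
        )
        SELECT DocumentID, Status, DateCreated
        FROM cte
        WHERE rn = 1
        ORDER BY DocumentID
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentStatusLogs "
    "(ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)"
)
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)
for row in latest_per_document(conn):
    print(row)
```

The result matches the preferred output in the question: S1, S3 and S1 for documents 1, 2 and 3.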
If you expect two entries per day, this will pick one of them arbitrarily. To get both entries for that day, use DENSE_RANK instead.
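The difference between ROW_NUMBER and DENSE_RANK on ties can be seen with a small, hypothetical data set in which document 1 has two entries on its latest day. This sketch again assumes SQLite via Python's sqlite3 and ISO-8601 date strings; top_rows is a hypothetical helper that only accepts the two ranking function names before putting one into the SQL text.

```python
import sqlite3

# Hypothetical data: document 1 has two entries on its latest day (2011-08-02).
ROWS = [
    (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"),
    (7, 1, "S3", "2011-08-02"),
    (5, 2, "S3", "2011-08-01"),
]


def top_rows(conn, ranking):
    """Return the rows ranked 1 within each DocumentID for the given ranking function."""
    # Only interpolate known window-function names into the SQL text.
    if ranking not in ("ROW_NUMBER", "DENSE_RANK"):
        raise ValueError(f"unsupported ranking function: {ranking}")
    query = f"""
        WITH cte AS (
            SELECT *,
                   {ranking}() OVER (PARTITION BY DocumentID
                                     ORDER BY DateCreated DESC) AS rn
            FROM DocumentStatusLogs
        )
        SELECT ID, DocumentID, Status, DateCreated
        FROM cte
        WHERE rn = 1
        ORDER BY DocumentID, ID
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentStatusLogs "
    "(ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)"
)
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)

print(top_rows(conn, "ROW_NUMBER"))  # one row per document; which of IDs 6 and 7 wins is arbitrary
print(top_rows(conn, "DENSE_RANK"))  # both 2011-08-02 rows for document 1
```

Adding a tie-breaker column to the ORDER BY (for example ID DESC) is the usual way to make ROW_NUMBER deterministic when only one row is wanted.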
As for whether to normalise or not, it depends on whether you want to:
- maintain status in two places
- preserve status history
- ...
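One way to get the "parent table with CurrentStatus" shape without storing status in two places is a view that derives it from the history on every read. This is a sketch under assumptions: SQLite via Python's sqlite3, a parent table named Documents (hypothetical), a view named DocumentsWithStatus (hypothetical), and ISO-8601 date strings. The view reuses the ROW_NUMBER pattern from the answer.

```python
import sqlite3

ROWS = [
    (2, 1, "S1", "2011-07-29"),
    (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"),
    (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"),
    (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]
DOCUMENTS = [(1, "TitleA"), (2, "TitleB"), (3, "TitleC")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (DocumentID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE DocumentStatusLogs
        (ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT);

    -- CurrentStatus is computed from the history, never stored twice.
    CREATE VIEW DocumentsWithStatus AS
    SELECT d.DocumentID, d.Title, c.Status AS CurrentStatus
    FROM Documents AS d
    LEFT JOIN (
        SELECT DocumentID, Status,
               ROW_NUMBER() OVER (PARTITION BY DocumentID
                                  ORDER BY DateCreated DESC) AS rn
        FROM DocumentStatusLogs
    ) AS c
      ON c.DocumentID = d.DocumentID AND c.rn = 1;
""")
conn.executemany("INSERT INTO Documents VALUES (?, ?)", DOCUMENTS)
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)


def documents_with_status(conn):
    """Read the parent rows together with their derived current status."""
    return conn.execute(
        "SELECT * FROM DocumentsWithStatus ORDER BY DocumentID"
    ).fetchall()


print(documents_with_status(conn))
```

The LEFT JOIN keeps documents that have no status history yet, with CurrentStatus as NULL. The trade-off is that the ranking runs on every read instead of once per write.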
As it stands, you preserve the status history. If you also want the latest status in the parent table (which is denormalisation), you'd need a trigger to maintain "status" in the parent. Alternatively, drop this status history table.
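A minimal sketch of the trigger approach, assuming SQLite via Python's sqlite3 (SQL Server triggers differ; for example they read new rows from the inserted pseudo-table rather than NEW). The table, column and trigger names (Documents, CurrentStatus, trg_DocumentStatusLogs_insert) are hypothetical. Rather than copying NEW.Status, the trigger recomputes the latest status from the log, so a back-dated insert cannot overwrite a newer status. It only covers INSERT; updates or deletes on the log would need their own triggers.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Documents (
    DocumentID    INTEGER PRIMARY KEY,
    Title         TEXT,
    CurrentStatus TEXT  -- denormalised copy of the latest logged status
);
CREATE TABLE DocumentStatusLogs (
    ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT
);
-- After every new log row, recompute the parent's CurrentStatus from the log.
CREATE TRIGGER trg_DocumentStatusLogs_insert
AFTER INSERT ON DocumentStatusLogs
BEGIN
    UPDATE Documents
    SET CurrentStatus = (
        SELECT Status
        FROM DocumentStatusLogs
        WHERE DocumentID = NEW.DocumentID
        ORDER BY DateCreated DESC, rowid DESC  -- same-day ties: last inserted wins
        LIMIT 1
    )
    WHERE DocumentID = NEW.DocumentID;
END;
"""

# Sample rows; ISO-8601 dates are an assumption so text order equals date order.
ROWS = [
    (2, 1, "S1", "2011-07-29"),
    (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"),
    (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"),
    (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]


def current_statuses(conn):
    """Read the denormalised status straight from the parent table."""
    return conn.execute(
        "SELECT DocumentID, CurrentStatus FROM Documents ORDER BY DocumentID"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Parent rows must exist first; the trigger only updates existing documents.
conn.executemany(
    "INSERT INTO Documents (DocumentID, Title) VALUES (?, ?)",
    [(1, "TitleA"), (2, "TitleB"), (3, "TitleC")],
)
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)
print(current_statuses(conn))
```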