如何获取每种行类型的最小最大日期 [英] How to get min max date per row type
问题描述
场景:一家公司在许多州都设有分支机构.一个州可能有多个分支.每当员工从一个分支转移到另一个分支时,都会在表中创建如下条目
Scenario: A company has many branches in many states. A state may have more than one branch. Whenever an employee is transferred from one branch to another, an entry is made to a table like following
| EID | DT | BRANCH | STATE |
|-----|-------------|--------|-------|
| 1 | 01-JAN-2000 | A | AA |
| 1 | 01-JAN-2001 | B | AA |
| 1 | 01-JAN-2002 | C | AA |
| 1 | 01-JAN-2003 | D | AA |
| 1 | 01-JAN-2004 | E | BB |
| 1 | 01-JAN-2005 | F | BB |
| 1 | 01-JAN-2006 | G | BB |
| 1 | 01-JAN-2007 | H | BB |
| 1 | 01-JAN-2008 | A | AA |
| 1 | 01-JAN-2009 | B | AA |
| 1 | 01-JAN-2010 | C | AA |
| 1 | 01-JAN-2011 | D | AA |
要求是找出员工处于某种状态的持续时间.输出应该是这样的
The requirement is to find out the duration for which a employee has been in a certain state. the output should be something like this
| STATE | MIN | MAX | Duration |
|-------|-------------|-------------|-------------|
| AA | 01-JAN-2000 | 01-JAN-2003 | 3 |
| BB | 01-JAN-2004 | 01-JAN-2007 | 3 |
| AA | 01-JAN-2008 | 01-JAN-2011 | 3 |
我似乎无法弄清楚如何在PL/SQL中做到这一点.长的方法是使用for循环遍历每一行并找到持续时间.但是有没有一种方法可以在PLSQL中不使用循环呢?
I can't seem to figure out how to do it in PL/SQL. The long way would be to use a for loop to traverse through each row and find the duration. But is there a way to do it in PLSQL without using loops?
这是一个SQLFiddle 演示
here's a SQLFiddle Demo
推荐答案
以下是完成此操作的方法之一:
Here is one of the approaches to get it done:
select max(z.state) as state
, min(z.dt) as min_date /* main query */
, max(z.dt) as max_date
, trunc((max(z.dt) - min(z.dt)) / 365) as duaration
from (select q.eid
, q.dt /* query # 2*/
, state
, sum(grp) over(order by q.dt) as grp
from (select eid
, dt
, state /* query # 1*/
, case
when state <> lag(state) over(order by dt)
then 1
end as grp
from t1 ) q
) z
group by z.grp
结果:
STATE MIN_DATE MAX_DATE DUARATION
----- ----------- ----------- ----------
AA 01-JAN-00 01-JAN-03 3
BB 01-JAN-04 01-JAN-07 3
AA 01-JAN-08 01-JAN-11 3
附录1 :查询说明.
为了获得最小和最大日期,我们只需要应用group by
子句,这很明显,但是我们不能,因为在BB
之前的AA
状态和/* query # 1*/
)和/* query # 2*/
所做的.查询1查找状态更改时的时刻(将当前行state
与上一行进行比较.lag() over()
函数用于引用数据集中的上一行),而查询2通过计算运行来形成逻辑组总计grp
(sum() over()
分析功能负责).
In order to get minimum and maximum date we simply have to apply group by
clause, it's obvious, but we can't, because there is a logical difference between AA
state before BB
and one after BB
state. So we have to do something to separate them, put them into different logical groups. And that's what inner-most (/* query # 1*/
) and /* query # 2*/
do. The query #1 finds moments when state changes(compare current-row state
with the previous one. lag() over()
function is used to reference previous row in the data set), and query #2 is forming a logical group by calculating running total of grp
(sum() over()
analytic function is responsible for that).
查询#1给我们:
EID DT STATE GRP
---------- ----------- ----- ----------
1 01-JAN-2000 AA
1 01-JAN-2001 AA
1 01-JAN-2002 AA
1 01-JAN-2003 AA
1 01-JAN-2004 BB 1 --<-- moment when state changes
1 01-JAN-2005 BB
1 01-JAN-2006 BB
1 01-JAN-2007 BB
1 01-JAN-2008 AA 1 --<-- moment when state changes
1 01-JAN-2009 AA
1 01-JAN-2010 AA
1 01-JAN-2011 AA
查询#2形成逻辑组:
EID DT STATE GRP
---------- ----------- ----- ----------
1 01-JAN-2000 AA
1 01-JAN-2001 AA
1 01-JAN-2002 AA
1 01-JAN-2003 AA
1 01-JAN-2004 BB 1
1 01-JAN-2005 BB 1
1 01-JAN-2006 BB 1
1 01-JAN-2007 BB 1
1 01-JAN-2008 AA 2
1 01-JAN-2009 AA 2
1 01-JAN-2010 AA 2
1 01-JAN-2011 AA 2
然后,在主查询中,我们只是按GRP
分组以产生最终输出.
Then, in main query, we are simply grouping by GRP
to produce final output.
这篇关于如何获取每种行类型的最小最大日期的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!