当偏移量对于可以重复处理的项目而言是可变的时,我该如何抓住“下一个"事件? [英] How do I grab the “next” event when the offset is variable for items that can be repeatedly processed?

查看:79
本文介绍了当偏移量对于可以重复处理的项目而言是可变的时,我该如何抓住“下一个"事件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个问题实际上与我最近问过的另一个问题相同,但非常重要的区别是这些交易是贷款交易, ,因此,项目可能会多次出现在数据中.这就是为什么我目前使用LEAD的原因.经过澄清后,我在下面重新发布了我的问题.

This question is virtually identical to another I recently asked, with the very important distinction that these transactions are loan transactions and, therefore, items may reappear in the data multiple times. This is why I'm currently using LEAD. With that clarification, I repost my question below.

我在Oracle数据库中有一个交易表.我正在尝试为涉及多种交易类型的交付系统收集一份报告.实际上,请求"类型可以是四个子类型之一(在此示例中为"A","B","C"和"D"),而传递"类型可以是四个不同子类型之一类型("PULL","PICKUP","MAIL").从请求"到交货"之间可以有1到5笔交易,而且交付"类型中的许多也是中间交易.例如:

I have a table of transactions in an Oracle database. I am attempting to pull a report together for a delivery system involving a number of transaction types. The "request" type can actually be one of four sub-types ('A', 'B', 'C', and 'D' for this example), and the "delivery" type can be one of four different sub-types ('PULL', 'PICKUP', 'MAIL'). There can be anywhere from 1 to 5 transactions to get an item from "request" to "delivery, and a number of the "delivery" types are also intermediary transactions. Example:

Item | Transaction | Timestamp
001  | REQ-A       | 2014-07-31T09:51:32Z
002  | REQ-B       | 2014-07-31T09:55:53Z
003  | REQ-C       | 2014-07-31T10:01:15Z
004  | REQ-D       | 2014-07-31T10:02:29Z
005  | REQ-A       | 2014-07-31T10:05:47Z
002  | PULL        | 2014-07-31T10:20:04Z
002  | MAIL        | 2014-07-31T10:20:06Z
001  | PULL        | 2014-07-31T10:22:21Z
001  | TRANSFER    | 2014-07-31T10:22:23Z
003  | PULL        | 2014-07-31T10:24:10Z
003  | TRANSFER    | 2014-07-31T10:24:12Z
004  | PULL        | 2014-07-31T10:26:28Z
005  | PULL        | 2014-07-31T10:28:42Z
005  | TRANSFER    | 2014-07-31T10:28:44Z
001  | ARRIVE      | 2014-07-31T11:45:01Z
001  | PICKUP      | 2014-07-31T11:45:02Z
003  | ARRIVE      | 2014-07-31T11:47:44Z
003  | PICKUP      | 2014-07-31T11:47:45Z
005  | ARRIVE      | 2014-07-31T11:49:45Z
005  | PICKUP      | 2014-07-31T11:49:46Z

我需要的是这样的报告:

What I need is a report like:

Item | Start Tx | End Tx | Time
001  | REQ-A    | PICKUP | 1:53:30
002  | REQ-B    | MAIL   | 0:24:13
003  | REQ-C    | PICKUP | 1:46:30
004  | REQ-D    | PULL   | 0:23:59
005  | REQ-A    | PICKUP | 1:43:59

我所拥有的:

Item | Start Tx | End Tx   | Time
001  | REQ-A    | PULL     | 0:30:49
001  | REQ-A    | TRANSFER | 0:30:51
001  | REQ-A    | ARRIVE   | 1:53:29
001  | REQ-A    | PICKUP   | 1:53:30
002  | REQ-B    | PULL     | 0:24:11
002  | REQ-B    | MAIL     | 0:24:13
003  | REQ-C    | PULL     | 0:22:55
003  | REQ-C    | TRANSFER | 0:22:57
003  | REQ-C    | ARRIVE   | 1:46:29
003  | REQ-C    | PICKUP   | 1:46:30
004  | REQ-D    | PULL     | 0:23:59
005  | REQ-A    | PULL     | 0:22:55
005  | REQ-A    | TRANSFER | 0:22:57
005  | REQ-A    | ARRIVE   | 1:43:58
005  | REQ-A    | PICKUP   | 1:43:59

我正在做什么以获取数据:

What I'm doing to get that data:

SELECT Item, Transaction, nextTransaction, nextTimestamp - Timestamp
FROM (
    SELECT Item, Transaction, Timestamp,
      LEAD(Transaction, 5) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
      LEAD(Timestamp, 5) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
    FROM Transactions
    UNION ALL 
    SELECT Item, Transaction, Timestamp,
      LEAD(Transaction, 4) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
      LEAD(Timestamp, 4) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
    FROM Transactions
    UNION ALL 
    SELECT Item, Transaction, Timestamp,
      LEAD(Transaction, 3) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
      LEAD(Timestamp, 3) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
    FROM Transactions
    UNION ALL 
    SELECT Item, Transaction, Timestamp,
      LEAD(Transaction, 2) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
      LEAD(Timestamp, 2) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
    FROM Transactions
    UNION ALL 
    SELECT Item, Transaction, Timestamp,
      LEAD(Transaction, 1) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTransaction"
      LEAD(Timestamp, 1) OVER (PARTITION BY Item ORDER BY Timestamp) AS "nextTimestamp"
    FROM Transactions
)
WHERE nextTransaction IS NOT NULL
AND Transaction IN ('REQ-A', 'REQ-B', 'REQ-C', 'REQ-D')

我可以在脚本中手动解析它(也许这实际上是最好的做法),但是为了学习起见,我想知道是否有可能单独使用SQL确实可以做到这一点.

I could manually parse this in a script (and perhaps that's actually the best course of action), but for the sake of learning, I'd like to know if it's possible to actually do this with SQL alone.

为弄清贷款"位,此表中还有其他与收益和其他形式无关的交易,这些交易与该报告无关,而与其他交易类型不同.退回商品后,它可以再次经历请求周期.例如,对于项目001,它可以遵循项目002的周期(REQ-> MAIL),然后可以进行不上架"交易,或者可以是非请求贷款,也可以是一些其他用例.然后可以返回REQ-> PICKUP周期,或REQ-> PULL周期.

To clarify the "loan" bit, there are other transactions in this table for returns and other forms of processing that are irrelevant to this report beyond existing as other transaction types. Once an item is returned, it can go through the request cycle again. As an example, for item 001, it could then follow item 002's cycle (REQ -> MAIL), it could then get a "Not on shelf" transaction, or a non-request loan, or a few other use cases. It could then go back through the REQ -> PICKUP cycle, or the REQ->PULL cycle.

推荐答案

这是一个空白问题,但是REQ事务定义的孤岛使其变得比某些复杂.

This is a gaps-and-islands problem, but the islands being defined by a REQ transaction make it a bit more complicated than some.

您可以使用嵌套的超前和滞后函数以及一些操作来获得所需的内容:

You could use nested lead and lag functions and some manipulation to get what you need:

select distinct item,
  coalesce(start_tran,
    lag(start_tran) over (partition by item order by timestamp)) as start_tran,
  coalesce(end_tran,
    lead(end_tran) over (partition by item order by timestamp)) as end_tran,
  coalesce(end_time, 
    lead(end_time) over (partition by item order by timestamp))
    - coalesce(start_time,
        lag(start_time) over (partition by item order by timestamp)) as time
from (
  select item, timestamp, start_tran, start_time, end_tran, end_time
  from (
    select item,
      timestamp,
      case when lag_tran is null or transaction like 'REQ%'
        then transaction end as start_tran,
      case when lag_tran is null or transaction like 'REQ%'
        then timestamp end as start_time,
      case when lead_tran is null or lead_tran like 'REQ%'
        then transaction end as end_tran,
      case when lead_tran is null or lead_tran like 'REQ%'
        then timestamp end as end_time
    from (
      select item, transaction, timestamp,
        lag(transaction)
          over (partition by item order by timestamp) as lag_tran,
        lead(transaction)
          over (partition by item order by timestamp) as lead_tran
      from transactions
    )
  )
  where start_tran is not null or end_tran is not null
)
order by item, start_tran;

具有第二个周期的项目1和2的其他记录,这些记录可能会给出:

With additional records for a second cycle for items 1 and 2 that could give:

      ITEM START_TRAN END_TRAN   TIME      
---------- ---------- ---------- -----------
         1 REQ-A      PICKUP     0 1:53:30.0 
         1 REQ-E      PICKUP     0 1:23:30.0 
         2 REQ-B      MAIL       0 0:24:13.0 
         2 REQ-F      REQ-F      0 0:0:0.0   
         3 REQ-C      PICKUP     0 1:46:30.0 
         4 REQ-D      PULL       0 0:23:59.0 
         5 REQ-A      PICKUP     0 1:43:59.0 

SQL小提琴显示所有中间步骤.

它并不像乍看上去那样可怕.最里面的查询将获取原始数据,并为超前和滞后事务添加一个额外的列.仅采用第一组item-1记录,即:

It's not quite as scary as it might look at first glance. The innermost query takes the raw data and adds an extra column for the lead and lag transactions. Taking just the first set of item-1 records that would be:

      ITEM TRANSACTION TIMESTAMP                LAG_TRAN   LEAD_TRAN
---------- ----------- ------------------------ ---------- ----------
         1 REQ-A       2014-07-31T09:51:32Z                PULL       
         1 PULL        2014-07-31T10:22:21Z     REQ-A      TRANSFER   
         1 TRANSFER    2014-07-31T10:22:23Z     PULL       ARRIVE     
         1 ARRIVE      2014-07-31T11:45:01Z     TRANSFER   PICKUP     
         1 PICKUP      2014-07-31T11:45:02Z     ARRIVE     REQ-E      

通知REQ-E作为最后一个lead_tran弹出吗?这是该项目的第二个记录周期的第一个transaction,以后将很有用.下一级别的查询将使用这些超前和滞后值,并将REQ值视为开始和结束标记,并使用该信息使每个循环中除第一个和最后一个记录以外的所有内容无效.

Notice REQ-E popping up as the last lead_tran? That's the first transaction for the second cycle of records for this item, and is going to be useful later. The next level of query uses those lead and lag values and treats REQ values as start and end markers, and uses that information to null out everything except the first and last record for each cycle.

      ITEM TIMESTAMP                START_TRAN START_TIME               END_TRAN   END_TIME               
---------- ------------------------ ---------- ------------------------ ---------- ------------------------
         1 2014-07-31T09:51:32Z     REQ-A      2014-07-31T09:51:32Z                                         
         1 2014-07-31T10:22:21Z                                                                             
         1 2014-07-31T10:22:23Z                                                                             
         1 2014-07-31T11:45:01Z                                                                             
         1 2014-07-31T11:45:02Z                                         PICKUP     2014-07-31T11:45:02Z     

由于我们对它们不感兴趣,因此下一级查询将删除所有不代表开始或结束的行(或都不代表开始或结束的行-参见小提琴中的REQ-F).

The next level of query removes any rows which are not representing the start or end (or both - see REQ-F in the Fiddle) as we aren't interested in them:

      ITEM TIMESTAMP                START_TRAN START_TIME               END_TRAN   END_TIME               
---------- ------------------------ ---------- ------------------------ ---------- ------------------------
         1 2014-07-31T09:51:32Z     REQ-A      2014-07-31T09:51:32Z                                         
         1 2014-07-31T11:45:02Z                                         PICKUP     2014-07-31T11:45:02Z     

我们现在每个周期都有成对的行(或者对于REQ-F是一行).最终级别再次使用超前和滞后来填补空白;如果start_tran为null,则这是结束行,我们应该使用上一行的开始数据;否则,如果end_tran为null,则这是一个开始行,我们应该使用下一行的结束数据.

We now have pairs of rows for each cycle (or a single row for REQ-F). The final level uses lead and lag again to fill in the blanks; if the start_tran is null then this is an end-row and we should use the previous row's start data; if end_tran is null then this is a start-row and we should use the next row's end data.

  ITEM START_TRAN START_TIME               END_TRAN   END_TIME                 TIME      


     1 REQ-A      2014-07-31T09:51:32Z     PICKUP     2014-07-31T11:45:02Z     0 1:53:30.0 
     1 REQ-A      2014-07-31T09:51:32Z     PICKUP     2014-07-31T11:45:02Z     0 1:53:30.0 

这使两行相同,因此distinct删除重复项.

That makes both rows the same, so the distinct removes the duplicates.

这篇关于当偏移量对于可以重复处理的项目而言是可变的时,我该如何抓住“下一个"事件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆