根据条件获取行 [英] Fetch rows based on condition

查看:79
本文介绍了根据条件获取行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在Amazon Redshift上使用PostgreSQL。

I am using PostgreSQL on Amazon Redshift.

我的表是:

drop table APP_Tax;
create temp table APP_Tax(APP_nm varchar(100),start timestamp,end1 timestamp);
insert into APP_Tax values('AFH','2018-01-26 00:39:51','2018-01-26 00:39:55'),
('AFH','2016-01-26 00:39:56','2016-01-26 00:40:01'),
('AFH','2016-01-26 00:40:05','2016-01-26 00:40:11'),
('AFH','2016-01-26 00:40:12','2016-01-26 00:40:15'), --row x
('AFH','2016-01-26 00:40:35','2016-01-26 00:41:34')  --row y

预期输出:

   'AFH','2016-01-26 00:39:51','2016-01-26 00:40:15'
   'AFH','2016-01-26 00:40:35','2016-01-26 00:41:34'

我不得不比较 start endtime 之间的时间间隔,如果时间差< 10秒即可获取下一条记录结束时间直到最后一条或最后一条记录。

I had to compare start and endtime between alternate records and if the timedifference < 10 seconds get the next record endtime till last or final record.

I,e datediff(seconds,2018-01-26 00:39:55,2018-01-26 00:39:56) Is <10 seconds

我尝试了此操作:

SELECT a.app_nm
    ,min(a.start)
    ,max(b.end1)
FROM APP_Tax a
INNER JOIN APP_Tax b
    ON a.APP_nm = b.APP_nm
        AND b.start > a.start
WHERE datediff(second, a.end1, b.start) < 10
GROUP BY 1

它有效,但不会返回行y 当条件失败时。

It works but it doesn't return row y when conditions fails.

推荐答案

听起来像是会话化活动事件。您可以使用 Windows函数

Sounds like what you are after is "sessionisation" of the activity events. You can achieve that in Redshift using Windows Functions.

完整的解决方案如下:

SELECT
  start AS session_start,
  session_end
FROM (
       SELECT
         start,
         end1,
         lead(end1, 1)
         OVER (
           ORDER BY end1) AS session_end,
         session_boundary
       FROM (
              SELECT
                start,
                end1,
                CASE WHEN session_switch = 0 AND reverse_session_switch = 1
                  THEN 'start'
                ELSE 'end' END AS session_boundary
              FROM (
                     SELECT
                       start,
                       end1,
                       CASE WHEN datediff(seconds, end1, lead(start, 1)
                       OVER (
                         ORDER BY end1 ASC)) > 10
                         THEN 1
                       ELSE 0 END AS session_switch,
                       CASE WHEN datediff(seconds, lead(end1, 1)
                       OVER (
                         ORDER BY end1 DESC), start) > 10
                         THEN 1
                       ELSE 0 END AS reverse_session_switch
                     FROM app_tax
                   )
                AS sessioned
              WHERE session_switch != 0 OR reverse_session_switch != 0
              UNION
              SELECT
                start,
                end1,
                'start'
              FROM (
                     SELECT
                       start,
                       end1,
                       row_number()
                       OVER (PARTITION BY APP_nm
                         ORDER BY end1 ASC) AS row_num
                     FROM APP_Tax
                   ) AS with_row_number
              WHERE row_num = 1
            ) AS with_boundary
     ) AS with_end
WHERE session_boundary = 'start'
ORDER BY start ASC
;

这是面包屑(按子查询名称):

Here is the breadkdown (by subquery name):


  1. sessioned -我们首先确定开关行(出和入),结束和开始之间的持续时间超出限制的行。

  2. with_row_number -只是提取第一行的补丁,因为其中没有切换(我们记录了一个隐式切换)作为开始)

  3. with_boundary -然后我们确定发生特定切换的行。如果您单独运行子查询,则很明显,会话在 session_switch = 0 AND reverse_session_switch = 1 时开始,并在相反的情况下结束。所有其他行都在会话的中间,因此将被忽略。

  4. with_end -最后,我们将'start的结束/开始组合在一起将'/'end'行插入(从而定义会话持续时间),然后删除结束行

  1. sessioned - we first identify the switch rows (out and in), the rows in which the duration between end and start exceeds limit.
  2. with_row_number - just a patch to extract the first row because there is no switch into it (there is an implicit switch that we record as 'start')
  3. with_boundary - then we identify the rows where specific switches occur. If you run the subquery by itself it is clear that session start when session_switch = 0 AND reverse_session_switch = 1, and ends when the opposite occurs. All other rows are in the middle of sessions so are ignored.
  4. with_end - finally, we combine the end/start of 'start'/'end' rows into (thus defining session duration), and remove the end rows

with_boundary 子查询回答了您最初的问题,但通常您希望将这些行合并以获得最终结果,即会话持续时间。

with_boundary subquery answers your initial question, but typically you'd want to combine those rows to get the final result which is the session duration.

这篇关于根据条件获取行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆