根据条件获取行 [英] Fetch rows based on condition

查看：79 发布时间：2020/5/30 1:50:32 sql postgresql amazon-redshift

本文介绍了根据条件获取行的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我在Amazon Redshift上使用PostgreSQL。

I am using PostgreSQL on Amazon Redshift.

我的表是：

drop table APP_Tax;
create temp table APP_Tax(APP_nm varchar(100),start timestamp,end1 timestamp);
insert into APP_Tax values('AFH','2018-01-26 00:39:51','2018-01-26 00:39:55'),
('AFH','2016-01-26 00:39:56','2016-01-26 00:40:01'),
('AFH','2016-01-26 00:40:05','2016-01-26 00:40:11'),
('AFH','2016-01-26 00:40:12','2016-01-26 00:40:15'), --row x
('AFH','2016-01-26 00:40:35','2016-01-26 00:41:34')  --row y

预期输出：

   'AFH','2016-01-26 00:39:51','2016-01-26 00:40:15'
   'AFH','2016-01-26 00:40:35','2016-01-26 00:41:34'

我不得不比较 start 和 endtime 之间的时间间隔，如果时间差< 10秒即可获取下一条记录结束时间直到最后一条或最后一条记录。

I had to compare start and endtime between alternate records and if the timedifference < 10 seconds get the next record endtime till last or final record.

I,e datediff(seconds,2018-01-26 00:39:55,2018-01-26 00:39:56) Is <10 seconds

我尝试了此操作：

SELECT a.app_nm
    ,min(a.start)
    ,max(b.end1)
FROM APP_Tax a
INNER JOIN APP_Tax b
    ON a.APP_nm = b.APP_nm
        AND b.start > a.start
WHERE datediff(second, a.end1, b.start) < 10
GROUP BY 1

它有效，但不会返回行y 当条件失败时。

It works but it doesn't return row y when conditions fails.

推荐答案

听起来像是会话化活动事件。您可以使用 Windows函数。

Sounds like what you are after is "sessionisation" of the activity events. You can achieve that in Redshift using Windows Functions.

完整的解决方案如下：

SELECT
  start AS session_start,
  session_end
FROM (
       SELECT
         start,
         end1,
         lead(end1, 1)
         OVER (
           ORDER BY end1) AS session_end,
         session_boundary
       FROM (
              SELECT
                start,
                end1,
                CASE WHEN session_switch = 0 AND reverse_session_switch = 1
                  THEN 'start'
                ELSE 'end' END AS session_boundary
              FROM (
                     SELECT
                       start,
                       end1,
                       CASE WHEN datediff(seconds, end1, lead(start, 1)
                       OVER (
                         ORDER BY end1 ASC)) > 10
                         THEN 1
                       ELSE 0 END AS session_switch,
                       CASE WHEN datediff(seconds, lead(end1, 1)
                       OVER (
                         ORDER BY end1 DESC), start) > 10
                         THEN 1
                       ELSE 0 END AS reverse_session_switch
                     FROM app_tax
                   )
                AS sessioned
              WHERE session_switch != 0 OR reverse_session_switch != 0
              UNION
              SELECT
                start,
                end1,
                'start'
              FROM (
                     SELECT
                       start,
                       end1,
                       row_number()
                       OVER (PARTITION BY APP_nm
                         ORDER BY end1 ASC) AS row_num
                     FROM APP_Tax
                   ) AS with_row_number
              WHERE row_num = 1
            ) AS with_boundary
     ) AS with_end
WHERE session_boundary = 'start'
ORDER BY start ASC
;

这是面包屑（按子查询名称）：

Here is the breadkdown (by subquery name):

sessioned -我们首先确定开关行（出和入），结束和开始之间的持续时间超出限制的行。

with_row_number -只是提取第一行的补丁，因为其中没有切换（我们记录了一个隐式切换）作为开始）

with_boundary -然后我们确定发生特定切换的行。如果您单独运行子查询，则很明显，会话在 session_switch = 0 AND reverse_session_switch = 1 时开始，并在相反的情况下结束。所有其他行都在会话的中间，因此将被忽略。

with_end -最后，我们将'start的结束/开始组合在一起将'/'end'行插入（从而定义会话持续时间），然后删除结束行

sessioned - we first identify the switch rows (out and in), the rows in which the duration between end and start exceeds limit.
with_row_number - just a patch to extract the first row because there is no switch into it (there is an implicit switch that we record as 'start')
with_boundary - then we identify the rows where specific switches occur. If you run the subquery by itself it is clear that session start when session_switch = 0 AND reverse_session_switch = 1, and ends when the opposite occurs. All other rows are in the middle of sessions so are ignored.
with_end - finally, we combine the end/start of 'start'/'end' rows into (thus defining session duration), and remove the end rows

with_boundary 子查询回答了您最初的问题，但通常您希望将这些行合并以获得最终结果，即会话持续时间。

with_boundary subquery answers your initial question, but typically you'd want to combine those rows to get the final result which is the session duration.

这篇关于根据条件获取行的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

根据条件获取行 [英] Fetch rows based on condition

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

根据条件获取行 [英] Fetch rows based on condition

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭