获取信封,即重叠的时间跨度 [英] Get envelope.i.e overlapping time spans

查看:101
本文介绍了获取信封,即重叠的时间跨度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个带有这样的在线会话的表(空行只是为了更好地查看):

I have a table with online sessions like this (empty rows are just for better visibility):

ip_address  | start_time       | stop_time
------------|------------------|------------------
10.10.10.10 | 2016-04-02 08:00 | 2016-04-02 08:12
10.10.10.10 | 2016-04-02 08:11 | 2016-04-02 08:20

10.10.10.10 | 2016-04-02 09:00 | 2016-04-02 09:10
10.10.10.10 | 2016-04-02 09:05 | 2016-04-02 09:08
10.10.10.10 | 2016-04-02 09:05 | 2016-04-02 09:11
10.10.10.10 | 2016-04-02 09:02 | 2016-04-02 09:15
10.10.10.10 | 2016-04-02 09:10 | 2016-04-02 09:12

10.66.44.22 | 2016-04-02 08:05 | 2016-04-02 08:07
10.66.44.22 | 2016-04-02 08:03 | 2016-04-02 08:11

我需要包围"在线时间跨度:

And I need the "envelop" online time spans:

ip_address  | full_start_time  | full_stop_time
------------|------------------|------------------
10.10.10.10 | 2016-04-02 08:00 | 2016-04-02 08:20
10.10.10.10 | 2016-04-02 09:00 | 2016-04-02 09:15
10.66.44.22 | 2016-04-02 08:03 | 2016-04-02 08:11

我有此查询可返回所需结果:

I have this query which returns desired result:

WITH t AS 
    -- Determine full time-range of each IP
    (SELECT ip_address, MIN(start_time) AS min_start_time, MAX(stop_time) AS max_stop_time FROM IP_SESSIONS GROUP BY ip_address),
t2 AS
    -- compose ticks
    (SELECT DISTINCT ip_address, min_start_time + (LEVEL-1) * INTERVAL '1' MINUTE AS ts
    FROM t
    CONNECT BY min_start_time + (LEVEL-1) * INTERVAL '1' MINUTE <= max_stop_time),
t3 AS 
    -- get all "online" ticks
    (SELECT DISTINCT ip_address, ts
    FROM t2
        JOIN IP_SESSIONS USING (ip_address)
    WHERE ts BETWEEN start_time AND stop_time),
t4 AS
    (SELECT ip_address, ts,
        LAG(ts) OVER (PARTITION BY ip_address ORDER BY ts) AS previous_ts
    FROM t3),
t5 AS 
    (SELECT ip_address, ts, 
        SUM(DECODE(previous_ts,NULL,1,0 + (CASE WHEN previous_ts + INTERVAL '1' MINUTE <> ts THEN 1 ELSE 0 END))) 
            OVER (PARTITION BY ip_address ORDER BY ts ROWS UNBOUNDED PRECEDING) session_no
    FROM t4)
SELECT ip_address, MIN(ts) AS full_start_time, MAX(ts) AS full_stop_time
FROM t5
GROUP BY ip_address, session_no
ORDER BY 1,2;

但是,我对性能感到担心.该表具有数亿行,时间分辨率为毫秒(如示例中所示,不是一分钟).因此CTE t3将会很大.是否有人有避免自我加入和"CONNECT BY"的解决方案?

However, I am concerned about the performance. The table has hundreds of million rows and the time resolution is millisecond (not one Minute as given in example). Thus CTE t3 is gonna be huge. Does anybody have a solution which avoids the Self-Join and "CONNECT BY"?

单个智能分析函数将很棒.

推荐答案

也尝试此方法.我尽我所能进行了测试,我相信它涵盖了所有可能性,包括合并相邻的时间间隔(10:15至10:30和10:30至10:40合并为一个时间间隔,即10:15至10:40 ).它也应该非常快,不会用太多.

Try this one, too. I tested it the best I could, I believe it covers all the possibilities, including coalescing adjacent intervals (10:15 to 10:30 and 10:30 to 10:40 are combined into a single interval, 10:15 to 10:40). It should also be quite fast, it doesn't use much.

with m as
        (
         select ip_address, start_time,
                   max(stop_time) over (partition by ip_address order by start_time 
                             rows between unbounded preceding and 1 preceding) as m_time
         from ip_sessions
         union all
         select ip_address, NULL, max(stop_time) from ip_sessions group by ip_address
        ),
     n as
        (
         select ip_address, start_time, m_time 
         from m 
         where start_time > m_time or start_time is null or m_time is null
        ),
     f as
        (
         select ip_address, start_time,
            lead(m_time) over (partition by ip_address order by start_time) as stop_time
         from n
        )
select * from f where start_time is not null
/

这篇关于获取信封,即重叠的时间跨度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆