Temporal加入Hive查询(时间紧邻的事件) [英] Temporal Join in Hive query (events in close proximity in time)

查看:338
本文介绍了Temporal加入Hive查询(时间紧邻的事件)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述



我有一个如下所示的时间序列:

 时间源word1 word2 ...等
2012-02-01 23:43:16.9988243 0001 2B3B FAF0
2012- 02-01 23:43:16.9993561 0002 2326 ABAA
2012-02-01 23:43:16.9998879 0002 2327 ABAA

我需要一个查询,如果一个 source 中的记录满足特定条件,除了该记录外,还应该返回一个或多个记录与第二个在时间上接近以满足不同的条件集。



我迄今为止的尝试如下所示:

  SELECT time 
FROM消息C
JOIN消息D在
D.time
之间C.time - INTERVAL'0.001'SECOND
和C.time + INTERVAL'0.001'SE COND

WHERE C.source ='0001'
AND D.Source ='0002'
AND C.word1 ='2B3B'
AND D.word2 = 'ABAA'

应该返回上面示例数据中的第一条和第二条记录(它不应该返回第三个,因为时间比.001秒更远)。



但是查询不起作用。错误消息是:

  FAILED:SemanticException'0.001'遇到0个孩子


解决方案

这是一个天真的解决方案:

<$ p $从消息中选择*

c
交叉连接消息m

其中m.time在c.time - interval'0.001'之间第二个
和c.time +间隔'0.001'秒

和c.word1 ='2B3B'
和m.word2 ='ABAA'

;






  + ---------------------------- + -------- + ------- + ---- --- + ---------------------------- + -------- + ------- + ------- + 
|时间|来源| word1 | word2 |时间|来源| word1 | word2 |
+ ---------------------------- + -------- + ------- + ------- + ---------------------------- + -------- + --- ---- + ------- +
| 2012-02-01 23:43:16.998824 | 0001 | 2B3B | FAF0 | 2012-02-01 23:43:16.999356 | 0002 | 2326 | ABAA |
+ ---------------------------- + -------- + ------- + ------- + ---------------------------- + -------- + --- ---- + ------- +






这是性能良好的解决方案。

  select * 

from messages c

在楼上加入消息m

(cast(c.time为小数点(37,7))/(2 * 0.001))=
floor(cast(m .time作为十进制(37,7))/(2 * 0.001))

其中m.time在c.time - interval'0.001'second
和c.time + 0.001'秒

和c.word1 ='2B3B'
和m.word2 ='ABAA'

联合所有

选择*

来自消息c

加入消息m

在楼层((cast(c.time如小数点(37,7))+ 0.001 )/(2 * 0.001))=
floor((cas t(m.time为十进制(37,7))+ 0.001)/(2 * 0.001))

where floor(cast(c.time as decimal(37,7))/(2 * 0.001))<>
floor(演员(m.time为小数点(37,7))/(2 * 0.001))

和m.time之间c.time - 间隔'0.001'秒
和c.time +间隔'0.001'秒

和c.word1 ='2B3B'
和m.word2 ='ABAA'






  + ---------- ------------------ + -------- + ------- ------- + ------ + ---------------------- + ------- + ------- + ------- + 
|时间|来源| word1 | word2 | _col4 | _col5 | _col6 | _col7 |
+ ---------------------------- + -------- + ------- + ------- + ---------------------------- + ------- + ---- --- + ------- +
| 2012-02-01 23:43:16.998824 | 0001 | 2B3B | FAF0 | 2012-02-01 23:43:16.999356 | 0002 | 2326 | ABAA |
+ ---------------------------- + -------- + ------- + ------- + ---------------------------- + ------- + ---- --- + ------- +



插图



活动A和B将被UNION ALL的上部抓住。

活动B和C将被UNION ALL的下部抓住。

  0 0.002 0.004 0.006 0.008 0.01 
| | | | | |
---------------------------------------------- ---------
| |
| |
A B C
| |
| |
---------------------------------------------- ---------
| | | | |
0.001 0.003 0.005 0.007 0.009


I have a need for a hive query that I'm having difficulty figuring out.

I have a time series that looks like this:

time                          source    word1   word2    ...etc
2012-02-01 23:43:16.9988243    0001      2B3B    FAF0
2012-02-01 23:43:16.9993561    0002      2326    ABAA
2012-02-01 23:43:16.9998879    0002      2327    ABAA

I need a query such that, if a record in one source fulfills specific conditions, in addition to that record, it should also return one or more records in close proximity in time from the second source fulfilling a different set of conditions.

My attempt so far looks like this:

SELECT time
    FROM messages C
    JOIN messages D on
       D.time
           BETWEEN C.time - INTERVAL '0.001' SECOND
           AND     C.time + INTERVAL '0.001' SECOND

    WHERE C.source = '0001'
    AND   D.Source = '0002'
    AND   C.word1 = '2B3B'
    AND   D.word2 = 'ABAA'

Which should return the first and second records in the sample data above (It shouldn't return the third, because the time is farther away than .001 seconds).

But the query doesn't work. The error message is

FAILED: SemanticException '0.001' encountered with 0 children

解决方案

This would be a naive solution:

select  *

from                    messages c
        cross join      messages m 

where   m.time  between c.time - interval '0.001' second
                and     c.time + interval '0.001' second

    and c.word1 = '2B3B'
    and m.word2 = 'ABAA'

;


+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
|            time            | source | word1 | word2 |            time            | source | word1 | word2 |
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
| 2012-02-01 23:43:16.998824 |   0001 | 2B3B  | FAF0  | 2012-02-01 23:43:16.999356 |   0002 |  2326 | ABAA  |
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+


This is the solution with the good performance

select  *

from                    messages c

        join            messages m

        on              floor (cast(c.time as decimal(37,7)) / (2 * 0.001))   =
                        floor (cast(m.time as decimal(37,7)) / (2 * 0.001))

where   m.time  between c.time - interval '0.001' second
                and     c.time + interval '0.001' second

    and c.word1 = '2B3B'
    and m.word2 = 'ABAA'

union all

select  *

from                    messages c

        join            messages m

        on              floor ((cast(c.time as decimal(37,7)) + 0.001) / (2 * 0.001))   =
                        floor ((cast(m.time as decimal(37,7)) + 0.001) / (2 * 0.001))

where   floor (cast(c.time as decimal(37,7)) / (2 * 0.001))     <>
        floor (cast(m.time as decimal(37,7)) / (2 * 0.001))

    and m.time  between c.time - interval '0.001' second
                and     c.time + interval '0.001' second

    and c.word1 = '2B3B'
    and m.word2 = 'ABAA'


+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
|            time            | source | word1 | word2 |           _col4            | _col5 | _col6 | _col7 |
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
| 2012-02-01 23:43:16.998824 |   0001 | 2B3B  | FAF0  | 2012-02-01 23:43:16.999356 |  0002 |  2326 | ABAA  |
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+

Illustration

Events A and B are going to be caught by the upper part of the UNION ALL.
Events B and C are going to be caught by the lower part of the UNION ALL.

    0        0.002    0.004    0.006    0.008    0.01      
    |        |        |        |        |        |
-------------------------------------------------------
                      |        |
                      |        |
                          A  B  C
                           |        |
                           |        |
-------------------------------------------------------
         |        |        |        |        |                
         0.001    0.003    0.005    0.007    0.009

这篇关于Temporal加入Hive查询(时间紧邻的事件)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆