Hive 查询中的临时连接(时间上很接近的事件) [英] Temporal Join in Hive query (events in close proximity in time)

查看:33
本文介绍了Hive 查询中的临时连接(时间上很接近的事件)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要一个难以弄清楚的 hive 查询.

I have a need for a hive query that I'm having difficulty figuring out.

我有一个如下所示的时间序列:

I have a time series that looks like this:

time                          source    word1   word2    ...etc
2012-02-01 23:43:16.9988243    0001      2B3B    FAF0
2012-02-01 23:43:16.9993561    0002      2326    ABAA
2012-02-01 23:43:16.9998879    0002      2327    ABAA

我需要这样一个查询,如果一个source中的记录满足特定条件,除了该记录外,它还应该返回一个或多个时间接近的记录 来自第二个 source 满足一组不同的条件.

I need a query such that, if a record in one source fulfills specific conditions, in addition to that record, it should also return one or more records in close proximity in time from the second source fulfilling a different set of conditions.

到目前为止我的尝试是这样的:

My attempt so far looks like this:

SELECT time
    FROM messages C
    JOIN messages D on
       D.time
           BETWEEN C.time - INTERVAL '0.001' SECOND
           AND     C.time + INTERVAL '0.001' SECOND

    WHERE C.source = '0001'
    AND   D.Source = '0002'
    AND   C.word1 = '2B3B'
    AND   D.word2 = 'ABAA'

哪个应该返回上面示例数据中的第一条和第二条记录(它不应该返回第三条,因为时间比 0.001 秒更远).

Which should return the first and second records in the sample data above (It shouldn't return the third, because the time is farther away than .001 seconds).

但是查询不起作用.错误信息是

But the query doesn't work. The error message is

FAILED: SemanticException '0.001' encountered with 0 children

推荐答案

这将是一个天真的解决方案:

This would be a naive solution:

select  *

from                    messages c
        cross join      messages m 

where   m.time  between c.time - interval '0.001' second
                and     c.time + interval '0.001' second
                
    and c.word1 = '2B3B'
    and m.word2 = 'ABAA'
    
;


+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
|            time            | source | word1 | word2 |            time            | source | word1 | word2 |
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+
| 2012-02-01 23:43:16.998824 |   0001 | 2B3B  | FAF0  | 2012-02-01 23:43:16.999356 |   0002 |  2326 | ABAA  |
+----------------------------+--------+-------+-------+----------------------------+--------+-------+-------+


这是性能良好的解决方案


This is the solution with the good performance

select  *

from                    messages c

        join            messages m
        
        on              floor (cast(c.time as decimal(37,7)) / (2 * 0.001))   =
                        floor (cast(m.time as decimal(37,7)) / (2 * 0.001))

where   m.time  between c.time - interval '0.001' second
                and     c.time + interval '0.001' second
                
    and c.word1 = '2B3B'
    and m.word2 = 'ABAA'
    
union all

select  *

from                    messages c

        join            messages m
        
        on              floor ((cast(c.time as decimal(37,7)) + 0.001) / (2 * 0.001))   =
                        floor ((cast(m.time as decimal(37,7)) + 0.001) / (2 * 0.001))

where   floor (cast(c.time as decimal(37,7)) / (2 * 0.001))     <>
        floor (cast(m.time as decimal(37,7)) / (2 * 0.001))
        
    and m.time  between c.time - interval '0.001' second
                and     c.time + interval '0.001' second
                
    and c.word1 = '2B3B'
    and m.word2 = 'ABAA'
    
    


+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
|            time            | source | word1 | word2 |           _col4            | _col5 | _col6 | _col7 |
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+
| 2012-02-01 23:43:16.998824 |   0001 | 2B3B  | FAF0  | 2012-02-01 23:43:16.999356 |  0002 |  2326 | ABAA  |
+----------------------------+--------+-------+-------+----------------------------+-------+-------+-------+

插图

事件 A 和 B 将被 UNION ALL 的上半部分捕获.
事件 B 和 C 将被 UNION ALL 的下半部分捕获.

Illustration

Events A and B are going to be caught by the upper part of the UNION ALL.
Events B and C are going to be caught by the lower part of the UNION ALL.

    0        0.002    0.004    0.006    0.008    0.01      
    |        |        |        |        |        |
-------------------------------------------------------
                      |        |
                      |        |
                          A  B  C
                           |        |
                           |        |
-------------------------------------------------------
         |        |        |        |        |                
         0.001    0.003    0.005    0.007    0.009

这篇关于Hive 查询中的临时连接(时间上很接近的事件)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆