在Snowflake的公用表表达式中使用&Match_Recognition& [英] Using "match_recognize" in a Common Table Expression in Snowflake

查看:15
本文介绍了在Snowflake的公用表表达式中使用&Match_Recognition&的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

更新:已回复here

我正在组合一个有点复杂的查询,以便在Snowflake中对大型时间序列数据集执行事件检测、连接和基于时间的绑定。我最近注意到,match_recognize使我能够雄辩地检测时间序列事件,但每当我尝试在公用表表达式(with .. as ..)中使用match_recognize表达式时,我收到以下错误:

SQL编译错误:此上下文不支持Match_Recognition。

我做了很多搜索/阅读,但没有发现CTE中match_recognize的任何文档限制。以下是我的问题:

with clean_data as (
    -- Remove duplicate entries
    select distinct id, timestamp, measurement
    from dataset
),

label_events as (
    select *
    from clean_data
        match_recognize (
            partition by id
            order by timestamp
            measures
                match_number() as event_number
            all rows per match
            after match skip past last row
            pattern(any_row row_between_gaps+)
            define
                -- Classify contiguous sections of datapoints with < 20min between adjacent points.
                row_between_gaps as datediff(minute, lag(timestamp), timestamp) < 20
        )
)

-- Do binning with width_bucket/etc. here
select id, timestamp, measurement, event_number
from label_events;

我收到了与上面相同的错误。

这是我没有看到的限制,还是我做错了什么?

推荐答案

来自菲利普·霍法的评论:MATCH_RECOGNIZE with CTE in Snowflake

这在当时似乎是Snowflake的一个未记录在案的限制。两个或三个步骤的解决方案对我很有效:

with clean_data as (
    -- Remove duplicate entries
    select distinct id, timestamp, measurement
    from dataset
)

select *
from clean_data
    match_recognize (
        partition by id
        order by timestamp
        measures
            match_number() as event_number
        all rows per match
        after match skip past last row
        pattern(any_row row_between_gaps+)
        define
            -- Classify contiguous sections of datapoints with < 20min between adjacent points.
            row_between_gaps as datediff(minute, lag(timestamp), timestamp) < 20
    );

set quid=last_query_id();

with label_events as (
    select *
    from table(result_scan($quid))
)

-- Do binning with width_bucket/etc. here
select id, timestamp, measurement, event_number
from label_events;

我更喜欢在这里使用变量,因为我可以在开发/调试期间多次重新运行第二个查询,而不必重新运行第一个查询。

还需要注意的是,Snowflake中缓存的地理对象会转换为GEOJSON,因此在使用result_scan检索这些对象时,必须将其类型转换回地理类型。

这篇关于在Snowflake的公用表表达式中使用&amp;Match_Recognition&amp;的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆