Spark SQL window over an interval between two specified time boundaries - between 3 hours and 2 hours ago

Views: 22 Posted: 2021/11/14 22:40:49 Tags: apache-spark apache-spark-sql window-functions


Question

What is the proper way of specifying a window interval in Spark SQL using two predefined boundaries?

I am trying to sum up values from my table over a window of "between 3 hours ago and 2 hours ago".

When I run this query:

select *, sum(value) over (
partition by a, b
order by cast(time_value as timestamp)
range between interval 2 hours preceding and current row
) as sum_value
from my_temp_table;

That works. I get the results I expect, i.e. sums of the values that fall into a 2-hour rolling window.

Now, what I need is for that rolling window not to be bound to the current row, but to take into account the rows between 3 hours ago and 2 hours ago. I tried:

select *, sum(value) over (
partition by a, b
order by cast(time_value as timestamp)
range between interval 3 hours preceding and 2 hours preceding
) as sum_value
from my_temp_table;

but I get the error: extraneous input 'hours' expecting {'PRECEDING', 'FOLLOWING'}

I also tried:

select *, sum(value) over (
partition by a, b
order by cast(time_value as timestamp)
range between interval 3 hours preceding and interval 2 hours preceding
) as sum_value
from my_temp_table;

but then I get a different error: scala.MatchError: CalendarIntervalType (of class org.apache.spark.sql.types.CalendarIntervalType$)

The third option I tried is:

select *, sum(value) over (
partition by a, b
order by cast(time_value as timestamp)
range between interval 3 hours preceding and 2 preceding
) as sum_value
from my_temp_table;

and, as one would expect, it doesn't work: cannot resolve 'RANGE BETWEEN interval 3 hours PRECEDING AND 2 PRECEDING' due to data type mismatch

I am having difficulty finding documentation for the interval type; this link doesn't say enough, and the other information I have found is rather half-baked.
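To make the intended frame concrete, here is a minimal sketch of "between 3 hours ago and 2 hours ago" expressed as a numeric range frame over epoch seconds. It runs against SQLite (3.28 or newer) through Python's `sqlite3` module, not against Spark. The sample rows and the `ts` column (an integer epoch-seconds stand-in for `time_value`) are hypothetical. Whether the same numeric-frame approach, ordering by the timestamp cast to a numeric epoch value, behaves identically in a given Spark version is an assumption that would need checking there.

```python
import sqlite3

HOUR = 3600  # seconds per hour

# Hypothetical sample rows: (a, b, ts as epoch seconds, value)
rows = [
    ("x", "y", 0 * HOUR, 1),
    ("x", "y", 1 * HOUR, 2),
    ("x", "y", 2 * HOUR, 4),
    ("x", "y", 3 * HOUR, 8),
    ("x", "y", 4 * HOUR, 16),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_temp_table (a text, b text, ts integer, value integer)"
)
conn.executemany("insert into my_temp_table values (?, ?, ?, ?)", rows)

# Both frame boundaries are plain numbers (seconds), so they share the
# type of the numeric ordering key; no interval literals are involved.
query = f"""
select ts, value, sum(value) over (
    partition by a, b
    order by ts
    range between {3 * HOUR} preceding and {2 * HOUR} preceding
) as sum_value
from my_temp_table
order by ts
"""
result = conn.execute(query).fetchall()

for ts, value, sum_value in result:
    print(ts // HOUR, value, sum_value)
```

Each row's `sum_value` covers only rows whose `ts` lies between 3 and 2 hours before that row's own `ts`, both ends inclusive. Rows with nothing in that range get NULL (`None` in Python).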
