Vertica - Is there LATERAL VIEW functionality?

Views: 260 | Published: 2018/6/12 14:14:26 | Tags: hive, vertica


Question

Need to rotate a matrix to do TIMESERIES interpolation / gap filling, and would like to avoid the messy & inefficient UNION ALL approach. Is there anything like Hive's LATERAL VIEW EXPLODE functionality available in Vertica?

EDIT: @marcothesane -- thanks for your interesting scenario -- I like your approach for interpolation. I will play around with it more and see how it goes. Looks promising.

FYI -- here is the solution that I came up with -- My scenario is that I am trying to view memory usage over time by query (and user / resource pool, etc. basically trying to get a cost metric). I need to do interpolation so that I can see the total usage at any point in time. So here is my query which does timeseries slicing by second, then aggregates to give a metric of "Megabyte_Seconds" by minute.
    with qry_cte as (
      select session_id
           , request_id
           , date_trunc('second', start_timestamp) as dat_str
           , timestampadd('ss'
                        , ceiling(request_duration_ms/1000)::int
                        , date_trunc('second', start_timestamp)
             ) as dat_end
           , ceiling(request_duration_ms/1000)::int as secs
           , memory_acquired_mb
      from query_requests
      where request_type = 'QUERY'
        and request_duration_ms > 0
        and memory_acquired_mb > 0
    )
    select date_trunc('minute', slice_time) as dat_minute
         , count(distinct session_id || request_id::varchar) as queries
         , sum(memory_acquired_mb) as mb_seconds
    from (
      select session_id, request_id, slice_time
           , ts_first_value(memory_acquired_mb) as memory_acquired_mb
      from (
        select session_id, request_id, dat_str as dat, memory_acquired_mb
        from qry_cte
        union all
        select session_id, request_id, dat_end as dat, memory_acquired_mb
        from qry_cte
      ) x
      timeseries slice_time as '1 second'
        over (partition by session_id, request_id order by dat)
    ) x
    group by 1
    order by 1 desc
    ;
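To make the "Megabyte_Seconds" metric concrete, here is a minimal Python sketch of the same arithmetic outside the database. It is not Vertica code: the function name `mb_seconds_per_minute` and the sample request are hypothetical, and it assumes the TIMESERIES clause emits one slice per second from dat_str through dat_end inclusive.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def mb_seconds_per_minute(requests):
    """Sum memory per one-second slice, grouped by minute.

    requests: iterable of (start_timestamp, request_duration_ms, memory_acquired_mb).
    """
    totals = defaultdict(float)
    for start, duration_ms, mb in requests:
        if duration_ms <= 0 or mb <= 0:
            continue  # mirrors the WHERE filters of the query
        dat_str = start.replace(microsecond=0)   # date_trunc('second', ...)
        secs = math.ceil(duration_ms / 1000)     # ceiling(request_duration_ms/1000)
        # Assumption: slices cover dat_str .. dat_end inclusive (secs + 1 slices).
        for offset in range(secs + 1):
            slice_time = dat_str + timedelta(seconds=offset)
            totals[slice_time.replace(second=0)] += mb
    return dict(totals)


# Hypothetical request: starts at 12:00:58.4, runs 2.5 s, holds 10 MB.
requests = [(datetime(2018, 6, 12, 12, 0, 58, 400000), 2500, 10.0)]
usage = mb_seconds_per_minute(requests)
```

Under that assumption the sample request is sliced at 12:00:58, 12:00:59, 12:01:00 and 12:01:01, so it contributes to two different minutes.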

Solution
I actually have a scenario handy that could match your requirements:

Out of this:

    id|day_strt           |sales_01 |sales_02 |sales_03 |sales_04 |sales_05 |sales_06
     1|2016-01-19 08:00:00| 1,842.25| 5,449.40|-        |39,776.86|-        | 9,424.10
     2|2016-01-19 08:00:00|73,810.66|-        | 9,867.70|-        |76,723.91|95,605.14

Make this:

    id|day_strt           |sales_01 |sales_02 |sales_03 |sales_04 |sales_05 |sales_06
     1|2016-01-19 08:00:00| 1,842.25| 5,449.40|22,613.13|39,776.86|24,600.48| 9,424.10
     2|2016-01-19 08:00:00|73,810.66|41,839.18| 9,867.70|43,295.81|76,723.91|95,605.14

The suffixes 01 through 06 refer to the n-th hour of the day in which sales were recorded, starting from 08:00.
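The filled-in figures can be checked by hand: each gap sits between two known neighbours, so linear interpolation gives their midpoint. The following Python sketch reproduces that arithmetic for the two sample rows. The helper `fill_linear` is hypothetical and only mimics what linear interpolation computes here; the half-up rounding to two decimals is an assumption chosen to match the displayed output.

```python
from decimal import Decimal, ROUND_HALF_UP


def fill_linear(values):
    """Fill None entries that lie between two known values by linear interpolation."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def to_cents(values):
    """Round to two decimals (assumed rounding mode: half up)."""
    return [v.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP) for v in values]


D = Decimal
row_1 = [D("1842.25"), D("5449.40"), None, D("39776.86"), None, D("9424.10")]
row_2 = [D("73810.66"), None, D("9867.70"), None, D("76723.91"), D("95605.14")]

filled_1 = to_cents(fill_linear(row_1))
filled_2 = to_cents(fill_linear(row_2))
print(filled_1)
print(filled_2)
```

For example, sales_04 of id 2 is (9,867.70 + 76,723.91) / 2 = 43,295.805, which rounds to 43,295.81.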

Below is the whole scenario, including the initial input data. It consists of these steps:

1. The input data as a SELECT .. UNION ALL SELECT ... .
2. A table consisting of 6 integers to CROSS JOIN with the input table from step 1.
3. The vertical pivot: cross join the input with the 6 integers and, depending on the index, output just the n-th sales column in a CASE expression. Finally, filter out the rows where that same CASE expression evaluates to NULL.
4. Fill the gaps using the TIMESERIES clause and linear interpolation, for both the sales figures and the indexing column.
5. Pivot everything horizontally again in the final query.
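The two pivot steps use only standard SQL, so they can be tried without a Vertica instance. The sketch below runs a cut-down version (three sales columns instead of six, no timestamps, no gap filling) through Python's built-in sqlite3 module. SQLite is only a stand-in here, and the table name `idxs` is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE input (id INTEGER, sales_01 REAL, sales_02 REAL, sales_03 REAL);
    INSERT INTO input VALUES (1, 1842.25, 5449.40, NULL);
    INSERT INTO input VALUES (2, 73810.66, NULL, 9867.70);
    CREATE TABLE idxs (idx INTEGER);
    INSERT INTO idxs VALUES (1), (2), (3);
""")

# Vertical pivot: one row per (id, idx), keeping only non-NULL measures.
VERT_PIVOT = """
    SELECT id, idx,
           CASE idx WHEN 1 THEN sales_01
                    WHEN 2 THEN sales_02
                    WHEN 3 THEN sales_03 END AS sales
    FROM input CROSS JOIN idxs
    WHERE (CASE idx WHEN 1 THEN sales_01
                    WHEN 2 THEN sales_02
                    WHEN 3 THEN sales_03 END) IS NOT NULL
"""
vertical = conn.execute(VERT_PIVOT + " ORDER BY id, idx").fetchall()

# Horizontal pivot: fold the rows back into one row per id.
HORIZ_PIVOT = f"""
    SELECT id,
           SUM(CASE idx WHEN 1 THEN sales END) AS sales_01,
           SUM(CASE idx WHEN 2 THEN sales END) AS sales_02,
           SUM(CASE idx WHEN 3 THEN sales END) AS sales_03
    FROM ({VERT_PIVOT}) AS v
    GROUP BY id
    ORDER BY id
"""
horizontal = conn.execute(HORIZ_PIVOT).fetchall()

print(vertical)
print(horizontal)
```

Without the gap-filling step in between, the round trip simply returns the original rows, with the missing measures still NULL (None in Python). In the full query the TIMESERIES step adds the missing (id, idx) rows before the horizontal pivot.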

More performant than a UNION ALL over all columns of the table, I can guarantee you that.

Here goes:
    WITH
    -- input
    input(id,day_strt,sales_01,sales_02,sales_03,sales_04,sales_05,sales_06) AS (
              SELECT 1,'2016-01-19 08:00:00'::TIMESTAMP(0), 1842.25, 5449.40 ,NULL::INT,39776.86 ,NULL::INT, 9424.10
    UNION ALL SELECT 2,'2016-01-19 08:00:00'::TIMESTAMP(0),73810.66 ,NULL::INT, 9867.70 ,NULL::INT,76723.91 ,95605.14
    )
    -- debug
    -- SELECT * FROM input;
    ,
    -- 6 hourly sales columns to pivot vertically -> 6 integers
    six_idxs(idx) AS (
              SELECT 1
    UNION ALL SELECT 2
    UNION ALL SELECT 3
    UNION ALL SELECT 4
    UNION ALL SELECT 5
    UNION ALL SELECT 6
    )
    ,
    -- pivot input vertically and remove rows with null measures
    -- (could probably add the TIMESERIES clause here directly,
    -- but less readable and maintainable)
    vert_pivot AS (
    SELECT
      id
    , idx
    , TIMESTAMPADD(HOUR,idx-1,day_strt)::TIMESTAMP(0) AS sales_ts
    , CASE idx
        WHEN 1 THEN sales_01
        WHEN 2 THEN sales_02
        WHEN 3 THEN sales_03
        WHEN 4 THEN sales_04
        WHEN 5 THEN sales_05
        WHEN 6 THEN sales_06
      END AS sales
    FROM input CROSS JOIN six_idxs
    WHERE (
      CASE idx
        WHEN 1 THEN sales_01
        WHEN 2 THEN sales_02
        WHEN 3 THEN sales_03
        WHEN 4 THEN sales_04
        WHEN 5 THEN sales_05
        WHEN 6 THEN sales_06
      END
    ) IS NOT NULL
    )
    -- debug:
    -- SELECT * FROM vert_pivot;
    ,
    -- gap filling and interpolation
    gaps_filled AS (
    SELECT
      id
    , TS_FIRST_VALUE(idx,'LINEAR') AS idx
    , tm_sales_ts::TIMESTAMP(0) AS sales_ts
    , TS_FIRST_VALUE(sales,'LINEAR') AS sales
    FROM vert_pivot
    TIMESERIES tm_sales_ts AS '1 HOUR' OVER(
      PARTITION BY id ORDER BY sales_ts
    )
    )
    -- debug
    -- SELECT * FROM gaps_filled ORDER BY 1,2;
    -- pivot horizontally; final query
    SELECT
      id
    , MIN(sales_ts) AS day_strt
    , SUM(CASE idx WHEN 1 THEN sales END)::NUMERIC(7,2) AS sales_01
    , SUM(CASE idx WHEN 2 THEN sales END)::NUMERIC(7,2) AS sales_02
    , SUM(CASE idx WHEN 3 THEN sales END)::NUMERIC(7,2) AS sales_03
    , SUM(CASE idx WHEN 4 THEN sales END)::NUMERIC(7,2) AS sales_04
    , SUM(CASE idx WHEN 5 THEN sales END)::NUMERIC(7,2) AS sales_05
    , SUM(CASE idx WHEN 6 THEN sales END)::NUMERIC(7,2) AS sales_06
    FROM gaps_filled
    GROUP BY id
    ORDER BY id
    ;
happy playing -

Marco the Sane
