Vertica - Is there LATERAL VIEW functionality?


Question

Need to rotate a matrix to do TIMESERIES interpolation / gap filling, and would like to avoid the messy & inefficient UNION ALL approach. Is there anything like Hive's LATERAL VIEW EXPLODE functionality available in Vertica?
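For readers who don't use Hive: LATERAL VIEW EXPLODE turns a row that holds an array into one row per array element, repeating the row's other columns on each output row. Below is a minimal Python sketch of just that row-multiplying behaviour; the function name `lateral_view_explode` and the column names are made up for illustration, and this is neither Hive nor Vertica code.

```python
from typing import Any, Dict, Iterable, Iterator, List


def lateral_view_explode(rows: Iterable[Dict[str, Any]], array_col: str,
                         out_col: str) -> Iterator[Dict[str, Any]]:
    """Yield one row per element of row[array_col], copying the row's other columns."""
    for row in rows:
        # Every column except the array is repeated on each output row.
        base = {k: v for k, v in row.items() if k != array_col}
        for element in row[array_col]:
            yield {**base, out_col: element}


rows: List[Dict[str, Any]] = [
    {"id": 1, "hourly_sales": [1842.25, 5449.40]},
    {"id": 2, "hourly_sales": [73810.66]},
]
for out in lateral_view_explode(rows, "hourly_sales", "sales"):
    print(out)
```

An empty array simply produces no output rows in this sketch. The cross join against a small integer table in the answer below achieves the same kind of row multiplication for a fixed set of columns.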

EDIT: @marcothesane -- thanks for your interesting scenario -- I like your approach for interpolation. I will play around with it more and see how it goes. Looks promising.

FYI -- here is the solution I came up with. My scenario: I am trying to view memory usage over time by query (and by user, resource pool, etc.; basically, I'm trying to get a cost metric). I need to interpolate so that I can see the total usage at any point in time. So here is my query, which slices the timeseries by second and then aggregates to a "Megabyte_Seconds" metric per minute.

with qry_cte as
(
select 
session_id
, request_id
, date_trunc('second',start_timestamp) as dat_str
, timestampadd('ss'
    , ceiling(request_duration_ms/1000)::int
    , date_trunc('second',start_timestamp)
    ) as dat_end
, ceiling(request_duration_ms/1000)::int as secs
, memory_acquired_mb
from query_requests
where request_type = 'QUERY'
and request_duration_ms > 0
and memory_acquired_mb > 0
)

select date_trunc('minute',slice_time) as dat_minute
, count(distinct session_id || '|' || request_id::varchar) as queries -- '|' stops e.g. ('s1', 23) and ('s12', 3) both becoming 's123'
, sum(memory_acquired_mb) as mb_seconds
from (
select session_id, request_id, slice_time, ts_first_value(memory_acquired_mb) as memory_acquired_mb
from (
select session_id, request_id, dat_str as dat, memory_acquired_mb from qry_cte
union all
select session_id, request_id, dat_end as dat, memory_acquired_mb from qry_cte
) x
timeseries slice_time as '1 second' over (partition by session_id, request_id order by dat)
) x
group by 1 order by 1 desc
;
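To sanity-check what the query above computes, here is a rough Python model of it. It is an illustration, not Vertica code: the function name `mb_seconds_by_minute` and the tuple layout of the input are made up, the request_type = 'QUERY' filter is assumed to be applied already, and it assumes TIMESERIES emits one 1-second slice for every second from dat_str through dat_end inclusive, with ts_first_value's default constant interpolation carrying memory_acquired_mb into every slice.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Hashable, Iterable, Set, Tuple

Request = Tuple[Hashable, datetime, int, float]  # (query key, start, duration_ms, memory_mb)


def mb_seconds_by_minute(requests: Iterable[Request]) -> Dict[datetime, Tuple[int, float]]:
    """Return {minute: (distinct queries, MB-seconds)}, newest minute first."""
    mb_seconds: Dict[datetime, float] = defaultdict(float)
    queries: Dict[datetime, Set[Hashable]] = defaultdict(set)
    for key, start, duration_ms, memory_mb in requests:
        if duration_ms <= 0 or memory_mb <= 0:
            continue  # same filters as the WHERE clause in qry_cte
        dat_str = start.replace(microsecond=0)   # date_trunc('second', start_timestamp)
        secs = math.ceil(duration_ms / 1000)     # ceiling(request_duration_ms/1000)
        # Assumed TIMESERIES behaviour: one 1-second slice for every second
        # from dat_str through dat_end = dat_str + secs, both ends included.
        for i in range(secs + 1):
            slice_time = dat_str + timedelta(seconds=i)
            minute = slice_time.replace(second=0)  # date_trunc('minute', slice_time)
            mb_seconds[minute] += memory_mb        # constant memory in every slice
            queries[minute].add(key)
    return {m: (len(queries[m]), mb_seconds[m]) for m in sorted(mb_seconds, reverse=True)}


t0 = datetime(2016, 1, 19, 8, 0, 58, 400000)
print(mb_seconds_by_minute([(("s1", 1), t0, 1500, 10.0)]))
```

Under this model both endpoints get a slice, so each query is counted for secs + 1 seconds: the 1.5 s, 10 MB query above contributes 30 MB-seconds rather than 20, which is worth keeping in mind for very short queries.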

Solution

I actually have a scenario handy that could match your requirements:

Out of this:

id|day_strt           |sales_01 |sales_02 |sales_03 |sales_04 |sales_05 |sales_06
 1|2016-01-19 08:00:00| 1,842.25| 5,449.40|-        |39,776.86|-        | 9,424.10
 2|2016-01-19 08:00:00|73,810.66|-        | 9,867.70|-        |76,723.91|95,605.14

Make this:

id|day_strt           |sales_01 |sales_02 |sales_03 |sales_04 |sales_05 |sales_06
 1|2016-01-19 08:00:00| 1,842.25| 5,449.40|22,613.13|39,776.86|24,600.48| 9,424.10
 2|2016-01-19 08:00:00|73,810.66|41,839.18| 9,867.70|43,295.81|76,723.91|95,605.14

01 through 06 refers to the n-th hour of the day when sales were recorded, starting from 08:00.
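As a worked check of the "Make this" table: linear interpolation between the nearest known hours a < h < b gives the formula in the first line below. Every gap in this example is a single hour, so each filled value is simply the mean of its two neighbours:

```latex
\begin{aligned}
s_h &= s_a + (s_b - s_a)\,\frac{h - a}{b - a} \\
\text{id 1: } s_3 &= \tfrac{1}{2}(5449.40 + 39776.86) = 22613.13 \\
\text{id 1: } s_5 &= \tfrac{1}{2}(39776.86 + 9424.10) = 24600.48 \\
\text{id 2: } s_2 &= \tfrac{1}{2}(73810.66 + 9867.70) = 41839.18 \\
\text{id 2: } s_4 &= \tfrac{1}{2}(9867.70 + 76723.91) = 43295.805 \approx 43295.81
\end{aligned}
```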

Below is the whole scenario, including the initial input data.

  1. The input data, as a SELECT ... UNION ALL SELECT ... .
  2. A table of 6 integers to CROSS JOIN with the input from step 1.
  3. The vertical pivot: cross join the input with the 6 integers and, depending on the index, output only the n-th sales column via a CASE expression. Finally, filter out every row where that same CASE expression evaluates to NULL.
  4. Fill the gaps using the TIMESERIES clause and linear interpolation, applied to both the sales figures and the index column.
  5. Pivot everything horizontally again in the final query.
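The five steps can be modelled outside the database in a few lines of Python, which also reproduces the "Make this" table. This is only a sketch under assumptions: None stands in for NULL, the column index itself serves as the time axis (the SQL instead interpolates idx along the hourly sales_ts), day_strt is left out, the function names are made up for illustration, and half-up rounding is an assumption about how the ::NUMERIC(7,2) cast rounds.

```python
from decimal import Decimal, ROUND_HALF_UP
from itertools import product
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, List[Optional[Decimal]]]  # (id, [sales_01 .. sales_06]); None = NULL

# Step 1: the input rows.
INPUT: List[Row] = [
    (1, [Decimal("1842.25"), Decimal("5449.40"), None, Decimal("39776.86"), None, Decimal("9424.10")]),
    (2, [Decimal("73810.66"), None, Decimal("9867.70"), None, Decimal("76723.91"), Decimal("95605.14")]),
]
# Step 2: the six-integer table.
SIX_IDXS = range(1, 7)


def vertical_pivot(rows: List[Row]) -> List[Tuple[int, int, Decimal]]:
    """Step 3: cross join with the indexes, keep the idx-th column, drop NULLs."""
    return [(rid, idx, sales[idx - 1])
            for (rid, sales), idx in product(rows, SIX_IDXS)
            if sales[idx - 1] is not None]


def fill_gaps(tall: List[Tuple[int, int, Decimal]]) -> List[Tuple[int, int, Decimal]]:
    """Step 4: per id, linearly interpolate each missing idx between its known neighbours."""
    by_id: Dict[int, Dict[int, Decimal]] = {}
    for rid, idx, value in tall:
        by_id.setdefault(rid, {})[idx] = value
    filled = []
    for rid, points in by_id.items():
        known = sorted(points)
        for a, b in zip(known, known[1:]):
            filled.append((rid, a, points[a]))
            for h in range(a + 1, b):
                filled.append((rid, h, points[a] + (points[b] - points[a]) * (h - a) / (b - a)))
        filled.append((rid, known[-1], points[known[-1]]))  # last known point
    return filled


def horizontal_pivot(tall: List[Tuple[int, int, Decimal]]) -> Dict[int, List[Optional[Decimal]]]:
    """Step 5: one row per id with sales_01 .. sales_06, rounded half-up to 2 decimals."""
    wide: Dict[int, List[Optional[Decimal]]] = {}
    for rid, idx, value in tall:
        row = wide.setdefault(rid, [None] * 6)
        row[idx - 1] = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return dict(sorted(wide.items()))


result = horizontal_pivot(fill_gaps(vertical_pivot(INPUT)))
for rid, row in result.items():
    print(rid, [str(v) for v in row])
```

As in the SQL, only gaps with a known value on both sides get filled: if sales_01 or sales_06 were NULL they would stay empty here, and as far as I know the TIMESERIES slices likewise only run from the first to the last timestamp in each partition.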

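This is more performant than a UNION ALL over all columns of the table, I can guarantee you that: the input is scanned once and multiplied by a six-row table, instead of being scanned once per UNION ALL branch.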

Here goes:

WITH
-- input 
input(id,day_strt,sales_01,sales_02,sales_03,sales_04,sales_05,sales_06) AS (
          SELECT 1,'2016-01-19 08:00:00'::TIMESTAMP(0), 1842.25, 5449.40 ,NULL::INT,39776.86 ,NULL::INT, 9424.10
UNION ALL SELECT 2,'2016-01-19 08:00:00'::TIMESTAMP(0),73810.66 ,NULL::INT, 9867.70 ,NULL::INT,76723.91 ,95605.14
)
-- debug
-- SELECT * FROM input;
,
-- 6 hourly sales columns to pivot vertically -> 6 integers
six_idxs(idx) AS (
          SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
)
,
-- pivot input vertically and remove rows with null measures
-- (could probably add the TIMESERIES clause here directly,
-- but less readable and maintainable)
vert_pivot AS (
SELECT
  id
, idx 
, TIMESTAMPADD(HOUR,idx-1,day_strt)::TIMESTAMP(0) AS sales_ts
, CASE idx
    WHEN 1 THEN  sales_01
    WHEN 2 THEN  sales_02
    WHEN 3 THEN  sales_03
    WHEN 4 THEN  sales_04
    WHEN 5 THEN  sales_05
    WHEN 6 THEN  sales_06
  END AS sales
FROM input
CROSS JOIN six_idxs
WHERE (
    CASE idx
      WHEN 1 THEN  sales_01
      WHEN 2 THEN  sales_02
      WHEN 3 THEN  sales_03
      WHEN 4 THEN  sales_04
      WHEN 5 THEN  sales_05
      WHEN 6 THEN  sales_06
    END
  ) IS NOT NULL
)
-- debug:
-- SELECT * FROM vert_pivot;
,
-- gap filling and interpolation
gaps_filled AS (
SELECT
  id
, TS_FIRST_VALUE(idx,'LINEAR')   AS idx
, tm_sales_ts::TIMESTAMP(0) AS sales_ts
, TS_FIRST_VALUE(sales,'LINEAR') AS sales
FROM vert_pivot
TIMESERIES tm_sales_ts AS '1 HOUR' OVER(
  PARTITION BY id ORDER BY sales_ts
  )
)
-- debug
-- SELECT * FROM gaps_filled ORDER BY 1,2;
-- pivot horizontally; final query
SELECT
  id
, MIN(sales_ts) AS day_strt
, SUM(CASE idx WHEN 1 THEN sales END)::NUMERIC(7,2) AS sales_01
, SUM(CASE idx WHEN 2 THEN sales END)::NUMERIC(7,2) AS sales_02
, SUM(CASE idx WHEN 3 THEN sales END)::NUMERIC(7,2) AS sales_03
, SUM(CASE idx WHEN 4 THEN sales END)::NUMERIC(7,2) AS sales_04
, SUM(CASE idx WHEN 5 THEN sales END)::NUMERIC(7,2) AS sales_05
, SUM(CASE idx WHEN 6 THEN sales END)::NUMERIC(7,2) AS sales_06
FROM gaps_filled
GROUP BY id
ORDER BY id
;

happy playing -

Marco the Sane
