如何评估HIVE中的CTE(通用表格表达) [英] HOW CTE (Common Table Expression) in HIVE gets evaluated

查看:1017
本文介绍了如何评估HIVE中的CTE(通用表格表达)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述



我打算通过定义一个基本投影然后定义多个CTE来重用代码。我的问题是围绕性能和CTE在运行时进行评估的方式。

这是否会导致任何性能问题。更具体地说,是否每次都会评估基础投影。



例如:

  WITH CTE_PERSON as(
SELECT * FROM PersonTable
),


CTE_PERSON_WITH_AGE as(
SELECT * FROM CTE_PERSON年龄> 24
),

CTE_PERSON_WITH_AGE_AND_GENDER as(
SELECT * FROM CTE_PERSON_WITH_AGE WHERE gender ='m'
),

CTE_PERSON_WITH_NAME as(
SELECT * FROM CTE_PERSON WHERE name ='abc'




  • 每次来自PersonTable的所有条目都会被加载
    内存,然后过滤器将应用于
    (或)
  • $之后b $ b
  • 只有
    筛选器的结果集将被加载到内存中。

解决方案

单次扫描。

注意:

- 单个阶段

- 单个 TableScan >






  create table t(i int,j int,k int); 






 解释
with t1 as(select i,j,k from t where i = 1)
,t2 as(从t1选择i,j,k,其中j = 2)
,t3 as从t2中选择i,j,k,其中k = 3)

select * from t3
;






 解释
舞台依赖:
舞台-0是根舞台

舞台计划:
舞台:舞台-0
提取操作符
限制: -1
处理器树:
TableScan
别名:t
统计数据:数据行数:1数据大小:0基本统计数据:PARTIAL列统计数据:无
筛选运算符
谓词:(((i = 1)和(j = 2))和(k = 3))(类型:布尔值)
统计:数据行数:1数据大小:0基本统计数据:PARTIAL Column stats:NONE
选择运算符
表达式:1(类型:int),2(类型:int),3(类型:int)
outputColumnNames:_col0,_col1,_col2
统计:数据行数:1数据大小:0基本统计数据:PARTIAL列统计数据:无
ListSink


My question is around performance and the way a CTE gets evaluated in runtime.

I am planning to reuse code by defining a base projection and then defining multiple CTE's on top of this base projection with different filters.

Does that cause any performance issues.More specifically, does base projection will be evaluated every time.

For example:

WITH CTE_PERSON as (
   SELECT * FROM PersonTable
),


CTE_PERSON_WITH_AGE as (
   SELECT * FROM CTE_PERSON WHERE age > 24 
),

CTE_PERSON_WITH_AGE_AND_GENDER as (
  SELECT * FROM CTE_PERSON_WITH_AGE WHERE gender = 'm'
),

CTE_PERSON_WITH_NAME as (
  SELECT * FROM CTE_PERSON WHERE name = 'abc'
)

  • Does every time all the entries from PersonTable will get loaded into memory and then filters will be applied after (or)
  • Only Result set after filters will be loaded into memory.

解决方案

A single scan.

Note:
- a single stage
- a single TableScan
- predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean)


create table t (i int,j int,k int);


explain 
with    t1 as (select i,j,k from t  where i=1)
       ,t2 as (select i,j,k from t1 where j=2)
       ,t3 as (select i,j,k from t2 where k=3) 

select * from t3
;


Explain
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: t
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          Filter Operator
            predicate: (((i = 1) and (j = 2)) and (k = 3)) (type: boolean)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: 1 (type: int), 2 (type: int), 3 (type: int)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              ListSink

这篇关于如何评估HIVE中的CTE(通用表格表达)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆