Hive分区修剪计算列 [英] Hive partition pruning on computed column

查看：179 发布时间：2018/5/31 19:58:03 hadoop amazon-s3 hive emr

本文介绍了Hive分区修剪计算列的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我在Hive上有几张表，我的查询正在尝试检索过去x天的数据。

 选择Hive在使用直接日期时修剪分区，但在使用公式时进行全表扫描。 * 
 from f_event 
 where date_key> 20160101; 
 
扫描分区.. 
 
 s3：//...key=20160102 [f] 
 s3：//...key=20160103 [f] 
 s3：//...key=20160104 [f]

如果我使用公式，例如，获取过去4周的数据

 选择次数（*）
从f_event f 
 where date_key> from_unixtime（unix_timestamp（） -  2 * 7 * 60 * 60 * 24，'yyyyMMdd'）

这是扫描表中的所有分区。

环境：Hadoop 2.6.0，EMR，S3上的Hive，Hive 1.0.0

解决方案当过滤表达式包含非确定性函数（如

 unix_timestamp（）<>）时，Hive不会触发分区修剪。 / code>。
 
 
 一个很好的理由在讨论：
 
 
 
 想象一下， ：
 
 
   WHERE partition_column = f（unix_timestamp（））AND ordinary_column = 
f（unix_timestamp） / p> 
 
 
谓词的右侧必须在map-time，
被评估，而你假设左手侧应该被评估为
编译时间，这意味着y你有两个不同的
 unix_timestamp（）浮动值，它只能很糟糕地结束。
 
 
 
I have a few tables on Hive and my query is trying to retrieve the data for the past x days. Hive is pruning the partitions when I use a direct date, but is doing a full table scan when using a formula instead.
select *
from   f_event
where  date_key > 20160101;

scanned partitions..

s3://...key=20160102 [f]
s3://...key=20160103 [f]
s3://...key=20160104 [f]
If I use a formula, say, to get the past 4 weeks of data
Select count(*)
From    f_event f
Where  date_key  > from_unixtime(unix_timestamp()-2*7*60*60*24, 'yyyyMMdd')
This is scanning all partitions in the table.

environment : Hadoop 2.6.0, EMR, Hive on S3, Hive 1.0.0
 解决方案 
Hive doesn't trigger partition pruning when the filtering expression contains non-deterministic functions such as unix_timestamp().

A good reason for this was mentioned in the discussion:

  Imagine a situation where you had:
  
  WHERE partition_column = f(unix_timestamp()) AND ordinary_column =
  f(unix_timestamp).
  
  The right hand side of the predicate has to be evaluated at map-time,
  whereas you're assuming that left hand side should be evaluated at
  compile time, which means you have two different values of
  unix_timestamp() floating around, which can only end badly.


                        
这篇关于Hive分区修剪计算列的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

Hive分区修剪计算列 [英] Hive partition pruning on computed column

问题描述

相关文章

分布式计算/Hadoop最新文章

热门教程

热门工具

登录关闭

Hive分区修剪计算列 [英] Hive partition pruning on computed column

问题描述

相关文章

分布式计算/Hadoop最新文章

热门教程

热门工具

登录 关闭

登录关闭