Hive分区修剪计算列 [英] Hive partition pruning on computed column

查看:179
本文介绍了Hive分区修剪计算列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在Hive上有几张表,我的查询正在尝试检索过去x天的数据。

 选择Hive在使用直接日期时修剪分区,但在使用公式时进行全表扫描。 * 
from f_event
where date_key> 20160101;

扫描分区..

s3://...key=20160102 [f]
s3://...key=20160103 [f]
s3://...key=20160104 [f]

如果我使用公式,例如,获取过去4周的数据

 选择次数(*)
从f_event f
where date_key> from_unixtime(unix_timestamp() - 2 * 7 * 60 * 60 * 24,'yyyyMMdd')

这是扫描表中的所有分区。



环境:Hadoop 2.6.0,EMR,S3上的Hive,Hive 1.0.0

解决方案当过滤表达式包含非确定性函数(如 unix_timestamp()<>)时,Hive不会触发分区修剪。 / code>。



一个很好的理由在讨论


想象一下, :



WHERE partition_column = f(unix_timestamp())AND ordinary_column =
f(unix_timestamp)
/ p>

谓词的右侧必须在map-time,
被评估,而你假设左手侧应该被评估为
编译时间,这意味着y你有两个不同的
unix_timestamp()浮动值,它只能很糟糕地结束。



I have a few tables on Hive and my query is trying to retrieve the data for the past x days. Hive is pruning the partitions when I use a direct date, but is doing a full table scan when using a formula instead.

select *
from   f_event
where  date_key > 20160101;

scanned partitions..

s3://...key=20160102 [f]
s3://...key=20160103 [f]
s3://...key=20160104 [f]

If I use a formula, say, to get the past 4 weeks of data

Select count(*)
From    f_event f
Where  date_key  > from_unixtime(unix_timestamp()-2*7*60*60*24, 'yyyyMMdd')

This is scanning all partitions in the table.

environment : Hadoop 2.6.0, EMR, Hive on S3, Hive 1.0.0

解决方案

Hive doesn't trigger partition pruning when the filtering expression contains non-deterministic functions such as unix_timestamp().

A good reason for this was mentioned in the discussion:

Imagine a situation where you had:

WHERE partition_column = f(unix_timestamp()) AND ordinary_column = f(unix_timestamp).

The right hand side of the predicate has to be evaluated at map-time, whereas you're assuming that left hand side should be evaluated at compile time, which means you have two different values of unix_timestamp() floating around, which can only end badly.

这篇关于Hive分区修剪计算列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆