为什么此查询不会发生分区消除? [英] Why partitions elimination does not happen for this query?
问题描述
我有一个按年、月、日和小时分区的 hive 表.我需要对其运行查询以获取过去 7 天的数据.这是在 Hive 0.14.0.2.2.4.2-2
中.我的查询目前看起来像这样:
SELECT COUNT(column_name) from table_name其中年 >= 年(date_sub(from_unixtime(unix_timestamp()), 7))AND 月份 >= 月份(date_sub(from_unixtime(unix_timestamp()), 7))AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));
这需要很长时间.当我用实际数字代替上述内容时,请说:
SELECT COUNT(column_name) from table_name其中年份 >= 2017AND 月份 >= 2AND 天 >= 13
它会在几分钟内完成.有什么方法可以更改上述脚本,使其实际上只包含查询中的数字而不是函数?
我尝试使用 set
像:
set yearLimit = year(date_sub(from_unixtime(unix_timestamp()), 7));从 table_name 选择 COUNT(column_name)where year >= ${hiveconf:yearLimit}AND 月份 >= 月份(date_sub(from_unixtime(unix_timestamp()), 7))AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));
但这并不能解决问题.
解决方案
选择计数(列名)来自表名其中年 >= 年 (date_sub (current_date,7))和月 >= 月 (date_sub (current_date,7))和天> =天(date_sub(current_date,7));
原始查询出了什么问题?
<块引用>unix_timestamp()
以秒为单位获取当前的 Unix 时间戳.这个功能不是确定性,它的值在查询范围内不是固定的执行,因此阻止了查询的适当优化 - 这自 2.0 起已弃用,取而代之的是 CURRENT_TIMESTAMP 常量.
https://cwiki.apache.org/confluence/display/Hive/语言手册+UDF
(我只是稍微更改了文档:-))
由于 unix_timestamp() 值在执行过程中可能会发生变化,因此应对每一行计算表达式,从而防止分区消除.
为什么使用 SET
不起作用?
set
只不过是一种文本替换机制.set
期间没有计算任何内容.
唯一发生的事情是变量被分配了一个文本.
在执行查询之前,变量占位符 (${hiveconf:...}
) 被替换为分配的 text.
只有这样,查询才会被解析和执行.
hive>设置 a=sele;蜂巢>设置 b=ct 1+;蜂巢>设置 c=1;蜂巢>${hiveconf:a}${hiveconf:b}${hiveconf:c};行2
<小时>
演示
create table table_name (column_name int) partitioned by (year int,month int,day int);设置 hive.exec.dynamic.partition.mode=nonstrict;插入 table_name 分区(年、月、日)选择位置,年(dt),月(dt),天(dt)来自(选择 pe.pos,date_sub (current_date,pe.pos) 作为 dt从(选择 1)x侧视图poseexplode (split (space (99),' ')) pe) t;
<小时>
解释依赖选择计数(列名称)来自表名where year >= year (date_sub (from_unixtime (unix_timestamp ()),7))和月>=月(date_sub(from_unixtime(unix_timestamp()),7))和天>=天(date_sub(from_unixtime(unix_timestamp()),7));
<小时><块引用>
{"input_partitions":[{"partitionName":"default@table_name@year=2016/month=11/day=14"},{"partitionName":"default@table_name@year=2016/month=11/day=15"},{"partitionName":"default@table_name@year=2016/month=11/day=16"},{"partitionName":"default@table_name@year=2016/month=11/day=17"},{"partitionName":"default@table_name@year=2016/month=11/day=18"},{"partitionName":"default@table_name@year=2016/month=11/day=19"},{"partitionName":"default@table_name@year=2016/month=11/day=20"},{"partitionName":"default@table_name@year=2016/month=11/day=21"},{"partitionName":"default@table_name@year=2016/month=11/day=22"},{"partitionName":"default@table_name@year=2016/month=11/day=23"},{"partitionName":"default@table_name@year=2016/month=11/day=24"},{"partitionName":"default@table_name@year=2016/month=11/day=25"},{"partitionName":"default@table_name@year=2016/month=11/day=26"},{"partitionName":"default@table_name@year=2016/month=11/day=27"},{"partitionName":"default@table_name@year=2016/month=11/day=28"},{"partitionName":"default@table_name@year=2016/month=11/day=29"},{"partitionName":"default@table_name@year=2016/month=11/day=30"},{"partitionName":"default@table_name@year=2016/month=12/day=1"},{"partitionName":"default@table_name@year=2016/month=12/day=10"},{"partitionName":"default@table_name@year=2016/month=12/day=11"},{"partitionName":"default@table_name@year=2016/month=12/day=12"},{"partitionName":"default@table_name@year=2016/month=12/day=13"},{"partitionName":"default@table_name@year=2016/month=12/day=14"},{"partitionName":"default@table_name@year=2016/month=12/day=15"},{"partitionName":"default@table_name@year=2016/month=12/day=16"},{"partitionName":"default@table_name@year=2016/month=12/day=17"},{"partitionName":"default@table_name@year=2016/month=12/day=18"},{"partitionName":"default@table_name@year=2016/month=12/day=19"},{"partitionName":"default@table_name@year=2016/month=12/day=2"},{"partitionName":"default@table_name@year=2016/month=12/day=20"},{"partitionName":"default@table_name@year=2016/month=12/day=21"},{"partitionName":"default@table_name@year=2016/month=12/day=22"},{"partitionName":"default@table_name@year=2016/month=12/day=23"},{"partitionName":"default@table_name@year=2016/month=12/day=24"},{"partitionName":"default@table_name@year=2016/month=12/day=25"},{"partitionName":"default@table_name@year=2016/month=12/day=26"},{"partitionName":"default@table_name@year=2016/月=12/天=27"},{"partitionName":"default@table_name@year=2016/month=12/day=28"},{"partitionName":"default@table_name@year=2016/month=12/day=29"},{"partitionName":"default@table_name@year=2016/month=12/day=3"},{"partitionName":"default@table_name@year=2016/month=12/day=30"},{"partitionName":"default@table_name@year=2016/month=12/day=31"},{"partitionName":"default@table_name@year=2016/month=12/day=4"},{"partitionName":"default@table_name@year=2016/month=12/day=5"},{"partitionName":"default@table_name@year=2016/month=12/day=6"},{"partitionName":"default@table_name@year=2016/month=12/day=7"},{"partitionName":"default@table_name@year=2016/month=12/day=8"},{"partitionName":"default@table_name@year=2016/month=12/day=9"},{"partitionName":"default@table_name@year=2017/month=1/day=1"},{"partitionName":"default@table_name@year=2017/month=1/day=10"},{"partitionName":"default@table_name@year=2017/month=1/day=11"},{"partitionName":"default@table_name@year=2017/month=1/day=12"},{"partitionName":"default@table_name@year=2017/month=1/day=13"},{"partitionName":"default@table_name@year=2017/month=1/day=14"},{"partitionName":"default@table_name@year=2017/month=1/day=15"},{"partitionName":"default@table_name@year=2017/month=1/day=16"},{"partitionName":"default@table_name@year=2017/month=1/day=17"},{"partitionName":"default@table_name@year=2017/month=1/day=18"},{"partitionName":"default@table_name@year=2017/month=1/day=19"},{"partitionName":"default@table_name@year=2017/month=1/day=2"},{"partitionName":"default@table_name@year=2017/month=1/day=20"},{"partitionName":"default@table_name@year=2017/month=1/day=21"},{"partitionName":"default@table_name@year=2017/month=1/day=22"},{"partitionName":"default@table_name@year=2017/month=1/day=23"},{"partitionName":"default@table_name@year=2017/month=1/day=24"},{"partitionName":"default@table_name@year=2017/month=1/day=25"},{"partitionName":"default@table_name@year=2017/month=1/day=26"},{"partitionName":"default@table_name@year=2017/月=1/天=27"},{"partitionName":"default@table_name@year=2017/month=1/day=28"},{"partitionName":"default@table_name@year=2017/month=1/day=29"},{"partitionName":"default@table_name@year=2017/month=1/day=3"},{"partitionName":"default@table_name@year=2017/month=1/day=30"},{"partitionName":"default@table_name@year=2017/month=1/day=31"},{"partitionName":"default@table_name@year=2017/month=1/day=4"},{"partitionName":"default@table_name@year=2017/month=1/day=5"},{"partitionName":"default@table_name@year=2017/month=1/day=6"},{"partitionName":"default@table_name@year=2017/month=1/day=7"},{"partitionName":"default@table_name@year=2017/month=1/day=8"},{"partitionName":"default@table_name@year=2017/month=1/day=9"},{"partitionName":"default@table_name@year=2017/month=2/day=1"},{"partitionName":"default@table_name@year=2017/month=2/day=10"},{"partitionName":"default@table_name@year=2017/month=2/day=11"},{"partitionName":"default@table_name@year=2017/month=2/day=12"},{"partitionName":"default@table_name@year=2017/month=2/day=13"},{"partitionName":"default@table_name@year=2017/month=2/day=14"},{"partitionName":"default@table_name@year=2017/month=2/day=15"},{"partitionName":"default@table_name@year=2017/month=2/day=16"},{"partitionName":"default@table_name@year=2017/月=2/天=17"},{"partitionName":"default@table_name@year=2017/month=2/day=18"},{"partitionName":"default@table_name@year=2017/month=2/day=19"},{"partitionName":"default@table_name@year=2017/month=2/day=2"},{"partitionName":"default@table_name@year=2017/month=2/day=20"},{"partitionName":"default@table_name@year=2017/month=2/day=21"},{"partitionName":"default@table_name@year=2017/month=2/day=3"},{"partitionName":"default@table_name@year=2017/month=2/day=4"},{"partitionName":"default@table_name@year=2017/month=2/day=5"},{"partitionName":"default@table_name@year=2017/month=2/day=6"},{"partitionName":"default@table_name@year=2017/month=2/day=7"},{"partitionName":"default@table_name@year=2017/month=2/day=8"},{"partitionName":"default@table_name@year=2017/month=2/day=9"}],"input_tables":[{"tablename":"default@table_name","tabletype":"MANAGED_TABLE"}]}
解释依赖选择计数(列名称)来自表名其中年 >= 年 (date_sub (current_date,7))和月 >= 月 (date_sub (current_date,7))和天> =天(date_sub(current_date,7));
<块引用>
{"input_partitions":[{"partitionName":"default@table_name@year=2017/month=2/day=14"},{"partitionName":"default@table_name@year=2017/month=2/day=15"},{"partitionName":"default@table_name@year=2017/month=2/day=16"},{"partitionName":"default@table_name@year=2017/month=2/day=17"},{"partitionName":"default@table_name@year=2017/month=2/day=18"},{"partitionName":"default@table_name@year=2017/month=2/day=19"},{"partitionName":"default@table_name@year=2017/month=2/day=20"},{"partitionName":"default@table_name@year=2017/month=2/day=21"}],"input_tables":[{"tablename":"default@table_name","tabletype":"MANAGED_TABLE"}]}
I have a hive table which is partitioned by year, month, day and hour. I need to run a query against it to fetch the last 7 days data. This is in Hive 0.14.0.2.2.4.2-2
. My query currently looks like this :
SELECT COUNT(column_name) from table_name
where year >= year(date_sub(from_unixtime(unix_timestamp()), 7))
AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7))
AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));
This takes a very long time. When I substitute the actual numbers for the above say something like :
SELECT COUNT(column_name) from table_name
where year >= 2017
AND month >= 2
AND day >= 13
it finishes in a few minutes. Is there any way to change the above script so that is actually includes just the numbers in the query instead of the functions?
I tried using set
like:
set yearLimit = year(date_sub(from_unixtime(unix_timestamp()), 7));
SELECT COUNT(column_name) from table_name
where year >= ${hiveconf:yearLimit}
AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7))
AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));
but this does not solve the issue.
Solution
select count (column_name)
from table_name
where year >= year (date_sub (current_date,7))
and month >= month (date_sub (current_date,7))
and day >= day (date_sub (current_date,7))
;
What went wrong with the original query?
unix_timestamp()
Gets current Unix timestamp in seconds. This function is not deterministic and its value is not fixed for the scope of a query execution, therefore prevents proper optimization of queries - this has been deprecated since 2.0 in favour of CURRENT_TIMESTAMP constant.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
(I've just changed the documentation a little bit :-))
Since unix_timestamp() values might change during the execution, the expression should be evaluated for each row, therefore preventing partitions elimination.
Why using SET
did not work?
set
is nothing but a text replacement mechanism.
Nothing is being computed during the set
.
The only thing that happens is that variables are being assigned a text.
Before the query is being executed the variables place holders (${hiveconf:...}
) are being replaced with the assigned text.
Only then the query is being parsed and executed.
hive> set a=sele;
hive> set b=ct 1+;
hive> set c=1;
hive> ${hiveconf:a}${hiveconf:b}${hiveconf:c};
OK
2
Demo
create table table_name (column_name int) partitioned by (year int,month int,day int);
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table_name partition (year,month,day)
select pos
,year(dt)
,month(dt)
,day(dt)
from (select pe.pos
,date_sub (current_date,pe.pos) as dt
from (select 1) x
lateral view posexplode (split (space (99),' ')) pe
) t
;
explain dependency
select count (column_name)
from table_name
where year >= year (date_sub (from_unixtime (unix_timestamp ()),7))
and month >= month (date_sub (from_unixtime (unix_timestamp ()),7))
and day >= day (date_sub (from_unixtime (unix_timestamp ()),7))
;
{"input_partitions":[{"partitionName":"default@table_name@year=2016/month=11/day=14"},{"partitionName":"default@table_name@year=2016/month=11/day=15"},{"partitionName":"default@table_name@year=2016/month=11/day=16"},{"partitionName":"default@table_name@year=2016/month=11/day=17"},{"partitionName":"default@table_name@year=2016/month=11/day=18"},{"partitionName":"default@table_name@year=2016/month=11/day=19"},{"partitionName":"default@table_name@year=2016/month=11/day=20"},{"partitionName":"default@table_name@year=2016/month=11/day=21"},{"partitionName":"default@table_name@year=2016/month=11/day=22"},{"partitionName":"default@table_name@year=2016/month=11/day=23"},{"partitionName":"default@table_name@year=2016/month=11/day=24"},{"partitionName":"default@table_name@year=2016/month=11/day=25"},{"partitionName":"default@table_name@year=2016/month=11/day=26"},{"partitionName":"default@table_name@year=2016/month=11/day=27"},{"partitionName":"default@table_name@year=2016/month=11/day=28"},{"partitionName":"default@table_name@year=2016/month=11/day=29"},{"partitionName":"default@table_name@year=2016/month=11/day=30"},{"partitionName":"default@table_name@year=2016/month=12/day=1"},{"partitionName":"default@table_name@year=2016/month=12/day=10"},{"partitionName":"default@table_name@year=2016/month=12/day=11"},{"partitionName":"default@table_name@year=2016/month=12/day=12"},{"partitionName":"default@table_name@year=2016/month=12/day=13"},{"partitionName":"default@table_name@year=2016/month=12/day=14"},{"partitionName":"default@table_name@year=2016/month=12/day=15"},{"partitionName":"default@table_name@year=2016/month=12/day=16"},{"partitionName":"default@table_name@year=2016/month=12/day=17"},{"partitionName":"default@table_name@year=2016/month=12/day=18"},{"partitionName":"default@table_name@year=2016/month=12/day=19"},{"partitionName":"default@table_name@year=2016/month=12/day=2"},{"partitionName":"default@table_name@year=2016/month=12/day=20"},{"partitionName":"default@table_name@year=2016/month=12/day=21"},{"partitionName":"default@table_name@year=2016/month=12/day=22"},{"partitionName":"default@table_name@year=2016/month=12/day=23"},{"partitionName":"default@table_name@year=2016/month=12/day=24"},{"partitionName":"default@table_name@year=2016/month=12/day=25"},{"partitionName":"default@table_name@year=2016/month=12/day=26"},{"partitionName":"default@table_name@year=2016/month=12/day=27"},{"partitionName":"default@table_name@year=2016/month=12/day=28"},{"partitionName":"default@table_name@year=2016/month=12/day=29"},{"partitionName":"default@table_name@year=2016/month=12/day=3"},{"partitionName":"default@table_name@year=2016/month=12/day=30"},{"partitionName":"default@table_name@year=2016/month=12/day=31"},{"partitionName":"default@table_name@year=2016/month=12/day=4"},{"partitionName":"default@table_name@year=2016/month=12/day=5"},{"partitionName":"default@table_name@year=2016/month=12/day=6"},{"partitionName":"default@table_name@year=2016/month=12/day=7"},{"partitionName":"default@table_name@year=2016/month=12/day=8"},{"partitionName":"default@table_name@year=2016/month=12/day=9"},{"partitionName":"default@table_name@year=2017/month=1/day=1"},{"partitionName":"default@table_name@year=2017/month=1/day=10"},{"partitionName":"default@table_name@year=2017/month=1/day=11"},{"partitionName":"default@table_name@year=2017/month=1/day=12"},{"partitionName":"default@table_name@year=2017/month=1/day=13"},{"partitionName":"default@table_name@year=2017/month=1/day=14"},{"partitionName":"default@table_name@year=2017/month=1/day=15"},{"partitionName":"default@table_name@year=2017/month=1/day=16"},{"partitionName":"default@table_name@year=2017/month=1/day=17"},{"partitionName":"default@table_name@year=2017/month=1/day=18"},{"partitionName":"default@table_name@year=2017/month=1/day=19"},{"partitionName":"default@table_name@year=2017/month=1/day=2"},{"partitionName":"default@table_name@year=2017/month=1/day=20"},{"partitionName":"default@table_name@year=2017/month=1/day=21"},{"partitionName":"default@table_name@year=2017/month=1/day=22"},{"partitionName":"default@table_name@year=2017/month=1/day=23"},{"partitionName":"default@table_name@year=2017/month=1/day=24"},{"partitionName":"default@table_name@year=2017/month=1/day=25"},{"partitionName":"default@table_name@year=2017/month=1/day=26"},{"partitionName":"default@table_name@year=2017/month=1/day=27"},{"partitionName":"default@table_name@year=2017/month=1/day=28"},{"partitionName":"default@table_name@year=2017/month=1/day=29"},{"partitionName":"default@table_name@year=2017/month=1/day=3"},{"partitionName":"default@table_name@year=2017/month=1/day=30"},{"partitionName":"default@table_name@year=2017/month=1/day=31"},{"partitionName":"default@table_name@year=2017/month=1/day=4"},{"partitionName":"default@table_name@year=2017/month=1/day=5"},{"partitionName":"default@table_name@year=2017/month=1/day=6"},{"partitionName":"default@table_name@year=2017/month=1/day=7"},{"partitionName":"default@table_name@year=2017/month=1/day=8"},{"partitionName":"default@table_name@year=2017/month=1/day=9"},{"partitionName":"default@table_name@year=2017/month=2/day=1"},{"partitionName":"default@table_name@year=2017/month=2/day=10"},{"partitionName":"default@table_name@year=2017/month=2/day=11"},{"partitionName":"default@table_name@year=2017/month=2/day=12"},{"partitionName":"default@table_name@year=2017/month=2/day=13"},{"partitionName":"default@table_name@year=2017/month=2/day=14"},{"partitionName":"default@table_name@year=2017/month=2/day=15"},{"partitionName":"default@table_name@year=2017/month=2/day=16"},{"partitionName":"default@table_name@year=2017/month=2/day=17"},{"partitionName":"default@table_name@year=2017/month=2/day=18"},{"partitionName":"default@table_name@year=2017/month=2/day=19"},{"partitionName":"default@table_name@year=2017/month=2/day=2"},{"partitionName":"default@table_name@year=2017/month=2/day=20"},{"partitionName":"default@table_name@year=2017/month=2/day=21"},{"partitionName":"default@table_name@year=2017/month=2/day=3"},{"partitionName":"default@table_name@year=2017/month=2/day=4"},{"partitionName":"default@table_name@year=2017/month=2/day=5"},{"partitionName":"default@table_name@year=2017/month=2/day=6"},{"partitionName":"default@table_name@year=2017/month=2/day=7"},{"partitionName":"default@table_name@year=2017/month=2/day=8"},{"partitionName":"default@table_name@year=2017/month=2/day=9"}],"input_tables":[{"tablename":"default@table_name","tabletype":"MANAGED_TABLE"}]}
explain dependency
select count (column_name)
from table_name
where year >= year (date_sub (current_date,7))
and month >= month (date_sub (current_date,7))
and day >= day (date_sub (current_date,7))
;
{"input_partitions":[{"partitionName":"default@table_name@year=2017/month=2/day=14"},{"partitionName":"default@table_name@year=2017/month=2/day=15"},{"partitionName":"default@table_name@year=2017/month=2/day=16"},{"partitionName":"default@table_name@year=2017/month=2/day=17"},{"partitionName":"default@table_name@year=2017/month=2/day=18"},{"partitionName":"default@table_name@year=2017/month=2/day=19"},{"partitionName":"default@table_name@year=2017/month=2/day=20"},{"partitionName":"default@table_name@year=2017/month=2/day=21"}],"input_tables":[{"tablename":"default@table_name","tabletype":"MANAGED_TABLE"}]}
这篇关于为什么此查询不会发生分区消除?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!