Hive scanning entire data for bucketed table
I was trying to optimize a Hive query by bucketing the data on a single column. I created the table with the following statement:
CREATE TABLE `source_bckt`(
`uk` string,
`data` string)
CLUSTERED BY (uk) SORTED BY (uk) INTO 10 BUCKETS;
Then I inserted the data after executing "set hive.enforce.bucketing = true;".
When I run the following query: "select * from source_bckt where uk='1179724';"
Even though the data should sit in a single bucket file, identifiable by the equation HASH('1179724') % 10, the MapReduce job that is spawned scans the entire set of files.
Any idea?
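For reference, the bucket number from that equation can be reproduced off-cluster. Below is a minimal Python sketch, under the assumption that Hive hashes string keys with Hadoop's Text.hashCode (a 31-based rolling hash over the UTF-8 bytes, seeded with 1) and maps the hash to a bucket via (hash & Integer.MAX_VALUE) % numBuckets; verify this against your Hive version before relying on a specific bucket file.

```python
def hadoop_text_hash(s: str) -> int:
    # 31-based rolling hash over the UTF-8 bytes, seeded with 1,
    # emulating Hadoop's WritableComparator.hashBytes with 32-bit wraparound.
    h = 1
    for b in s.encode("utf-8"):
        h = (31 * h + b) & 0xFFFFFFFF
    return h

def bucket_for(key: str, num_buckets: int) -> int:
    # Assumed Hive mapping: force the hash non-negative (Java-style
    # hash & Integer.MAX_VALUE), then take it modulo the bucket count.
    return (hadoop_text_hash(key) & 0x7FFFFFFF) % num_buckets

print(bucket_for("1179724", 10))
```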
This optimization is not supported yet; the current status of the JIRA ticket is PATCH AVAILABLE:
https://issues.apache.org/jira/browse/HIVE-5831
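Until that lands, one workaround is Hive's TABLESAMPLE clause, which restricts the scan to a single bucket file of a bucketed table. A sketch, assuming you have separately determined which bucket the key hashes to (the bucket number 5 below is a placeholder, and BUCKET numbering is 1-based):

SELECT *
FROM source_bckt TABLESAMPLE (BUCKET 5 OUT OF 10 ON uk)
WHERE uk = '1179724';

The WHERE clause is still needed, since the chosen bucket can contain other keys that hash to the same value.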