Hive scanning entire data for bucketed table


Problem description

I was trying to optimize a Hive query by bucketing the data on a single column. I created the table with the following statement:

CREATE TABLE `source_bckt`(
  `uk` string, 
  `data` string)
CLUSTERED BY(uk) SORTED BY(uk) INTO 10 BUCKETS

I then inserted the data after executing "set hive.enforce.bucketing = true;"

When I run the query "select * from source_bckt where uk='1179724';", the MapReduce job it spawns scans the entire set of files, even though the matching rows should all sit in a single file, which can be identified by HASH('1179724') % 10.

Any idea?
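
To make the HASH('1179724') % 10 expression above concrete, here is a minimal Python sketch of how the bucket index could be computed by hand. It assumes Hive's legacy bucketing behavior for string columns (a signed 32-bit rolling hash, h = 31 * h + byte, over the UTF-8 bytes, with the sign bit masked off before taking the modulus). The function names hive_string_hash and bucket_for are hypothetical and exist only for this illustration. Newer Hive versions may use a different hash function, so treat the result as an estimate to verify against the actual files.

```python
def hive_string_hash(value: str) -> int:
    """Signed 32-bit hash of a string, h = 31 * h + byte over its UTF-8 bytes.

    Assumption: this mirrors Hive's legacy hashing of string columns.
    """
    h = 0
    for b in value.encode("utf-8"):
        # Java bytes are signed, so map 128..255 to -128..-1.
        signed = b - 256 if b > 127 else b
        # Keep only the low 32 bits to emulate Java int overflow.
        h = (31 * h + signed) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed integer.
    return h - 0x100000000 if h >= 0x80000000 else h


def bucket_for(value: str, num_buckets: int) -> int:
    """Zero-based bucket index: mask off the sign bit, then take the modulus."""
    return (hive_string_hash(value) & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    print(bucket_for("1179724", 10))
```

Under these assumptions, the printed index is the zero-based number of the one bucket file that should hold every row with uk='1179724'. That is the file a bucket-aware filter would need to read.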

Solution

This optimization is not supported yet: Hive does not use the filter on the bucketing column to skip bucket files.
The JIRA ticket tracking it currently has the status PATCH AVAILABLE:

https://issues.apache.org/jira/browse/HIVE-5831
