Hive scanning entire data for bucketed table
I was trying to optimize a Hive query by bucketing the data on a single column. I created the table with the following statement:
CREATE TABLE `source_bckt`(
`uk` string,
`data` string)
CLUSTERED BY (uk) SORTED BY (uk) INTO 10 BUCKETS;
Then I inserted the data after executing "set hive.enforce.bucketing = true;".
When I run the following query: "select * from source_bckt where uk='1179724';"
Even though the data should sit in a single bucket file, identifiable by the equation HASH('1179724') % 10, the MapReduce job that is spawned scans the entire set of files.
Any idea?
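For reference, the bucket number from that equation can be reproduced off-cluster. Below is a minimal Python sketch, under the assumption that Hive hashes string keys with Hadoop's Text.hashCode (a 31-based rolling hash over the UTF-8 bytes, seeded with 1) and maps the hash to a bucket via (hash & Integer.MAX_VALUE) % numBuckets; verify this against your Hive version before relying on a specific bucket file.

```python
def hadoop_text_hash(s: str) -> int:
    # 31-based rolling hash over the UTF-8 bytes, seeded with 1,
    # emulating Hadoop's WritableComparator.hashBytes with 32-bit wraparound.
    h = 1
    for b in s.encode("utf-8"):
        h = (31 * h + b) & 0xFFFFFFFF
    return h

def bucket_for(key: str, num_buckets: int) -> int:
    # Assumed Hive mapping: force the hash non-negative (Java-style
    # hash & Integer.MAX_VALUE), then take it modulo the bucket count.
    return (hadoop_text_hash(key) & 0x7FFFFFFF) % num_buckets

print(bucket_for("1179724", 10))
```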
This optimization is not supported yet; the current status of the JIRA ticket is PATCH AVAILABLE:
https://issues.apache.org/jira/browse/HIVE-5831
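Until that lands, one workaround is Hive's TABLESAMPLE clause, which restricts the scan to a single bucket file of a bucketed table. A sketch, assuming you have separately determined which bucket the key hashes to (the bucket number 5 below is a placeholder, and BUCKET numbering is 1-based):

SELECT *
FROM source_bckt TABLESAMPLE (BUCKET 5 OUT OF 10 ON uk)
WHERE uk = '1179724';

The WHERE clause is still needed, since the chosen bucket can contain other keys that hash to the same value.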