How to tune Hive to query metadata?

Question

When I run the Hive query below on a table with a partitioned column, I want to make sure Hive does not do a full table scan and instead works out the result from the metadata alone. Is there any way to enable this?

SELECT max(partitioned_col) FROM hive_table;

Right now, when I run this query, it launches MapReduce tasks, and I am sure it is scanning the data even though the value could be determined from the metadata alone.

Recommended answer

Compute table statistics every time you change the data:

ANALYZE TABLE hive_table PARTITION(partitioned_col) COMPUTE STATISTICS FOR COLUMNS;

Enable the cost-based optimizer (CBO) and automatic statistics gathering:

set hive.cbo.enable=true;
set hive.stats.autogather=true;

Use these settings so that Hive fetches the statistics and can answer queries such as min, max, and count from them:

set hive.compute.query.using.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.stats.fetch.column.stats=true;

If none of this helps, a fast way to find the latest partition is to parse the maximum partition key from the table location with a shell script. The command below lists all paths under the table directory, sorts them in reverse order, takes the first (latest) one, selects the partition subfolder from the path, and extracts the value after the equals sign. You only need to initialize the TABLE_DIR variable and fill in the position of the partition subfolder in the path:

last_partition=$(hadoop fs -ls "$TABLE_DIR"/* | awk '{ print $8 }' | sort -r | head -n1 | cut -d / -f [number of partition subfolder in the path here] | cut -d = -f 2)

Then pass the $last_partition variable to your script:

  hive -hiveconf last_partition="$last_partition" -f your_script.hql
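As a variation on the shell approach above, the partition value can be extracted by key name rather than by counting path components. The sketch below is only an illustration: `max_partition_value` is a hypothetical helper, not part of the original answer, and it assumes partition directories follow the usual `key=value` naming and that the values sort correctly as plain strings (for example zero-padded dates).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): read paths on stdin
# and print the largest value found for the given partition key.
# Assumes "key=value" directory names and lexicographically sortable values.
max_partition_value() {
  local key="$1"
  # Keep only the "key=value" path component (anchored at a "/" or line start
  # so that "col" does not match inside "my_col"), drop everything up to the
  # first "=", then sort as raw bytes and keep the last (largest) value.
  grep -oE "(^|/)${key}=[^/]*" \
    | cut -d = -f 2- \
    | LC_ALL=C sort -u \
    | tail -n 1
}
```

It could be used as `last_partition=$(hadoop fs -ls "$TABLE_DIR" | max_partition_value partitioned_col)`. Inside your_script.hql, the value passed with -hiveconf can then be referenced as `${hiveconf:last_partition}`. For numeric values that are not zero-padded, a numeric sort (`sort -n`) would be needed instead.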
