Difference between partition and index in Hive


Question

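I am new to Hadoop and Hive and would like to know: what is the difference between an index and a partition in Hive? When should I use an index, and when should I use a partition?

Thank you!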

Solution

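Indexes were new and still evolving when this answer was written (features were being added; the index feature was later removed entirely in Hive 3.0). They are limited to single tables and cannot be used with external tables. Creating an index creates a separate table. An index can be partitioned, matching the partitions of the base table. Indexes are used to speed up searches for data within a table.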
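
To make the "separate table" idea concrete, here is a toy Python model of an index: a lookup structure, kept apart from the base data, that records which files and offsets hold each indexed value. This is only a sketch of the concept, not Hive's actual implementation; the row layout, the file names, and the helper names `build_index` and `files_to_read` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical base table: each row is (file_name, byte_offset, customer_id).
BASE_ROWS = [
    ("part-00000", 0, "c1"),
    ("part-00000", 120, "c2"),
    ("part-00001", 0, "c1"),
    ("part-00002", 0, "c3"),
]


def build_index(rows):
    """Build a separate lookup structure: indexed value -> {file: [offsets]}."""
    index = defaultdict(lambda: defaultdict(list))
    for file_name, offset, key in rows:
        index[key][file_name].append(offset)
    # Convert to plain dicts so lookups for missing keys do not create entries.
    return {key: dict(files) for key, files in index.items()}


def files_to_read(index, key):
    """Return only the files that contain the requested value."""
    return sorted(index.get(key, {}))


if __name__ == "__main__":
    idx = build_index(BASE_ROWS)
    print(files_to_read(idx, "c1"))
```

The point of the sketch is that a lookup consults the small index structure first and then touches only the files it names, instead of scanning every file of the base table.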



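Partitions segregate the data at the HDFS level by creating a sub-directory for each partition. Partitioning limits the number of files read and the amount of data scanned by a query. For this to happen, however, the partition columns must be specified in your WHERE clauses.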
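
The effect of partition sub-directories can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Hive code: the table directory `/warehouse/sales`, the partition columns `year` and `month`, and the helper names are hypothetical, and only equality predicates are handled.

```python
def partition_path(table_dir, partition_spec):
    """Build a key=value sub-directory path, one level per partition column."""
    parts = [f"{col}={val}" for col, val in partition_spec.items()]
    return "/".join([table_dir.rstrip("/")] + parts)


def prune_partitions(partitions, predicate):
    """Keep the partitions whose values match every equality in the predicate."""
    return [
        p for p in partitions
        if all(p.get(col) == val for col, val in predicate.items())
    ]


# Hypothetical table partitioned by year and month.
TABLE_DIR = "/warehouse/sales"
PARTITIONS = [
    {"year": "2023", "month": "12"},
    {"year": "2024", "month": "01"},
    {"year": "2024", "month": "02"},
]


def directories_to_scan(predicate):
    """List the directories a query with this WHERE-style predicate would read."""
    return [partition_path(TABLE_DIR, p) for p in prune_partitions(PARTITIONS, predicate)]


if __name__ == "__main__":
    # Predicate on the partition columns: only one directory is read.
    print(directories_to_scan({"year": "2024", "month": "01"}))
    # No predicate on the partition columns: every directory is read.
    print(directories_to_scan({}))
```

With an empty predicate, which stands in for a WHERE clause that never mentions a partition column, nothing can be pruned and every directory is scanned. That is why the partition columns need to appear in the query.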



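While building your data model, you can determine the best use of indexes and/or partitions based on the size of your data and your expected usage patterns.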

