Partitioning in Hive


Question

I'm using static partitioning in Hive to segregate data into subdirectories based on a date field. Since I have daily loads into Hive, I'll need 365 partitions per year for each table (14 tables in total).

Is there any limit on the number of static partitions that can be created in Hive?
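For context, a daily static-partition load typically looks like the following sketch. The table and column names here are assumptions for illustration, not from the question:

```sql
-- Hypothetical daily-partitioned table (names are assumed).
CREATE TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (load_date STRING)
STORED AS ORC;

-- Static partitioning: the partition value is named explicitly per load,
-- so Hive creates exactly one partition directory per day.
INSERT OVERWRITE TABLE sales PARTITION (load_date = '2023-01-15')
SELECT id, amount
FROM   staging_sales
WHERE  event_date = '2023-01-15';
```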

Dynamic partitioning gives an error during Sqoop import if the number of partitions exceeds the "hive.exec.max.dynamic.partitions.pernode" threshold (100).

I have a 5-node HDP cluster, of which 3 are datanodes.

Will it hamper cluster performance if I increase the number of partitions that can be created in Hive?

Does that limit apply only to dynamic partitions, or to static partitions as well?

Reference

Check the troubleshooting and best practices section: https://cwiki.apache.org/confluence/display/Hive/Tutorial

Please suggest.

Answer

For partitioning on a date field, the best approach is to partition by year/month/day.
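A year/month/day layout can be sketched as below; the table and column names are assumptions for illustration:

```sql
-- Hypothetical table partitioned by year/month/day (names are assumed).
CREATE TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (year SMALLINT, month TINYINT, day TINYINT)
STORED AS ORC;

-- Data for 2023-01-15 lands under:
--   .../sales/year=2023/month=1/day=15/
INSERT OVERWRITE TABLE sales PARTITION (year = 2023, month = 1, day = 15)
SELECT id, amount
FROM   staging_sales
WHERE  event_date = '2023-01-15';
```

One advantage of this layout over a single flat date partition is partition pruning at coarser granularity: a query filtered on `year = 2023 AND month = 1` scans only that month's directories.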

That said, you should choose your partitioning strategy based on your requirements. There is no hard limit on the number of partitions as such, unless you over-partition, i.e. unnecessarily create too many partitions with each one storing only a very small amount of data.

Regarding the error, you can fix it by increasing the threshold: set hive.exec.max.dynamic.partitions.pernode in Hive.
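A minimal config sketch for raising the dynamic-partition limits in a Hive session; the specific values here are illustrative, not recommendations:

```sql
-- Allow dynamic partitioning without a static leading partition column.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Raise the per-node and overall partition limits (defaults: 100 and 1000).
SET hive.exec.max.dynamic.partitions.pernode = 1000;
SET hive.exec.max.dynamic.partitions = 10000;
```

These `SET` statements apply only to the current session; to make them permanent, add the properties to hive-site.xml.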

Hope this helps.

