Adding partitions to an external table in Hive takes a lot of time
Problem description
I would like to know the best way(s) to add partitions to an external table. I have an external table on S3 in Hive, partitioned as vehicle=/date=/hr=.
New vehicles can be added at any time of day, and some vehicles will have no data for a couple of hours in a day, or for a couple of days.
A few possible solutions:
- MSCK REPAIR TABLE: it takes a lot of time.
- Add partitions via a script: I may not know when a new vehicle gets created or which hours have no data for a vehicle.
How do people generally solve this problem of adding partitions to external tables?
Recommended answer
MSCK REPAIR TABLE is the right way to do this. If it runs too slowly, try switching off automatic stats gathering before repairing the table:
set hive.stats.autogather=false;
You can enable it again after recovering partitions.
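Put together, the sequence described above might look like the following sketch. The table name vehicle_data is hypothetical (the question does not name the table), and the last statement assumes stats autogathering was enabled before, which is Hive's default:

```sql
-- Hypothetical table name; substitute your own external table.
-- Turn off automatic stats gathering for this session before the repair.
SET hive.stats.autogather=false;

-- Scan the table location and register any partitions missing from the metastore.
MSCK REPAIR TABLE vehicle_data;

-- Restore automatic stats gathering (assumes it was on before; true is the default).
SET hive.stats.autogather=true;
```

Because SET only affects the current session, all three statements need to run in the same Hive session for the setting to apply to the repair.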
Most probably you are hitting HIVE-18743 or a related bug. In my case this helped.