Adding partitions to an external table in Hive takes a lot of time


Problem description

I would like to know the best way to add partitions to an external table. I have an external table on S3 in Hive, partitioned as vehicle=/date=/hr=


New vehicles can be added at any time of day, and some vehicles will have no data for a couple of hours in a day, or for a couple of days.

A few possible solutions:
- msck repair table: it takes a lot of time.
- Add partitions via a script: I may not know when a new vehicle is created, or which hours have no data for a given vehicle.
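For reference, the two options above look roughly like this in HiveQL. This is only a sketch: the table name `vehicle_data`, the bucket path, and the partition values are hypothetical placeholders, not taken from the question.

```sql
-- Option 1: let Hive scan the table location and register every
-- partition directory that is missing from the metastore.
-- (vehicle_data is a hypothetical table name.)
MSCK REPAIR TABLE vehicle_data;

-- Option 2: register one known partition explicitly.
-- IF NOT EXISTS makes the statement safe to re-run.
-- `date` is backtick-quoted because it is a reserved word in recent Hive versions.
-- The bucket, path, and partition values below are placeholders.
ALTER TABLE vehicle_data ADD IF NOT EXISTS
  PARTITION (vehicle='v1', `date`='2019-01-01', hr='00')
  LOCATION 's3://example-bucket/vehicle_data/vehicle=v1/date=2019-01-01/hr=00/';
```

The explicit form is fast because it touches a single partition, but a script using it has to discover the vehicle/date/hr combinations itself, which is the difficulty described above.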

How do people generally solve this problem of adding partitions to external tables?

Recommended answer

msck repair table is the right way to do this. If it runs too slowly, try switching off stats autogathering before repairing the table:

set hive.stats.autogather=false;

You can enable it again after recovering the partitions.
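Put together, the whole sequence in one Hive session might look like the following sketch. The table name `vehicle_data` is a hypothetical placeholder, and the last line assumes the setting was on before, which is the Hive default.

```sql
-- Turn off automatic statistics gathering for this session only.
SET hive.stats.autogather=false;

-- Recover partitions that exist under the table location but not in the metastore.
-- (vehicle_data is a hypothetical table name.)
MSCK REPAIR TABLE vehicle_data;

-- Restore the setting, assuming it was enabled beforehand.
SET hive.stats.autogather=true;
```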

Most probably you are hitting HIVE-18743 or a related bug. In my case this helped.

