Recreating a Hive table before every daily load


Question

I have seen applications that drop an external table, create it again, load the data, and then run the MSCK command on every data load. What is the benefit of dropping and recreating the table each time?

Recommended answer

There is no benefit in dropping and recreating an EXTERNAL table, because dropping the table leaves its data intact.

Dropping and recreating a MANAGED table, on the other hand, may have a benefit, because dropping it deletes the data as well.
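To make the external-table case concrete, here is a minimal Python sketch that builds the HiveQL a load job could run after new files land, with no DROP or CREATE involved. The function name `refresh_statements`, the table name `sales.events`, and the partition column `load_date` are hypothetical; `MSCK REPAIR TABLE` and `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` are standard Hive statements, but whether they are sufficient depends on how the table is laid out.

```python
import re

# Conservative identifier checks so the sketch never builds odd SQL.
_TABLE = re.compile(r"^[A-Za-z_]\w*(\.[A-Za-z_]\w*)?$")
_COLUMN = re.compile(r"^[A-Za-z_]\w*$")


def refresh_statements(table, partition=None):
    """Return HiveQL that registers newly written files of an EXTERNAL table.

    partition is an optional (column, value) pair; both names are
    illustrative assumptions, not taken from the original question.
    """
    if not _TABLE.match(table):
        raise ValueError(f"unexpected table name: {table!r}")

    if partition is None:
        # Let Hive scan the table location and add any partition
        # directories that are missing from the metastore.
        return [f"MSCK REPAIR TABLE {table}"]

    column, value = partition
    if not _COLUMN.match(column) or "'" in value:
        raise ValueError(f"unexpected partition spec: {partition!r}")

    # Register just the one partition that was loaded.
    return [
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({column}='{value}')"
    ]


if __name__ == "__main__":
    for stmt in refresh_statements("sales.events", ("load_date", "2024-01-15")):
        print(stmt)
```

Either statement leaves the table definition in place, which fits the point above: for an external table the data is untouched by a drop anyway, so only the partition metadata needs refreshing.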

One possible scenario applies if you are running on S3:

Deleting the files early, ahead of the load rather than at load time, may reduce the likelihood of eventual consistency issues in S3 after the load.

First, once the files are deleted, you may hit eventual consistency issues when reading the table, both immediately after the delete and for some time afterwards. Deleting the files early gives S3 more time to synchronize before the table is read.

Second, eventual consistency issues can arise if you write files with the same names (overwriting). Dropping early may help, although it is better to use GUID-prefixed (unique) filenames, a timestamp in the partition folder path, or a similar technique to avoid this kind of problem (eventual consistency after overwriting).
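The two naming techniques mentioned above could look roughly like this in Python. This is only a sketch: the helper names, the bucket `example-bucket`, and the path layout are assumptions made for illustration. With a timestamped folder, the partition in the metastore would also have to be pointed at the new folder after each load.

```python
import uuid
from datetime import datetime, timezone


def guid_prefixed_key(partition_path, base_name):
    """Build an object key whose filename starts with a random GUID,
    so a re-run never overwrites a file written by an earlier run."""
    return f"{partition_path.rstrip('/')}/{uuid.uuid4().hex}_{base_name}"


def timestamped_partition_path(table_path, partition, now=None):
    """Build a fresh folder per load by appending a UTC timestamp
    to the partition folder path."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return f"{table_path.rstrip('/')}/{partition}/{stamp}"


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    partition_dir = "s3://example-bucket/events/load_date=2024-01-15/"
    print(guid_prefixed_key(partition_dir, "part-00000.parquet"))
    print(timestamped_partition_path(
        "s3://example-bucket/events", "load_date=2024-01-15"))
```

In both variants every load writes to keys that have never existed before, so a reader cannot be served a stale copy of an overwritten object.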
