HIVE - INSERT OVERWRITE vs DROP TABLE + CREATE TABLE + INSERT INTO


Question

I am writing an automated script that runs a few queries in Hive, and from time to time we need to clear the data from a table and insert new data. We are wondering which of the following would be faster:

INSERT OVERWRITE TABLE SOME_TABLE
    SELECT * FROM OTHER_TABLE;

or is it faster to do this:

DROP TABLE SOME_TABLE;
CREATE TABLE SOME_TABLE (STUFFS);
INSERT INTO TABLE SOME_TABLE
    SELECT * FROM OTHER_TABLE;

The overhead of running the extra queries is not an issue, since we already have the creation script as well. The question is: with billions of rows, is INSERT OVERWRITE faster than DROP + CREATE + INSERT INTO?

Recommended answer

For maximum speed, I would suggest two steps. 1) First run hadoop fs -rm -r -skipTrash table_dir/* to remove the old data quickly without moving the files to Trash. INSERT OVERWRITE on its own moves all of the old files to Trash, and for a very big table this takes a lot of time. 2) Then run the INSERT OVERWRITE command. This is also faster because you do not need to drop and re-create the table.
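The two steps could be scripted roughly as follows. This is a minimal sketch, not a definitive implementation: the TABLE_DIR path, the DRY_RUN switch, and the run helper are hypothetical names introduced for illustration, and the HDFS location of your table may differ. Commands are only printed by default, because files deleted with -skipTrash cannot be recovered.

```shell
#!/bin/sh
# Assumption: TABLE_DIR is the HDFS directory backing SOME_TABLE.
# The default below is a hypothetical path; substitute your table's location.
TABLE_DIR="${TABLE_DIR:-/user/hive/warehouse/some_table}"

# Hypothetical helper: print the command unless DRY_RUN=0 is set,
# since a -skipTrash delete is not recoverable.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: delete the old files directly, bypassing the HDFS trash.
# The glob is quoted so that hadoop, not the local shell, expands it.
run hadoop fs -rm -r -skipTrash "$TABLE_DIR/*" || exit 1

# Step 2: reload the table; the table itself is never dropped.
run hive -e "INSERT OVERWRITE TABLE SOME_TABLE SELECT * FROM OTHER_TABLE;"
```

Running the script as-is prints the two commands for review; running it with DRY_RUN=0 executes them.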

Update:

As of Hive 2.3.0 (HIVE-15880), if a table has TBLPROPERTIES ("auto.purge"="true"), its previous data is not moved to Trash when an INSERT OVERWRITE query is run against it. This functionality applies only to managed tables. So INSERT OVERWRITE with auto purge will be faster than rm -skipTrash + INSERT OVERWRITE or DROP + CREATE + INSERT, because it is a single Hive-only command.
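A sketch of the auto purge variant, assuming a managed table on Hive 2.3.0 or later and reusing the placeholder table names from the question. The HQL variable and the DRY_RUN switch are hypothetical names for illustration; the statements are printed by default and passed to the hive CLI only when DRY_RUN=0.

```shell
#!/bin/sh
# HiveQL for the auto purge approach. The property only needs to be set once;
# it is repeated here so the sketch is self-contained.
HQL='ALTER TABLE SOME_TABLE SET TBLPROPERTIES ("auto.purge"="true");
INSERT OVERWRITE TABLE SOME_TABLE
    SELECT * FROM OTHER_TABLE;'

# Print the statements by default; set DRY_RUN=0 to run them with the hive CLI.
if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$HQL"
else
    hive -e "$HQL"
fi
```

With the property in place, later refreshes need only the INSERT OVERWRITE statement, with no separate HDFS cleanup step.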
