Hive: every INSERT query creates a new file in the HDFS file system
Question
On every INSERT query, a new file named 000000_0_copy* gets created in the HDFS file system.
Is this the default behaviour of Hive and HDFS?
Is there any concept of compaction? If yes, how does compaction work?
Answer
HDFS is an append-only filesystem. To modify (UPDATE/DELETE statements) any portion of a file that has already been written, you must rewrite the entire file and replace the old one; and inserting even a single record writes a brand-new file rather than appending to an existing one.
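This is why each INSERT produces a new 000000_0_copy* file. A minimal illustration (the table and values here are hypothetical, not from the original post):

```sql
-- Each single-row INSERT writes a fresh file under the table's
-- HDFS directory instead of appending to an existing one, e.g.:
INSERT INTO TABLE events VALUES (1, 'a');  -- may produce 000000_0
INSERT INTO TABLE events VALUES (2, 'b');  -- may produce 000000_0_copy_1
```

Over many small inserts this accumulates many tiny files, which is the usual motivation for compaction.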
Compaction isn't an automatic process. You need to write your own code to query one table and insert the results into another table, typically stored in a columnar format such as Parquet or ORC.
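A manual compaction pass can be sketched roughly as follows (table names and the choice of ORC are illustrative assumptions, not prescribed by the answer):

```sql
-- Sketch: rewrite a table full of small files into an ORC-backed copy.
CREATE TABLE events_compacted STORED AS ORC AS
SELECT * FROM events;

-- Alternatively, rewrite the table in place: INSERT OVERWRITE replaces
-- the many small files with the fewer, larger files of a single job.
INSERT OVERWRITE TABLE events
SELECT * FROM events;
```

Either approach rewrites all the data in one job, so it should be scheduled deliberately rather than run after every insert.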