Naming convention of part files in HDFS
Problem description
When we run an INSERT INTO statement in Hive, it creates multiple part files in HDFS.
For example: part-*-*****, or 000000_0, 000001_0, etc., or something else.
Is there a configuration/setting that controls the naming of these part files?
The cluster I work on creates 000000_0, 000001_0, 000000_1, etc. I would like to change this to part- or text- (or similar) so that it's easier for me to pick these files up and merge them if needed.
If there is a setting that can be set in Hive right before executing the HQL, that would be ideal.
Thanks in advance.
I think you should be able to set:
set mapreduce.output.basename = part-;
This doesn't work. The only way I have found is to use a custom file writer.
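If the names cannot be changed from Hive, one workaround for the original goal (picking the files up and merging them) is to match the default names instead. Below is a minimal Python sketch, assuming the part files have already been copied to a local directory (for example with `hdfs dfs -get`) and that they follow the 000000_0 / 000001_0 / 000000_1 pattern from the question. The regex and the helper names `select_part_files` and `merge_part_files` are hypothetical illustrations, not part of Hive or Hadoop.

```python
import re
import shutil
from pathlib import Path

# Assumed naming pattern, taken from the examples in the question:
# six digits, an underscore, then a numeric suffix (000000_0, 000000_1, ...).
HIVE_PART_PATTERN = re.compile(r"^\d{6}_\d+$")


def select_part_files(directory: Path) -> list[Path]:
    """Return the files in `directory` whose names match the pattern, sorted by name."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and HIVE_PART_PATTERN.match(p.name)
    )


def merge_part_files(directory: Path, target: Path) -> int:
    """Concatenate the matching files into `target` in name order; return how many were merged."""
    parts = select_part_files(directory)  # list is built before `target` is created
    with target.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return len(parts)
```

Because the selection is driven by the name pattern, any other files in the directory (for example a `_SUCCESS` marker, if present) are skipped; adjust the regex if your cluster produces other suffixes.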