Naming convention of part files in HDFS


Problem description


When we run an INSERT INTO statement in Hive, the job writes its result to HDFS as multiple part files.

e.g. part-*-*****, or 000000_0, 000001_0, etc., or something else.
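The part-*-***** form comes from Hadoop MapReduce's FileOutputFormat, which builds each task's file name from a base name (the mapreduce.output.basename property, default part), a task-type letter (m for map, r for reduce) and a partition number padded to five digits. The function below is a hypothetical Python re-implementation of that rule, for illustration only; Hive's 000000_0 style names appear to be derived from its own task IDs instead, so this rule does not describe them.

```python
def mapreduce_output_name(partition: int, is_reduce: bool,
                          basename: str = "part", extension: str = "") -> str:
    """Build a MapReduce-style output file name (illustrative re-implementation).

    basename:  value of mapreduce.output.basename (Hadoop default: "part")
    is_reduce: True for a reduce task ("r"), False for a map task ("m")
    partition: task number, zero-padded to at least five digits
    extension: e.g. a compression suffix; empty for plain text output
    """
    task_type = "r" if is_reduce else "m"
    # A "-" separates the base name, the task type and the partition number.
    return f"{basename}-{task_type}-{partition:05d}{extension}"


print(mapreduce_output_name(0, True))                    # part-r-00000
print(mapreduce_output_name(3, False, basename="text"))  # text-m-00003
print(mapreduce_output_name(0, True, basename="part-"))  # part--r-00000
```

Under this rule the property only replaces the prefix, so a value of part- would produce names such as part--r-00000 rather than part-00000.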

Is there a configuration/setting that controls the naming of these part files?

The cluster I work in creates 000000_0, 000001_0, 000000_1, etc. I would like to change this to part- or text- etc. so that it's easier for me to pick these files up and merge them if needed.

If there is a setting that can be set in Hive right before executing the HQL, that would be ideal.

Thanks in advance.

Solution

I think you should be able to use:

set mapreduce.output.basename=part-;

However, this does not work for Hive output. The only way I have found is to use a custom file writer.
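Because no setting appears to rename Hive's output files, one workaround for the goal in the question (picking the files up and merging them) is to select them by their existing names. The Python sketch below assumes the table directory has already been copied to the local file system (for example with hdfs dfs -get). The regular expressions are assumptions based on the names in the question, including a _copy_N suffix that Hive may add when INSERT INTO meets an existing file name; is_part_file and merge_part_files are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import List

# Hypothetical patterns inferred from the names in the question; adjust them
# if your files carry extensions (e.g. ".gz") or other suffixes.
MAPREDUCE_PART = re.compile(r"part-[mr]-\d{5}")    # part-m-00000, part-r-00001
HIVE_PART = re.compile(r"\d{6}_\d+(_copy_\d+)?")   # 000000_0, 000001_0_copy_1


def is_part_file(name: str) -> bool:
    """Return True if `name` looks like a MapReduce or Hive output part file."""
    return bool(MAPREDUCE_PART.fullmatch(name) or HIVE_PART.fullmatch(name))


def merge_part_files(directory: Path, output: Path) -> List[str]:
    """Concatenate the part files in `directory`, sorted by name, into `output`.

    Marker and checksum files such as _SUCCESS or .000000_0.crc do not match
    either pattern and are skipped. Returns the names that were merged.
    """
    parts = sorted(
        (p for p in directory.iterdir() if p.is_file() and is_part_file(p.name)),
        key=lambda p: p.name,
    )
    with output.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())
    return [p.name for p in parts]


# Example, after copying the table directory to ./table_dir:
# merged = merge_part_files(Path("table_dir"), Path("merged.txt"))
```

If no filtering by name is needed, hdfs dfs -getmerge <src> <localdst> concatenates the files of an HDFS directory into a single local file directly.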
