What are SUCCESS and part-r-00000 files in Hadoop

Views: 5876 Published: 2018/5/31 18:29:24 Tags: hadoop, mapreduce


Question


Although I use Hadoop frequently on my Ubuntu machine, I have never thought about the _SUCCESS and part-r-00000 files. The output always resides in the part-r-00000 file, but what is the _SUCCESS file for? Why is the output file named part-r-00000? Does the name follow a naming convention, or is it arbitrary?

Solution

See http://www.cloudera.com/blog/2010/08/what%E2%80%99s-new-in-apache-hadoop-0-21/

On the successful completion of a job, the MapReduce runtime creates a _SUCCESS file in the output directory. This may be useful for applications that need to see if a result set is complete just by inspecting HDFS. (MAPREDUCE-947)

Job scheduling systems (such as Oozie) typically use this file as a signal that follow-on processing of the directory's contents can begin, because all the data has been written.
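As a rough illustration of how a downstream step might use the marker, here is a minimal Python sketch. It assumes the job's output directory is reachable as a local path (for example, copied down or mounted); against HDFS itself the same check is usually done with `hdfs dfs -test -e <dir>/_SUCCESS`. The function names `is_output_complete` and `list_part_files` are hypothetical helpers, not part of Hadoop.

```python
from pathlib import Path

SUCCESS_MARKER = "_SUCCESS"  # marker file name described in the answer


def is_output_complete(output_dir):
    """Return True if the output directory contains the _SUCCESS marker."""
    return (Path(output_dir) / SUCCESS_MARKER).is_file()


def list_part_files(output_dir):
    """Return the part-* file names in sorted order.

    Raises RuntimeError if the marker is missing, since the job may
    still be running or may have failed.
    """
    if not is_output_complete(output_dir):
        raise RuntimeError(
            f"{output_dir} has no {SUCCESS_MARKER} marker; output may be incomplete"
        )
    return sorted(p.name for p in Path(output_dir).glob("part-*"))
```

A scheduler-style caller would poll `is_output_complete` and only read the part files once it returns True.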

Update (in response to comment)

The output files are by default named part-x-yyyyy where:

x is either 'm' or 'r', depending on whether the file was written by a map-only job ('m') or by a job with a reduce phase ('r')

yyyyy is the mapper or reducer task number (zero-based, zero-padded to five digits)

So a job which has 32 reducers will have files named part-r-00000 to part-r-00031, one for each reducer task.
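The naming scheme above can be sketched in a few lines of Python. This only mirrors the default part-x-yyyyy pattern described in the answer; the helpers `part_file_name` and `parse_part_file_name` are hypothetical and are not Hadoop APIs.

```python
import re

# part-<m|r>-<task number, at least five digits>
PART_NAME = re.compile(r"part-([mr])-(\d{5,})")


def part_file_name(task_type, task_number):
    """Build a default-style output file name for a mapper ('m') or reducer ('r') task."""
    if task_type not in ("m", "r"):
        raise ValueError("task_type must be 'm' or 'r'")
    return f"part-{task_type}-{task_number:05d}"


def parse_part_file_name(name):
    """Split a part file name into (task_type, task_number)."""
    match = PART_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a default part file name: {name!r}")
    return match.group(1), int(match.group(2))


# A job with 32 reducers: one file per reducer task, numbered from zero.
names = [part_file_name("r", n) for n in range(32)]
print(names[0], names[-1])  # part-r-00000 part-r-00031
```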
