How does a mapper or reducer write data to HDFS?


Question



In a MapReduce program, we just set the output path with FileOutputFormat.setOutputPath and write the results to an HDFS file using the mapper's or reducer's context.write(key, value).
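For reference, here is a minimal sketch of that setup, in the spirit of the WordCount v1.0 tutorial linked below; the class names, the output-path argument, and the omitted mapper/input wiring are illustrative, not part of the original question:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class OutputSketch {

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                // The call in question: it hands the pair to the job's
                // OutputFormat/RecordWriter rather than touching HDFS itself.
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "output sketch");
            job.setJarByClass(OutputSketch.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Mapper and input-path setup omitted for brevity.
            // Each reduce task writes its own part-r-NNNNN file under this path.
            FileOutputFormat.setOutputPath(job, new Path(args[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }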

How does the file writing actually work?

  • The mapper/reducer will be continuously emitting records.

    Will each record be sent to HDFS directly?

    Or, once the application has completed, will it do a copyFromLocal?

    Or will it create a temporary file in the local file system for each mapper or reducer?

http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0

Solution

Records are written to a byte stream and flushed periodically to disk on HDFS. Each record isn't written individually, as that would be a very expensive operation. Likewise, the data isn't written to the local file system first, as that again would be a very expensive operation.
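To make "written to a byte stream, flushed periodically" concrete, the sketch below writes records to a single open FSDataOutputStream, which is essentially what a task's record writer does; the path and record contents here are made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StreamWriteSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // One stream per output file, opened once and kept open.
            try (FSDataOutputStream out = fs.create(new Path("/tmp/part-r-00000"))) {
                for (int i = 0; i < 1000; i++) {
                    // Each "record" is just bytes appended to the stream's
                    // buffer; the DFS client ships full packets to the
                    // datanodes, not one round trip per record.
                    out.writeBytes("key" + i + "\tvalue" + i + "\n");
                }
            } // close() flushes any remaining buffered bytes to HDFS
        }
    }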

Whenever I have questions about things in Hadoop, I tend to take advantage of its open source nature and delve into the source code. In this case you'd want to take a look at the classes used when outputting data - TextOutputFormat and FSDataOutputStream.
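If you don't want to dig through the full sources, here is a condensed paraphrase, not the verbatim code, of what TextOutputFormat's line record writer does with each context.write call (the real class also has fast paths for Text keys and values):

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    // Condensed paraphrase of TextOutputFormat's LineRecordWriter.
    class LineRecordWriterSketch<K, V> {
        private static final byte[] SEPARATOR = "\t".getBytes(StandardCharsets.UTF_8);
        private static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);

        // In a real task this is the FSDataOutputStream opened once
        // for the task's part file.
        private final DataOutputStream out;

        LineRecordWriterSketch(DataOutputStream out) {
            this.out = out;
        }

        // context.write(key, value) eventually lands here: the pair is
        // encoded and appended to the open stream; no per-record flush,
        // and no detour through the local file system.
        public void write(K key, V value) throws IOException {
            out.write(key.toString().getBytes(StandardCharsets.UTF_8));
            out.write(SEPARATOR);
            out.write(value.toString().getBytes(StandardCharsets.UTF_8));
            out.write(NEWLINE);
        }

        // Called when the task finishes; remaining bytes are flushed to HDFS.
        public void close() throws IOException {
            out.close();
        }
    }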
