How does a mapper or reducer write data to HDFS?

Question
In a MapReduce program, we just set the output path with FileOutputFormat.setOutputPath and write the result to an HDFS file using the mapper's or reducer's context.write(key, value).

How does the file writing actually work? The mapper/reducer will be continuously emitting records. Is each record sent to HDFS directly?

Or, once the application is completed, does it do a copyFromLocal?

Or does it create temporary files in the local file system for each mapper or reducer?
Records are written to a byte stream and flushed to HDFS periodically. Each record isn't individually written to HDFS, as that would be a very expensive operation; likewise, the data isn't staged on the local file system, as that would also be very expensive.
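A rough illustration of that buffering behavior, using only the JDK (no Hadoop dependency; the ByteArrayOutputStream here merely stands in for the HDFS-backed stream, and the 8 KB buffer size is an assumption for the demo, not Hadoop's actual configuration):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class BufferedWriteDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for the underlying output stream (FSDataOutputStream in Hadoop).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Records accumulate in an in-memory buffer rather than hitting the sink one by one.
        BufferedOutputStream out = new BufferedOutputStream(sink, 8192);

        for (int i = 0; i < 100; i++) {
            out.write(("key" + i + "\tvalue" + i + "\n").getBytes("UTF-8"));
        }
        // Nothing has reached the sink yet: all 100 small records fit in the 8 KB buffer.
        System.out.println("bytes in sink before flush: " + sink.size());

        out.flush();
        // After the flush, every buffered byte is pushed to the sink at once.
        System.out.println("bytes in sink after flush: " + sink.size());
        out.close();
    }
}
```

Running it shows 0 bytes in the sink before the flush and everything at once afterwards, which is the point the answer makes: writes are batched, not per-record.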
Whenever I have questions about things in Hadoop, I tend to take advantage of its open source nature and delve into the source code. In this case you'd want to take a look at the classes used when outputting data: TextOutputFormat and FSDataOutputStream.
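For orientation before reading that source: TextOutputFormat writes each (key, value) pair as the key, a tab separator, the value, and a newline. A minimal plain-JDK sketch of that formatting (an illustration, not the actual Hadoop class):

```java
public class LineFormatDemo {
    // Mirrors the tab-separated, newline-terminated layout TextOutputFormat produces
    // for each record handed to it via context.write(key, value).
    static String formatRecord(Object key, Object value) {
        return key + "\t" + value + "\n";
    }

    public static void main(String[] args) {
        System.out.print(formatRecord("word", 42));
    }
}
```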