What is "HDFS write pipeline"?
While going through Hadoop: The Definitive Guide, I got stuck on the following sentence:
writing the reduce output does consume network bandwidth, but only as much as a normal HDFS write pipeline consumes.
Questions:
1. Can someone help me understand the above sentence in more detail?
2. What does "HDFS write pipeline" mean?
When files are written to HDFS, a number of things go on behind the scenes to keep blocks consistent and replicated. By far the main I/O cost of this process is replication: the data is forwarded from one DataNode to the next until every replica is written. There is also bidirectional communication with the NameNode to register each block's existence and state.
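To make the replication cost concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a replication factor of 3 and that the writer (for example, a reduce task) runs on a DataNode, so the first replica is written locally and only the remaining replicas cross the network. The function name and parameters are illustrative and not part of any Hadoop API, and protocol overhead is ignored.

```python
def pipeline_network_bytes(output_bytes, replication=3, first_replica_local=True):
    """Estimate bytes sent over the network for one HDFS write.

    Assumption: every replica other than a local first replica costs one
    full copy of the data on the network (headers and acks are ignored).
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    network_copies = replication - 1 if first_replica_local else replication
    return output_bytes * network_copies


one_gib = 1024 ** 3
# Number of full copies of a 1 GiB output that cross the network.
print(pipeline_network_bytes(one_gib) / one_gib)  # prints 2.0
```

Under these assumptions, writing reduce output costs the same network traffic as any other HDFS write of the same size, which is the point the quoted sentence makes.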
I think when it says "write pipeline" it just means the process of:
- Creating the blocks
- Registering with the NameNode (NN)
- Performing replication
- Doing write flushes to disk
- Maintaining block state across the cluster (location, is-locked, last-updated, checksums, etc.)
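The replication step in the list above can be pictured as a chain: the client sends each packet to the first DataNode, which stores it and forwards it to the next, and acknowledgements travel back up the chain. The following is only a toy model of that idea. `write_block` and the node names are hypothetical, and real HDFS streams packets concurrently and handles node failures, which this sketch does not.

```python
def write_block(packets, datanodes):
    """Simulate one block flowing through a chain of DataNodes.

    Returns (stored, acks): the packets held by each node, and the packet
    sequence numbers acknowledged back to the client.
    """
    stored = {dn: [] for dn in datanodes}
    acks = []
    for seq, packet in enumerate(packets):
        # Downstream: each node stores the packet, then passes it on.
        for dn in datanodes:
            stored[dn].append(packet)
        # Upstream: the ack returns from the last node to the client,
        # so it is recorded only after every replica has the packet.
        acks.append(seq)
    return stored, acks


stored, acks = write_block([b"p0", b"p1", b"p2"], ["dn1", "dn2", "dn3"])
print(acks)
print({dn: len(pkts) for dn, pkts in stored.items()})
```

In this model every node ends up with all three packets, and the client receives one acknowledgement per packet.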