What is the "HDFS write pipeline"?

Problem description

While I was going through Hadoop: The Definitive Guide, I got stuck on the sentence below:

Writing the reduce output does consume network bandwidth, but only as much as a normal HDFS write pipeline consumes.

Questions:
  1. Can someone help me understand the sentence above in more detail?
  2. What does "HDFS write pipeline" mean?

Solution

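When a file is written to HDFS, a number of things happen behind the scenes related to block consistency and replication. By far the main I/O component of this process is replication: with the default replication factor of 3, every block is stored on three DataNodes, which are chained into a pipeline so that each node writes the data locally and forwards it to the next, while acknowledgements flow back along the same chain. There is also bidirectional communication with the NameNode, which registers each block's existence and state. That is all the book's sentence is saying: when a reducer runs on a DataNode, as is typical, the first replica of each output block is written to that local node, so the reduce output consumes network bandwidth only for shipping the remaining replicas, exactly as any other HDFS write would.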
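The replication part is easiest to see in a toy model. The Python sketch below is a hypothetical simplification, not Hadoop code: `DataNode`, `write_block`, and the host names are invented for illustration, and it handles packets one at a time, whereas real HDFS streams packets through the pipeline without waiting for each ack and tracks outstanding acks in a separate queue. Each packet is stored by every DataNode and forwarded to the next, and acks travel back in reverse order. The byte counter only counts hops between different hosts, so putting the writer on the first DataNode's host (where HDFS's default placement policy puts the first replica when the writer runs on a DataNode) shows that only the remaining replicas cost network bandwidth.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical DataNode in the toy model: a name, a host, stored packets."""
    name: str
    host: str
    stored: list = field(default_factory=list)


def write_block(writer_host, packets, pipeline):
    """Send every packet of one block through the DataNode pipeline.

    Each packet travels writer -> pipeline[0] -> pipeline[1] -> ...; every
    node stores a copy and forwards it. The ack for a packet travels back
    from the last node to the first. Returns (acks, network_bytes), where
    network_bytes counts only hops between two different hosts.
    """
    acks = []
    network_bytes = 0
    for seq, packet in enumerate(packets):
        sender = writer_host
        for node in pipeline:
            node.stored.append(packet)        # this replica stores the packet
            if node.host != sender:
                network_bytes += len(packet)  # the hop crossed the network
            sender = node.host                # this node forwards to the next
        # Ack path: last DataNode first, back toward the writer.
        acks.append((seq, [n.name for n in reversed(pipeline)]))
    return acks, network_bytes


# A reducer on host "h1", which also runs the first DataNode of the pipeline.
pipeline = [DataNode("dn1", "h1"), DataNode("dn2", "h2"), DataNode("dn3", "h3")]
packets = [b"x" * 64, b"y" * 64]
acks, network_bytes = write_block("h1", packets, pipeline)
print(network_bytes)  # 256: two remote hops per packet, two 64-byte packets
print(acks[0])        # (0, ['dn3', 'dn2', 'dn1'])
```

If the writer ran on a host outside the pipeline, all three hops would be remote and the same two packets would cost 384 bytes instead of 256.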

I think when the book says "write pipeline", it just means the process of:

  1. Creating the blocks
  2. Registering with the NameNode (NN)
  3. Performing replication
  4. Doing write flushes to disk
  5. Maintaining block state across the cluster (location, is-locked, last-updated, checksums, etc.)
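The five steps above can be mapped onto an equally hypothetical toy, again not Hadoop's API: `ToyNameNode`, `BlockInfo`, and `write_file` are invented names, `disks` is a plain dict standing in for each DataNode's local storage, and the target locations are passed in rather than chosen by a placement policy. It splits the data into blocks, registers each block, writes every replica together with a single whole-block CRC32 (real DataNodes keep finer-grained checksums in a metadata file beside each block file), and then marks the block complete.

```python
import zlib
from dataclasses import dataclass


@dataclass
class BlockInfo:
    """Per-block metadata tracked by the hypothetical toy NameNode."""
    block_id: int
    locations: list
    length: int = 0
    complete: bool = False


class ToyNameNode:
    def __init__(self):
        self.blocks = {}
        self.next_id = 0

    def add_block(self, locations):
        # Steps 1-2: create a new block and register its target locations.
        info = BlockInfo(self.next_id, list(locations))
        self.blocks[info.block_id] = info
        self.next_id += 1
        return info

    def complete_block(self, block_id, length):
        # Step 5: record the block's final state once every replica is written.
        info = self.blocks[block_id]
        info.length = length
        info.complete = True


def write_file(namenode, disks, data, block_size, locations):
    """Split data into blocks and store every block on every location."""
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        info = namenode.add_block(locations)
        for dn in info.locations:
            # Steps 3-4: replicate to this DataNode and "flush" the bytes
            # to its storage together with a CRC32 of the block.
            disks.setdefault(dn, {})[info.block_id] = (chunk, zlib.crc32(chunk))
        namenode.complete_block(info.block_id, len(chunk))


nn = ToyNameNode()
disks = {}
write_file(nn, disks, b"abcdefghij", block_size=4, locations=["dn1", "dn2", "dn3"])
print(len(nn.blocks))      # 3 blocks: b"abcd", b"efgh", b"ij"
print(disks["dn2"][2][0])  # b'ij'
```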
