Hadoop Replication Model - DataStreamer/Namenode

Question

First of all, thank you for reading my question!

I'm currently studying the replication model of Hadoop, but I'm at a dead end. I'm studying from the book "Hadoop: The Definitive Guide", 3rd Edition (O'Reilly, Jan 2012). To get to the question, I first need to quote the text below from the book.

On page 73, there is the following:

"The DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to. Just as in the read case, FSDataOutputStream wraps a DFSOutputStream, which handles communication with the datanodes and namenode. As the client writes data (step 3), DFSOutputStream splits it into packets, which it writes to an internal queue, called the data queue. The data queue is consumed by the DataStreamer, whose responsibility it is to ask the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas."

As you can see, the DFSOutputStream has a data queue of packets. The data queue is consumed by the DataStreamer, which asks the namenode to allocate new blocks.
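
To make the quoted mechanism concrete, here is a minimal sketch of the producer/consumer pattern it describes: a writer splits data into packets and puts them on a queue, and a separate streamer thread drains the queue. It is not Hadoop's DFSOutputStream code. The class name `DataQueueSketch`, the tiny packet size, and the end-of-stream sentinel are all assumptions made for illustration, and the namenode and datanode pipeline are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not Hadoop's DFSOutputStream: names and sizes are illustrative.
final class DataQueueSketch {
    static final int PACKET_SIZE = 4;      // assumed tiny packet size for the demo
    static final byte[] END = new byte[0]; // sentinel object marking end of stream

    // Producer side: split the payload into fixed-size packets and enqueue them.
    static void writePackets(byte[] data, BlockingQueue<byte[]> dataQueue)
            throws InterruptedException {
        for (int off = 0; off < data.length; off += PACKET_SIZE) {
            int len = Math.min(PACKET_SIZE, data.length - off);
            byte[] packet = new byte[len];
            System.arraycopy(data, off, packet, 0, len);
            dataQueue.put(packet);
        }
        dataQueue.put(END);
    }

    // Runs the producer on the calling thread and the consumer ("streamer") on
    // its own thread, then returns the packets in the order they were consumed.
    static List<byte[]> stream(byte[] data) {
        BlockingQueue<byte[]> dataQueue = new LinkedBlockingQueue<>();
        List<byte[]> sent = new ArrayList<>();
        Thread streamer = new Thread(() -> {
            try {
                // Compare by reference: only the END sentinel stops the loop.
                for (byte[] p = dataQueue.take(); p != END; p = dataQueue.take()) {
                    sent.add(p); // a real streamer would send this to the datanode pipeline
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        streamer.start();
        try {
            writePackets(data, dataQueue);
            streamer.join(); // join() makes the streamer's writes to 'sent' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while streaming", e);
        }
        return sent;
    }
}
```

Calling `DataQueueSketch.stream(bytes)` with a 10-byte array yields three packets of 4, 4, and 2 bytes, in write order.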

My question:

How does this work?

How does the Namenode allocate new blocks?

Same question, asked differently: how does the Namenode create a list of suitable Datanodes?

I can't find anything about this on the internet or in the book. The book explains the process from a high level.

I really appreciate you taking the time to help me. Thank you!

Recommended answer

Take a look at the class comment of BlockPlacementPolicyDefault. For example, when the replication factor is three, HDFS's placement policy is as follows (from grepcode):

/** The class is responsible for choosing the desired number of targets
 * for placing block replicas.
 * The replica placement strategy is that if the writer is on a datanode,
 * the 1st replica is placed on the local machine, 
 * otherwise a random datanode. The 2nd replica is placed on a datanode
 * that is on a different rack. The 3rd replica is placed on a datanode
 * which is on a different node of the rack as the second replica.
 */
@InterfaceAudience.Private
public class BlockPlacementPolicyDefault extends BlockPlacementPolicy {

This policy cuts inter-rack write traffic, which improves write performance.

The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees.

With this policy, the replicas of a file do not evenly distribute across the racks.

One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks.

This policy improves write performance without compromising data reliability or read performance.

==>

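Two replicas end up on one rack and the remaining replica on another rack. Going by the class comment quoted above, the 1st replica is on the writer's node, and the 2nd and 3rd replicas are on two different nodes of the same remote rack.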
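
The rule in the quoted class comment can be sketched in a few lines. This is a minimal illustration of that three-replica rule only, not the BlockPlacementPolicyDefault implementation: `PlacementSketch`, `Node`, `pick`, and `chooseTargets` are hypothetical names, and the sketch leaves out everything a real namenode would also have to weigh (free space, node load, fallbacks when only one rack exists).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the quoted rule; not the BlockPlacementPolicyDefault code.
final class PlacementSketch {
    // A datanode identified by its name and the rack it sits on.
    static final class Node {
        final String name;
        final String rack;
        Node(String name, String rack) { this.name = name; this.rack = rack; }
    }

    // Picks one candidate at random; fails loudly if none qualifies.
    static Node pick(List<Node> candidates, Random rnd) {
        if (candidates.isEmpty()) throw new IllegalStateException("no suitable datanode");
        return candidates.get(rnd.nextInt(candidates.size()));
    }

    // 'writer' may be null when the client is not running on a datanode.
    static List<Node> chooseTargets(Node writer, List<Node> cluster, Random rnd) {
        // 1st replica: the writer's own node, otherwise a random datanode.
        Node first = (writer != null) ? writer : pick(cluster, rnd);

        // 2nd replica: a datanode on a different rack than the first.
        List<Node> offRack = new ArrayList<>();
        for (Node n : cluster) {
            if (!n.rack.equals(first.rack)) offRack.add(n);
        }
        Node second = pick(offRack, rnd);

        // 3rd replica: a different node on the same rack as the second.
        List<Node> besideSecond = new ArrayList<>();
        for (Node n : offRack) {
            if (n.rack.equals(second.rack) && n != second) besideSecond.add(n);
        }
        Node third = pick(besideSecond, rnd);

        return Arrays.asList(first, second, third);
    }
}
```

With two racks of three nodes each and the writer on rack r1, `chooseTargets` returns the writer's node followed by two distinct nodes on r2, which matches the "one third on one node, two thirds on one rack" distribution described above.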
