Data division on Addition of node to distributed System


Question



Suppose I have a distributed network of computers with, say, 1000 storage nodes. Now if a new node is added, what should be done? Should the data now get equally divided across 1001 nodes?

Also, will the answer change if the number of nodes is 10 instead of 1000?

Solution

The client machine first splits the file into blocks, say Block A and Block B. The client then interacts with the NameNode to ask where to place these blocks (Block A, Block B). The NameNode gives the client a list of datanodes to write the data to; it generally chooses the datanodes nearest to the client in the network topology.

The client then picks the first datanode from that list and writes the first block to it, and that datanode replicates the block to the other datanodes in the list. The NameNode keeps the metadata about files and their associated blocks.
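The client-side splitting step can be mimicked locally with standard tools. The sketch below uses a 1 KB block size and a temporary directory, both illustrative choices only (HDFS actually defaults to 128 MB blocks and never writes block files through the local shell like this):

```shell
#!/bin/sh
# Mimic the client-side step of cutting a file into fixed-size blocks
# before asking the NameNode where each block should go. HDFS defaults
# to 128 MB blocks; 1 KB is used here only to keep the demo small.
dir=$(mktemp -d)
cd "$dir"

# Create a 3.5 KB test file.
head -c 3584 /dev/urandom > input.dat

# Split it into 1 KB "blocks": blk_aa, blk_ab, blk_ac, blk_ad
# (the last block holds the remaining 512 bytes).
split -b 1024 input.dat blk_

ls -l blk_*
```

In real HDFS the split happens inside the streaming write path rather than as a separate step, but the block-size arithmetic (file size divided by block size, with a short final block) is the same.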

If a datanode is added to a Hadoop cluster, HDFS will not move blocks from the old datanodes to the new datanode to balance the cluster. To do this, you need to run the balancer.
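The balancer is started from the command line on any node with the HDFS client configured; a typical invocation (the threshold value shown is just an example) looks like:

```shell
# Run the HDFS balancer; it exits once the cluster is deemed balanced.
# -threshold is in percentage points (the default is 10): a datanode is
# considered balanced when its utilization is within this many points
# of the overall cluster utilization.
hdfs balancer -threshold 10
```

The same daemon can also be started with the `start-balancer.sh` script shipped in Hadoop's sbin directory.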

The balancer program is a Hadoop daemon that redistributes blocks by moving them from overutilized datanodes to underutilized datanodes, while adhering to the block replica placement policy that makes data loss unlikely by placing block replicas on different racks. It moves blocks until the cluster is deemed to be balanced, which means that the utilization of every datanode (ratio of used space on the node to total capacity of the node) differs from the utilization of the cluster (ratio of used space on the cluster to total capacity of the cluster) by no more than a given threshold percentage.
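The balance criterion quoted above can be expressed numerically. The sketch below uses made-up node figures (not from the book) to show which datanodes would be flagged for rebalancing under a 10-point threshold:

```shell
#!/bin/sh
# Balancer criterion sketch: a datanode is balanced when
#   |node utilization - cluster utilization| <= threshold
# where utilization = used space / total capacity and the threshold is
# in percentage points. Node figures below are invented for illustration:
# each line is "name used_TB capacity_TB".
threshold=10

nodes="dn1 8 10
dn2 2 10
dn3 5 10"

# Cluster utilization: total used / total capacity = 15/30 = 50.0%.
cluster_util=$(printf '%s\n' "$nodes" | awk '{u+=$2; c+=$3} END {printf "%.1f", 100*u/c}')
echo "cluster utilization: ${cluster_util}%"

# Per-node utilization and its distance from the cluster figure.
printf '%s\n' "$nodes" | awk -v cu="$cluster_util" -v t="$threshold" '{
    nu = 100 * $2 / $3
    d = nu - cu; if (d < 0) d = -d
    printf "%s: %.1f%% -> %s\n", $1, nu, (d <= t ? "balanced" : "move blocks")
}'
```

With these numbers, dn1 (80%) and dn2 (20%) are more than 10 points from the 50% cluster figure and would have blocks moved, while dn3 (50%) is already balanced.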

Reference: Hadoop: The Definitive Guide, 3rd edition, page 350

As a Hadoop admin, you should schedule a balancer job, for example once a day, to keep blocks balanced across the cluster.

Useful links related to the balancer:

http://www.swiss-scalability.com/2013/08/hadoop-hdfs-balancer-explained.html

http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Installation-Guide/cdh4ig_balancer.html
