How to explicitly define datanodes to store a particular given file in HDFS?


Question


I want to write a script, or something like an .xml file, which explicitly defines which datanodes in a Hadoop cluster store the blocks of a particular file. For example: suppose there are 4 slave nodes and 1 master node (5 nodes total in the Hadoop cluster), and two files, file01 (size = 120 MB) and file02 (size = 160 MB). The default block size is 64 MB.

Now I want to store one of the two blocks of file01 at slave node1 and the other at slave node2. Similarly, one of the three blocks of file02 at slave node1, the second at slave node3, and the third at slave node4. So, my question is: how can I do this?

There is actually one method: make changes in the conf/slaves file every time a file is stored. But I don't want to do that, so is there another solution? I hope I made my point clear. Waiting for your kind response!
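As a sanity check on the block counts in the question, here is a quick sketch of how HDFS splits each file at a 64 MB block size (plain arithmetic, not HDFS code; note that Hadoop 2.x+ defaults to 128 MB):

```python
import math

# Sizes in MB, taken from the question.
BLOCK_SIZE = 64
files = {"file01": 120, "file02": 160}

for name, size in files.items():
    # Each file is split into ceil(size / block_size) blocks;
    # the last block may be smaller than the block size.
    blocks = math.ceil(size / BLOCK_SIZE)
    print(f"{name}: {blocks} blocks")
# file01: 2 blocks, file02: 3 blocks
```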

Solution

There is no method to achieve what you are asking here - the name node will replicate blocks to data nodes based upon rack configuration, replication factor and node availability, so even if you do manage to get a block onto two particular data nodes, if one of those nodes goes down, the name node will replicate the block to another node.

Your requirement also assumes a replication factor of 1, which doesn't give you any data redundancy (which is a bad thing if you lose a data node).

Let the namenode manage block assignments, and use the balancer periodically if you want to keep your cluster evenly distributed.
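While you cannot pin blocks to specific nodes, the standard HDFS tools do let you control replication per file and rebalance the cluster (illustrative commands; the file path is a placeholder, and these require a running cluster):

```shell
# Set the replication factor to 2 for a specific file
# (-w waits until replication actually completes).
hdfs dfs -setrep -w 2 /user/hadoop/file01

# Run the balancer; -threshold is the allowed deviation (in percent)
# of each datanode's disk utilization from the cluster average.
hdfs balancer -threshold 10
```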

