Change number of data nodes in Hadoop


Question

How can I change the number of data nodes, that is, disable and enable certain data nodes to test scalability? To be more clear: I have 4 data nodes, and I want to test performance with 1, 2, 3, and 4 data nodes in turn. Would it be possible to just update the slaves file on the namenode?
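
For reference, the slaves file mentioned here is just a plain list of worker hostnames (conf/slaves in a Hadoop 1.x layout); the start/stop scripts use it to decide which hosts to launch daemons on. A minimal sketch with placeholder hostnames:

    # conf/slaves on the namenode -- one datanode hostname per line
    # (hostnames below are placeholders)
    datanode1.example.com
    datanode2.example.com
    datanode3.example.com
    datanode4.example.com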

Answer


The correct way to temporarily decommission a node:

  1. Create an "exclude file". This lists the hosts, one per line, that you wish to remove.
  2. Set dfs.hosts.exclude and mapred.hosts.exclude to the location of this file.
  3. Update the namenode and jobtracker by doing hadoop dfsadmin -refreshNodes and hadoop mradmin -refreshNodes
  4. This will start the decommissioning process. All of the data that used to be replicated on those nodes will be copied off of them and onto other nodes. You can check the progress through the web UI. (A configuration sketch for steps 1-3 follows this list.)
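
A minimal sketch of steps 1-3, assuming a Hadoop 1.x-style configuration directory and placeholder hostnames and paths:

    # Step 1: the exclude file lists the hosts to decommission, one per line
    # (the path and hostname below are placeholders)
    cat > /etc/hadoop/conf/excludes <<'EOF'
    datanode4.example.com
    EOF

    # Step 2: point HDFS and MapReduce at the exclude file by adding these
    # properties to hdfs-site.xml and mapred-site.xml respectively:
    #   <property>
    #     <name>dfs.hosts.exclude</name>
    #     <value>/etc/hadoop/conf/excludes</value>
    #   </property>
    #   <property>
    #     <name>mapred.hosts.exclude</name>
    #     <value>/etc/hadoop/conf/excludes</value>
    #   </property>

    # Step 3: tell the namenode and jobtracker to re-read the host lists
    hadoop dfsadmin -refreshNodes
    hadoop mradmin -refreshNodes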


Note that as soon as you run hadoop mradmin -refreshNodes those nodes will no longer be used for MR jobs, but they will still hold data, so if you run something before decommissioning is complete you may see some extra network latency that you wouldn't see otherwise. For a totally realistic test, you should wait until decommissioning has finished.
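
If you prefer the command line over the web UI, the dfsadmin report includes each datanode's decommission status; a minimal sketch (the exact "Decommission Status" wording can vary between Hadoop versions):

    # Show per-datanode decommission status from the namenode's report
    hadoop dfsadmin -report | grep -i "decommission"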


To add the nodes back, simply remove them from the exclude file and do the -refreshNodes commands again.
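
A sketch of re-adding a node, under the same placeholder path and hostname as above:

    # Remove the host from the exclude file (or clear the file entirely) ...
    sed -i '/datanode4\.example\.com/d' /etc/hadoop/conf/excludes

    # ... then refresh the namenode and jobtracker again
    hadoop dfsadmin -refreshNodes
    hadoop mradmin -refreshNodes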

