How do I correctly remove nodes in Hadoop?


Problem Description


    I'm running Hadoop 1.1.2 on a cluster with 10+ machines. I would like to nicely scale up and down, both for HDFS and MapReduce. By "nicely", I mean that no data is lost (HDFS nodes are allowed to decommission) and that nodes finish their running tasks before shutting down.

    I've noticed the datanode process dies once decommissioning is done, which is good. This is what I do to remove a node:

    • Add node to mapred.exclude
    • Add node to hdfs.exclude
    • $ hadoop mradmin -refreshNodes
    • $ hadoop dfsadmin -refreshNodes
    • $ hadoop-daemon.sh stop tasktracker
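
    A minimal shell sketch of the removal flow above, run from the master. The exclude-file paths and hostname are assumptions for illustration, not from the original:

      # Decommission one node "nicely"; NODE and CONF are assumed values.
      NODE=worker05.example.com
      CONF=/etc/hadoop/conf

      # Add the node to both exclude files.
      echo "$NODE" >> "$CONF/mapred.exclude"
      echo "$NODE" >> "$CONF/hdfs.exclude"

      # Ask the JobTracker and NameNode to re-read the exclude files.
      hadoop mradmin -refreshNodes
      hadoop dfsadmin -refreshNodes

      # Stop the tasktracker on the node itself; the datanode exits on
      # its own once decommissioning finishes.
      ssh "$NODE" hadoop-daemon.sh stop tasktracker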

    To add the node back (assuming it was removed as above), this is what I do:

    • Remove from mapred.exclude
    • Remove from hdfs.exclude
    • $ hadoop mradmin -refreshNodes
    • $ hadoop dfsadmin -refreshNodes
    • $ hadoop-daemon.sh start tasktracker
    • $ hadoop-daemon.sh start datanode
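
    And the reverse flow under the same assumptions (GNU sed for the in-place edit):

      NODE=worker05.example.com
      CONF=/etc/hadoop/conf

      # Remove the node from both exclude files.
      sed -i "/^${NODE}\$/d" "$CONF/mapred.exclude"
      sed -i "/^${NODE}\$/d" "$CONF/hdfs.exclude"

      # Re-read the shortened exclude lists.
      hadoop mradmin -refreshNodes
      hadoop dfsadmin -refreshNodes

      # Restart the daemons on the node itself.
      ssh "$NODE" hadoop-daemon.sh start tasktracker
      ssh "$NODE" hadoop-daemon.sh start datanode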

    Is this the correct way to scale up and down "nicely"? When scaling down, I'm noticing that job duration rises sharply for certain unlucky jobs (since tasks they had running on the removed node need to be rescheduled).

    Solution

    If you have not set up a dfs exclude file before, follow steps 1-3. Otherwise, start from step 4.

    1. Shut down the NameNode.
    2. Set dfs.hosts.exclude to point to an empty exclude file.
    3. Restart the NameNode.
    4. In the dfs exclude file, specify the nodes using the full hostname or IP or IP:port format.
    5. Do the same in mapred.exclude.
    6. Execute bin/hadoop dfsadmin -refreshNodes. This forces the NameNode to reread the exclude file and start the decommissioning process.
    7. Execute bin/hadoop mradmin -refreshNodes.
    8. Monitor the NameNode and JobTracker web UI and confirm the decommission process is in progress. It can take a few seconds to update. Messages like "Decommission complete for node XXXX.XXXX.X.XX:XXXXX" will appear in the NameNode log files when it finishes decommissioning, at which point you can remove the nodes from the cluster.
    9. When the process has completed, the NameNode UI will list the datanode as decommissioned. The JobTracker page will show the updated number of active nodes. Run bin/hadoop dfsadmin -report to verify. Stop the datanode and tasktracker processes on the excluded node(s).
    10. If you do not plan to reintroduce the machine to the cluster, remove it from the include and exclude files.
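
    A hedged end-to-end sketch of steps 1-9 as shell on the master. dfs.hosts.exclude (in hdfs-site.xml) and mapred.hosts.exclude (in mapred-site.xml) are the standard Hadoop 1.x property names; the file paths and hostname are assumptions for illustration:

      # 1. Shut down the NameNode so it picks up the new property.
      hadoop-daemon.sh stop namenode

      # 2. Point dfs.hosts.exclude at an (initially empty) exclude file
      #    by adding this property to conf/hdfs-site.xml:
      #      <property>
      #        <name>dfs.hosts.exclude</name>
      #        <value>/etc/hadoop/conf/hdfs.exclude</value>
      #      </property>
      touch /etc/hadoop/conf/hdfs.exclude

      # 3. Restart the NameNode.
      hadoop-daemon.sh start namenode

      # 4-5. Name the node to retire: full hostname, IP, or IP:port.
      echo "worker05.example.com" >> /etc/hadoop/conf/hdfs.exclude
      echo "worker05.example.com" >> /etc/hadoop/conf/mapred.exclude

      # 6-7. Force the NameNode and JobTracker to re-read the files.
      bin/hadoop dfsadmin -refreshNodes
      bin/hadoop mradmin -refreshNodes

      # 8-9. Watch the NameNode and JobTracker web UIs; once the NameNode
      #      log reports decommission complete, verify with a report and
      #      stop the daemons on the excluded node.
      bin/hadoop dfsadmin -report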

    To add a node as a datanode and tasktracker, see the Hadoop FAQ page.
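
    In outline (a sketch, not the FAQ's exact wording): install the same Hadoop build and configuration on the new machine, list it in the master's slaves/include files if you use them, then start the daemons on the node itself:

      # On the new node, with Hadoop 1.x installed and configured:
      hadoop-daemon.sh start datanode
      hadoop-daemon.sh start tasktracker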

    EDIT: When a live node is to be removed from the cluster, what happens to its jobs?

    Jobs running on a node being decommissioned are affected: the tasks of those jobs scheduled on that node are marked KILLED_UNCLEAN (for map and reduce tasks) or KILLED (for job setup and cleanup tasks). See line 4633 in JobTracker.java for details. The job will be informed to fail those tasks. Most of the time, the JobTracker will reschedule their execution. However, after many repeated failures, it may instead decide to allow the entire job to fail or succeed. See line 2957 onwards in JobInProgress.java.
