How does the Hadoop NameNode failover process work?


Question

Hadoop: The Definitive Guide says:

Each Namenode runs a lightweight failover controller process whose job it is to monitor its Namenode for failures (using a simple heartbeat mechanism) and trigger a failover should a namenode fail.

How can a NameNode run something that detects its own failure?

Who sends heartbeats to whom?

Where does this process run?

How does it detect a NameNode failure?

Whom does it notify about the transition?

Answer

From the Apache docs (HDFS High Availability Using the Quorum Journal Manager, section "Automatic Failover"):

The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:

Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.

ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.

ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active.
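The three responsibilities above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `FakeZooKeeper`, `NameNode`, `ZKFC`, the `tick` method, and the lock path are hypothetical names made up for this example, not Hadoop or ZooKeeper APIs, and fencing of the old active node is left out.

```python
class FakeZooKeeper:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical, not the real client API)."""

    def __init__(self):
        self.ephemeral = {}  # znode path -> id of the session that owns it

    def create_ephemeral(self, path, session_id):
        # Creation succeeds only if nobody holds the znode yet.
        if path in self.ephemeral:
            return False
        self.ephemeral[path] = session_id
        return True

    def close_session(self, session_id):
        # Ephemeral znodes disappear together with their session.
        self.ephemeral = {p: s for p, s in self.ephemeral.items() if s != session_id}


class NameNode:
    def __init__(self, name):
        self.name = name
        self.healthy = True      # what the health-check "ping" reports
        self.state = "standby"


class ZKFC:
    LOCK = "/ha/lock"  # hypothetical lock znode path

    def __init__(self, namenode, zk):
        self.nn = namenode
        self.zk = zk
        self.session_id = "zkfc-" + namenode.name

    def tick(self):
        """One monitoring round: health check, then session and lock handling."""
        if not self.nn.healthy:
            # Unhealthy local NameNode: give up the session, which releases the lock.
            self.zk.close_session(self.session_id)
            self.nn.state = "standby"  # simplification: a real failover also fences the node
            return
        if self.LOCK not in self.zk.ephemeral:
            if self.zk.create_ephemeral(self.LOCK, self.session_id):
                self.nn.state = "active"  # this controller won the election


def run_demo():
    zk = FakeZooKeeper()
    nn1, nn2 = NameNode("nn1"), NameNode("nn2")
    controllers = [ZKFC(nn1, zk), ZKFC(nn2, zk)]
    history = []
    for c in controllers:
        c.tick()
    history.append((nn1.state, nn2.state))
    nn1.healthy = False  # simulate a crashed or frozen NameNode process
    for c in controllers:
        c.tick()
    history.append((nn1.state, nn2.state))
    return history
```

Calling `run_demo()` shows nn1 winning the first election and nn2 taking over once nn1's controller reports it unhealthy. Each controller watches only its own local NameNode; the two controllers coordinate solely through the lock znode.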

Also have a look at the Apache PDF that is part of the HDFS-2185 JIRA issue.

See also slide 16 of: http://www.slideshare.net/cloudera/hdfs-update-lipcon-federal-big-data-apache-hadoop-forum

Automatic Namenode failover process in Hadoop:

In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.

In order for the Standby NameNode to keep its state synchronized with the Active NameNode, both nodes communicate with a group of separate daemons called JournalNodes (JNs).

When the Active node performs any namespace modification, it durably logs a record of the modification to a majority of these JNs. The Standby node reads these edits from the JNs and applies them to its own namespace.

In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.
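The majority write and the catch-up before promotion can be illustrated with a toy model. It is a rough sketch, not the real Quorum Journal Manager: `JournalNode`, `log_edit`, `Standby`, and the `("mkdir", path)` edit format are all assumptions invented for this example, and the real protocol's epochs and recovery of partially written edits are ignored.

```python
class JournalNode:
    """Toy JournalNode that stores edits in memory (hypothetical model)."""

    def __init__(self):
        self.edits = []  # list of (txid, operation) pairs
        self.up = True

    def append(self, txid, op):
        if not self.up:
            return False  # an unreachable JournalNode cannot acknowledge
        self.edits.append((txid, op))
        return True


def log_edit(journal_nodes, txid, op):
    """Active side: an edit counts as logged only if a majority of JNs acknowledge it."""
    acks = sum(1 for jn in journal_nodes if jn.append(txid, op))
    return acks > len(journal_nodes) // 2


class Standby:
    def __init__(self):
        self.namespace = set()
        self.last_applied = 0

    def catch_up(self, journal_nodes):
        """Read all edits newer than last_applied from reachable JNs and apply them in txid order."""
        pending = {}
        for jn in journal_nodes:
            if jn.up:
                for txid, op in jn.edits:
                    if txid > self.last_applied:
                        pending[txid] = op
        for txid in sorted(pending):
            kind, path = pending[txid]
            if kind == "mkdir":
                self.namespace.add(path)
            elif kind == "delete":
                self.namespace.discard(path)
            self.last_applied = txid


def run_demo():
    jns = [JournalNode(), JournalNode(), JournalNode()]
    standby = Standby()
    ok1 = log_edit(jns, 1, ("mkdir", "/a"))
    jns[2].up = False  # one JournalNode fails; 2 of 3 is still a majority
    ok2 = log_edit(jns, 2, ("mkdir", "/b"))
    ok3 = log_edit(jns, 3, ("delete", "/a"))
    standby.catch_up(jns)  # performed before the Standby promotes itself
    return ok1, ok2, ok3, sorted(standby.namespace), standby.last_applied
```

With three JournalNodes the Active node keeps logging after one of them fails, and the Standby still ends up with the complete edit history because at least one reachable JournalNode holds every acknowledged edit.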

It is vital for an HA cluster that only one of the NameNodes is Active at a time. ZooKeeper is used to avoid a split-brain scenario, so that the namespace state of the two NameNodes does not diverge as a result of a failover.

See also slide 8 of: http://www.slideshare.net/cloudera/hdfs-futures-world2012-widescreen

In summary: the NameNode is a daemon, and the failover controller is a separate daemon. If the NameNode daemon fails, the failover controller daemon detects the failure and takes corrective action. Even if the entire machine crashes, the ZooKeeper servers detect it because the controller's session expires; the lock is then released, and the Standby NameNode is elected as the new Active NameNode.
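The whole-machine crash case comes down to session expiry: a ZooKeeper client keeps its session alive with heartbeats, and a session that stays silent past its timeout is dropped along with its ephemeral nodes. The sketch below models that with a logical clock. `SessionTable`, its methods, the session names, and the timeout value of 5 are hypothetical choices for illustration, not ZooKeeper's API or Hadoop's defaults.

```python
class SessionTable:
    """Toy model of ZooKeeper-side session tracking with a logical clock (hypothetical)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}      # session id -> time of its last heartbeat
        self.lock_holder = None  # session id that owns the ephemeral lock znode

    def heartbeat(self, session_id, now):
        self.last_seen[session_id] = now

    def try_lock(self, session_id):
        # Only a live session can take the lock, and only if it is free.
        if self.lock_holder is None and session_id in self.last_seen:
            self.lock_holder = session_id
        return self.lock_holder == session_id

    def expire(self, now):
        """Drop sessions that were silent longer than the timeout, together with their lock."""
        for sid, seen in list(self.last_seen.items()):
            if now - seen > self.timeout:
                del self.last_seen[sid]
                if self.lock_holder == sid:
                    self.lock_holder = None


def run_demo():
    zk = SessionTable(timeout=5)
    zk.heartbeat("zkfc-nn1", now=0)
    zk.heartbeat("zkfc-nn2", now=0)
    first = zk.try_lock("zkfc-nn1")    # nn1's controller takes the lock
    blocked = zk.try_lock("zkfc-nn2")  # nn2's controller cannot take a held lock
    # The nn1 machine dies right after t=0, so only zkfc-nn2 keeps sending heartbeats.
    for t in range(1, 8):
        zk.heartbeat("zkfc-nn2", now=t)
        zk.expire(now=t)
    takeover = zk.try_lock("zkfc-nn2")
    return first, blocked, takeover, zk.lock_holder
```

In this model nobody has to report the crash: the silence of the dead machine's controller is what frees the lock, after which the surviving controller can acquire it.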
