Why do we need ZooKeeper in the Hadoop stack?


Question

I am new to Hadoop and ZooKeeper, and I don't understand the purpose of using ZooKeeper with Hadoop. Does ZooKeeper write data in Hadoop? If not, why do we use ZooKeeper with Hadoop?

Answer

Hadoop 1.x does not use ZooKeeper, although HBase uses ZooKeeper even on Hadoop 1.x installations.

Hadoop itself adopted ZooKeeper starting with version 2.0, for example for automatic NameNode failover in HDFS high availability.

The purpose of ZooKeeper is cluster management. This fits the general *nix philosophy of building on small, specialized components: Hadoop components that need clustering capabilities rely on ZooKeeper rather than developing their own.

ZooKeeper is a distributed store that provides the following guarantees (copied from the ZooKeeper overview page):

  • Sequential Consistency - Updates from a client will be applied in the order that they were sent.
  • Atomicity - Updates either succeed or fail. No partial results.
  • Single System Image - A client will see the same view of the service regardless of the server that it connects to.
  • Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
  • Timeliness - The client's view of the system is guaranteed to be up-to-date within a certain time bound.

You can build on these guarantees to implement the "recipes" that cluster management requires, such as locks and leader election.
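To make the leader-election recipe concrete, here is a minimal sketch in Python. It does not talk to a real ZooKeeper ensemble: `ToyZooKeeper` is a hypothetical in-memory stand-in (not the ZooKeeper client API) that models only two behaviors the recipe relies on, namely sequence numbers assigned in creation order and ephemeral nodes that disappear when their session ends. The path `/election/candidate-` and the worker names are made up for illustration.

```python
class ToyZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble (not the real API)."""

    def __init__(self):
        self._next_seq = 0
        self._nodes = {}  # node name -> id of the session that created it

    def create_ephemeral_sequential(self, prefix, session_id):
        # Append a monotonically increasing, zero-padded sequence number,
        # so that sorting node names reproduces creation order.
        name = f"{prefix}{self._next_seq:010d}"
        self._next_seq += 1
        self._nodes[name] = session_id
        return name

    def children(self):
        # Node names in sequence order.
        return sorted(self._nodes)

    def owner(self, name):
        return self._nodes[name]

    def close_session(self, session_id):
        # Ephemeral nodes vanish when the session that created them ends.
        self._nodes = {n: s for n, s in self._nodes.items() if s != session_id}


def current_leader(zk):
    """Return the session owning the lowest-numbered node, or None if there is none."""
    nodes = zk.children()
    return zk.owner(nodes[0]) if nodes else None


zk = ToyZooKeeper()
for worker in ("worker-a", "worker-b", "worker-c"):
    zk.create_ephemeral_sequential("/election/candidate-", worker)

first = current_leader(zk)    # the earliest candidate leads
zk.close_session("worker-a")  # simulate the leader crashing or disconnecting
second = current_leader(zk)   # the next candidate in sequence takes over
```

Against a real ensemble, each candidate would typically watch the node just ahead of its own rather than polling the whole list, so that a leader failure wakes only one successor.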

If you're going to use ZooKeeper yourself, I recommend taking a look at Curator from Netflix (https://github.com/Netflix/curator/wiki), now Apache Curator, which makes it easier to use (for example, it implements several recipes out of the box).
