Explaining Apache ZooKeeper

Question

I am trying to understand ZooKeeper, how it works and what it does. Is there any application which is comparable to ZooKeeper?

If you know, then how would you describe ZooKeeper to a layman?

I have tried the Apache wiki and the ZooKeeper SourceForge page, but I am still not able to relate to it.

I just read through http://zookeeper.sourceforge.net/index.sf.shtml, so aren't there more services like this? Is it as simple as just replicating a server service?

Recommended answer

In a nutshell, ZooKeeper helps you build distributed applications.

You may describe ZooKeeper as a replicated synchronization service with eventual consistency. It is robust, since the persisted data is distributed between multiple nodes (this set of nodes is called an "ensemble"). A client connects to any one of them (i.e., a specific "server") and migrates to another if that node fails. As long as a strict majority of nodes are working, the ensemble of ZooKeeper nodes is alive. In particular, a master node (the "leader" in ZooKeeper's own terminology) is dynamically chosen by consensus within the ensemble; if the master node fails, the role of master migrates to another node.
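
The "strict majority" rule can be made concrete with a little arithmetic. The sketch below is only an illustration: `has_quorum` and `tolerated_failures` are hypothetical helper names, not part of any ZooKeeper API.

```python
def has_quorum(alive: int, ensemble_size: int) -> bool:
    """Return True when a strict majority of the ensemble is still working."""
    return alive > ensemble_size // 2


def tolerated_failures(ensemble_size: int) -> int:
    """Largest number of nodes that can fail while a strict majority survives."""
    return (ensemble_size - 1) // 2


# How many failures ensembles of different sizes can absorb.
for size in (3, 4, 5):
    print(size, "nodes tolerate", tolerated_failures(size), "failure(s)")
```

Under this rule a four-node ensemble tolerates no more failures than a three-node one, which is why ensembles are usually given an odd number of nodes.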

The master is the authority for writes: in this way writes can be guaranteed to be persisted in order, i.e., writes are linear. Each time a client writes to the ensemble, a majority of nodes persist the information: these nodes include the server for the client, and obviously the master. This means that each write makes the server up-to-date with the master. It also means, however, that writes cannot proceed concurrently: they are all serialized through the master.

The guarantee of linear writes is why ZooKeeper does not perform well for write-dominant workloads. In particular, it should not be used for interchange of large data, such as media. As long as your communication involves shared data, ZooKeeper helps you. When data could be written concurrently, ZooKeeper actually gets in the way, because it imposes a strict ordering of operations even when that is not strictly necessary from the perspective of the writers. Its ideal use is for coordination, where messages are exchanged between the clients.

This is where ZooKeeper excels: reads are concurrent since they are served by the specific server that the client connects to. However, this is also the reason for the eventual consistency: the "view" of a client may be outdated, since the master updates the corresponding server with a bounded but undefined delay.
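
To see how ordered writes and possibly stale reads fit together, here is a deliberately simplified toy model. It is not how ZooKeeper is implemented: `ToyEnsemble`, `LEADER`, and `catch_up` are names invented for this sketch, and the only behavior modeled is "the master orders writes, a majority persists them, reads are served locally".

```python
from typing import List, Optional, Tuple

LEADER = 0  # assumption: server 0 plays the master role in this toy model


class ToyEnsemble:
    """Toy model: the master orders writes; other servers may lag behind."""

    def __init__(self, n_servers: int) -> None:
        self.log: List[Tuple[str, bytes]] = []  # writes, in the order the master accepted them
        self.applied = [0] * n_servers          # number of log entries each server has applied

    def write(self, via_server: int, path: str, data: bytes) -> int:
        """Order the write through the master and persist it on a majority."""
        self.log.append((path, data))
        majority = len(self.applied) // 2 + 1
        # The master and the client's own server come first, then the rest.
        order = list(dict.fromkeys([LEADER, via_server, *range(len(self.applied))]))
        for server in order[:majority]:
            self.applied[server] = len(self.log)
        return len(self.log)  # position of this write in the global order

    def read(self, server: int, path: str) -> Optional[bytes]:
        """Serve the read locally, from whatever this server has applied so far."""
        value = None
        for logged_path, data in self.log[: self.applied[server]]:
            if logged_path == path:
                value = data
        return value

    def catch_up(self, server: int) -> None:
        """Let a lagging server apply every write it has missed."""
        self.applied[server] = len(self.log)


ens = ToyEnsemble(3)
first = ens.write(via_server=1, path="/config", data=b"v1")
second = ens.write(via_server=1, path="/config", data=b"v2")
fresh = ens.read(1, "/config")      # the writer's own server is up to date
stale = ens.read(2, "/config")      # server 2 was not in the majority yet
ens.catch_up(2)
caught_up = ens.read(2, "/config")  # now it sees the latest value
print(first, second, fresh, stale, caught_up)
```

In real ZooKeeper a client that cannot tolerate a stale view can issue a `sync` before reading, at the cost of an extra round trip.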

The replicated database of ZooKeeper comprises a tree of znodes, which are entities roughly representing file system nodes (think of them as directories). Each znode may be enriched by a byte array, which stores data. Also, each znode may have other znodes under it, practically forming an internal directory system.
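
The znode tree can be pictured as a small in-memory data structure. The following is a minimal sketch under that assumption; `ZNode` and `ZTree` are hypothetical classes for illustration and do not talk to a ZooKeeper server.

```python
from typing import Dict, List


class ZNode:
    """A node that holds both a byte array and named children."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = data
        self.children: Dict[str, "ZNode"] = {}


class ZTree:
    """A tree of znodes addressed by slash-separated paths."""

    def __init__(self) -> None:
        self.root = ZNode()

    def _walk(self, path: str) -> ZNode:
        node = self.root
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]  # KeyError if the path does not exist
        return node

    def create(self, path: str, data: bytes = b"") -> None:
        # The parent must already exist, as in a file system without "mkdir -p".
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self._walk(parent_path)
        if name in parent.children:
            raise ValueError(f"{path} already exists")
        parent.children[name] = ZNode(data)

    def get(self, path: str) -> bytes:
        return self._walk(path).data

    def get_children(self, path: str) -> List[str]:
        return sorted(self._walk(path).children)


tree = ZTree()
tree.create("/app", b"top-level data")       # a "directory" that also carries data
tree.create("/app/config", b"timeout=30")
tree.create("/app/workers")
print(tree.get("/app"), tree.get_children("/app"))
```

Unlike a typical file system, there is no split between files and directories here: `/app` stores data and has children at the same time.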

Interestingly, the name of a znode can be sequential, meaning that the name the client provides when creating the znode is only a prefix: the full name is completed by a sequential number chosen by the ensemble. This is useful, for example, for synchronization purposes: if multiple clients want to get a lock on a resource, they can each concurrently create a sequential znode at the same location, and whoever gets the lowest number is entitled to the lock.
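
The lock idea based on sequential names can be sketched without a server. `LockDir` below is a hypothetical stand-in for one parent znode; the 10-digit zero-padded suffix mirrors the format ZooKeeper uses for sequential names, but everything else is a simplification.

```python
import itertools
from typing import List, Optional


class LockDir:
    """Toy stand-in for a parent znode that hands out sequential child names."""

    def __init__(self) -> None:
        self._counter = itertools.count()
        self.children: List[str] = []

    def create_sequential(self, prefix: str) -> str:
        """Append the next sequence number to the client-supplied prefix."""
        name = f"{prefix}{next(self._counter):010d}"
        self.children.append(name)
        return name

    def holder(self) -> Optional[str]:
        """The child with the lowest sequence number holds the lock."""
        if not self.children:
            return None
        return min(self.children, key=lambda name: int(name[-10:]))


lock_dir = LockDir()
alice = lock_dir.create_sequential("lock-")
bob = lock_dir.create_sequential("lock-")
first_holder = lock_dir.holder()   # alice, who got the lowest number
lock_dir.children.remove(alice)    # alice releases the lock
second_holder = lock_dir.holder()  # bob is next in line
print(alice, bob, first_holder, second_holder)
```

Because the numbers are handed out by a single authority, the clients never have to agree among themselves on who came first.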

Also, a znode may be ephemeral: this means that it is destroyed once the session of the client that created it ends, for instance because the client disconnected and did not come back before its session timed out. This is mainly useful in order to know when a client fails, which may be relevant when the client itself has responsibilities that should be taken over by a new client. Taking the example of the lock, as soon as the client holding the lock disconnects, the other clients can check whether they are entitled to the lock.
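
The lifetime rule for ephemeral znodes can be illustrated with a small bookkeeping model. `ToyServer` and its integer session ids are assumptions made for this sketch; a real client library manages sessions for you.

```python
from typing import Dict, Optional


class ToyServer:
    """Tracks which session, if any, owns each znode."""

    def __init__(self) -> None:
        # path -> owning session id, or None for a persistent znode
        self.owner_of: Dict[str, Optional[int]] = {}

    def create(self, path: str, session: int, ephemeral: bool = False) -> None:
        self.owner_of[path] = session if ephemeral else None

    def exists(self, path: str) -> bool:
        return path in self.owner_of

    def close_session(self, session: int) -> None:
        """Remove every ephemeral znode owned by the session that just ended."""
        doomed = [path for path, owner in self.owner_of.items() if owner == session]
        for path in doomed:
            del self.owner_of[path]


server = ToyServer()
server.create("/workers", session=1)                     # persistent parent
server.create("/workers/w1", session=1, ephemeral=True)  # tied to session 1
server.create("/workers/w2", session=2, ephemeral=True)  # tied to session 2
server.close_session(1)                                  # client 1 fails or leaves
print(sorted(server.owner_of))
```

After session 1 ends, `/workers/w1` is gone while the persistent `/workers` and the other client's `/workers/w2` remain, so the surviving clients can tell who left.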

The example related to client disconnection would be problematic if we had to periodically poll the state of znodes. Fortunately, ZooKeeper offers an event system where a watch can be set on a znode. A watch can be set to trigger an event if the znode itself is changed or removed, or if new children are created under it. This is clearly useful in combination with the sequential and ephemeral options for znodes.
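
A watch replaces polling with a callback. The sketch below models one-shot watches, which is how classic ZooKeeper watches behave; `WatchedStore` and its method names are hypothetical and only loosely modeled on the real client API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Callback = Callable[[str, str], None]  # receives (event, path)


class WatchedStore:
    """Key-value store whose reads can leave a one-shot watch behind."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self._watches: Dict[str, List[Callback]] = defaultdict(list)

    def get(self, path: str, watch: Optional[Callback] = None) -> Optional[bytes]:
        if watch is not None:
            self._watches[path].append(watch)
        return self.data.get(path)

    def set(self, path: str, value: bytes) -> None:
        self.data[path] = value
        self._fire(path, "changed")

    def delete(self, path: str) -> None:
        self.data.pop(path, None)
        self._fire(path, "deleted")

    def _fire(self, path: str, event: str) -> None:
        # Watches are removed before they run, so each one triggers at most once.
        for callback in self._watches.pop(path, []):
            callback(event, path)


events: List[Tuple[str, str]] = []
store = WatchedStore()
store.set("/lock/owner", b"client-a")
store.get("/lock/owner", watch=lambda event, path: events.append((event, path)))
store.delete("/lock/owner")            # fires the watch once
store.set("/lock/owner", b"client-b")  # no watch is left, so nothing fires
print(events)
```

A client that wants to keep observing the znode has to set a new watch each time it handles an event, typically by reading the znode again.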

A canonical example of ZooKeeper usage is distributed-memory computation, where some data is shared between client nodes and must be accessed and updated carefully to account for synchronization.

ZooKeeper offers a library for constructing your own synchronization primitives, while the ability to run a distributed server avoids the single-point-of-failure issue you have when using a centralized (broker-like) message repository.

ZooKeeper is feature-light, meaning that mechanisms such as leader election, locks, barriers, etc. are not already present, but can be written on top of the ZooKeeper primitives. If the C/Java API is too unwieldy for your purposes, you should rely on libraries built on ZooKeeper such as cages and especially curator.

Apart from the official documentation, which is pretty good, I suggest reading Chapter 14 of Hadoop: The Definitive Guide, which has ~35 pages explaining essentially what ZooKeeper does, followed by an example of a configuration service.