How to design a system that can manage configurations in a dynamic way efficiently?

Problem description

I am designing a system in which I need to manage configuration (config files) dynamically across a number of application servers. I am using the Consul key-value store to manage the configuration.

I created the following node in the Consul KV store for this purpose:

{"remoteConfig":"abc-123.tgz", "...."}

Here remoteConfig contains the config file that all the app servers will use (at least, this is the design I have).

Below is what I am trying to do:

  • All the app servers keep a watch on the above node in Consul. As soon as the value of the remoteConfig key changes, they are notified, download the new config, and store it on disk.
  • Only once all the app servers in the cluster have downloaded the new config should we switch to using it in memory across all the boxes in the cluster. If a few app servers fail to download it, we should not switch to the latest config on the remaining boxes where the download succeeded.
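The two steps above can be sketched roughly as follows. This is only an illustration, not Consul client code: a plain dict stands in for the KV store, and the key layout (`status/<server>`, `activeConfig`), the `EXPECTED_SERVERS` list, and all function names are assumptions made for the example.

```python
# Sketch only: a dict stands in for the Consul KV store.
# Key names and function names are hypothetical, not part of any Consul API.

EXPECTED_SERVERS = ["app1", "app2", "app3"]  # assumed to be known up front


def report_download(kv, server, version, ok):
    """Step 1: after a download attempt, each server records its own result."""
    kv[f"status/{server}"] = {"version": version, "downloaded": ok}


def all_downloaded(kv, version):
    """Gate for step 2: True only if every expected server has this version on disk."""
    for server in EXPECTED_SERVERS:
        status = kv.get(f"status/{server}")
        if status is None:
            return False  # this server has not reported yet
        if status["version"] != version or not status["downloaded"]:
            return False  # wrong version, or the download failed
    return True


def maybe_switch(kv, version):
    """Publish the version to activate, but only when the gate passes."""
    if all_downloaded(kv, version):
        kv["activeConfig"] = version  # servers would watch this key and switch
        return True
    return False
```

In this sketch a single missing or failed report keeps `activeConfig` unchanged, which matches the requirement that no box switches unless every box has the new config.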

I can do the first point easily, but I am confused about how to design the second point efficiently, so that we switch to the latest config only when all the app servers have downloaded that particular config. I do know how to atomically update a node by acquiring and releasing a lock in Consul, but I am unsure how to design this so that it handles these cases easily.

Question:

  • How should I design my node so that it is easy to see that all the machines have downloaded this particular config successfully, and that it is now time to switch to the latest config on all the boxes?
  • If some machines fail to download a particular config, it should be clear from reading the node which app server failed. Ideally it would also show timestamps, e.g. when each app server downloaded the config and when it switched to the new config.
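To make the second question concrete, here is one possible shape for a per-server status record. It is an assumption for illustration only, not an established Consul convention: the field names, the state values, and the `summarize` helper are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ServerStatus:
    """Hypothetical status record that one app server writes under its own key."""
    server: str
    version: str                         # e.g. the value of remoteConfig
    state: str                           # assumed values: "downloaded", "failed", "switched"
    downloaded_at: Optional[str] = None  # ISO-8601 timestamp, if the download succeeded
    switched_at: Optional[str] = None    # ISO-8601 timestamp, if the switch happened
    error: Optional[str] = None          # reason for a failed download


def to_kv_value(status):
    """Serialize a record to the JSON string that would be stored in the KV store."""
    return json.dumps(asdict(status), sort_keys=True)


def summarize(values, version):
    """Group server names by state for one config version.

    Servers still reporting a different version are listed under "stale".
    """
    summary = {"downloaded": [], "failed": [], "switched": [], "stale": []}
    for raw in values:
        record = json.loads(raw)
        if record["version"] != version:
            summary["stale"].append(record["server"])
        else:
            summary.setdefault(record["state"], []).append(record["server"])
    return summary
```

Each server would overwrite its own record on every attempt, so only the latest status per machine is kept.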

I don't have to keep a history of config status for each machine; just the latest one is sufficient. Any other improvements to the above design for managing configuration dynamically are also welcome.

(Note: I can have a bunch of other nodes as well (such as a status node) for this, just FYI. Also, we could use ZooKeeper instead of Consul, since locking and leader election can be done in both, but for now I am going to stick with Consul.)

Solution

I can't answer your question, but I am concerned about a potential race condition that might occur if you find a way to achieve your stated goal.

Let's assume you have 5 servers and all are using version 1 of the configuration files. Then the servers are asked to download version 2 of the configuration files. When all 5 servers have done that, you (somehow) send a signal to all 5 servers to tell them to switch from version 1 to version 2 of the configuration files. This is where the race condition can occur. Switching from version 1 to version 2 of the configuration file is not guaranteed to occur at the same point in time in each of the 5 servers. Thus, for a brief period of time (perhaps just a few milliseconds) some servers will still be using version 1 while other servers will be using version 2. During that brief period of time, you will have inconsistent configuration on your servers.
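The window described above can be illustrated with a toy model. Applying the switch to one server at a time stands in for signals that arrive at slightly different moments; the function name and the dict of servers are made up for the example.

```python
def switch_one_by_one(servers, new_version):
    """Switch each server in turn and record which versions are in use after each step.

    `servers` maps a server name to the config version it currently uses.
    """
    snapshots = []
    for name in servers:
        servers[name] = new_version
        snapshots.append(set(servers.values()))
    return snapshots


servers = {f"server{i}": 1 for i in range(1, 6)}  # 5 servers on version 1
snapshots = switch_one_by_one(servers, 2)
# Count the steps at which more than one version was live at the same time.
mixed_steps = sum(1 for versions in snapshots if len(versions) > 1)
```

Every snapshot except the last contains both versions, which is the inconsistent period the answer warns about.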

If that brief inconsistency can cause problems for you, then I think you will need a different "switch from version 1 to version 2 of configuration" mechanism, which in essence boils down to: (1) ask all the server processes to terminate; (2) wait for all of them to terminate, and (3) restart them with version 2 of the configuration. Obviously, this approach necessitates a brief period during which servers are not running, which is not ideal, but at least it avoids the race condition.
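A minimal sketch of that three-step sequence, assuming a hypothetical `ServerProcess` handle in place of real process management (in a real deployment step 2 would have to poll or block until each process has actually exited):

```python
class ServerProcess:
    """Hypothetical stand-in for a handle to one running server process."""

    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.running = True

    def stop(self):
        self.running = False

    def start(self, version):
        self.version = version
        self.running = True


def restart_with_version(servers, new_version):
    # (1) Ask every server process to terminate.
    for server in servers:
        server.stop()
    # (2) Confirm that all of them have terminated before going further.
    still_running = [s.name for s in servers if s.running]
    if still_running:
        raise RuntimeError(f"still running: {still_running}")
    # (3) Restart every server with the new configuration version.
    for server in servers:
        server.start(new_version)
```

Because no server starts until all have stopped, no two running servers ever use different versions, at the cost of the brief downtime mentioned above.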
