Service Fabric Deactivate (pause) vs Deactivate (restart)?

Question

When I log in to Service Fabric Explorer and try to disable a node for an OS upgrade I am presented with two options:

  • Deactivate (pause)
  • Deactivate (restart)

Can anyone tell me the difference?

Recommended answer

Service Fabric has APIs that let you manage nodes (in C# these are DeactivateNodeAsync and ActivateNodeAsync; in PowerShell they're Disable-ServiceFabricNode and Enable-ServiceFabricNode). First of all, most of these are holdovers from when people managed their own clusters, and they should be needed less often in an Azure-hosted Service Fabric cluster than when you run your own. Either way, when deactivating a node there are several different options, which we call intents.

You can think of these as increasingly severe operations on a node. Which one you use depends on the situation, and the intent you choose tells Service Fabric what is being done to the node.

The four different options are:

  1. Pause - effectively "pauses" the node: Services on it will continue to run, but no services should move in or out of the node unless they fail on their own, or unless moving a service to the node is necessary to prevent outage or inconsistency.
  2. Restart - this will move all of the in-memory stateful and stateless services off the node, and then shut down (close) any persistent services (if it is safe to do so; if not, we'll build spares).
  3. RemoveData - this will close down all of the services on the node, again building spares first if it is necessary for safety. The user is responsible for ensuring that if the node does come back, it comes back empty.
  4. RemoveNode - this will close down all of the services on the node, again building spares first if necessary for safety. In this case though you're specifically telling SF that this node isn't coming back. SF performs an additional check to make sure that the node which is being removed isn't a SeedNode (one of the nodes currently responsible for maintaining the underlying cluster). Other than that, this is the same as RemoveData.
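The ordering of the four intents can be written down as a small model. This is only an illustrative Python sketch of the list above, not a Service Fabric API: the enum and helper names are made up for this example, and the numeric values only encode "more severe than".

```python
from enum import IntEnum


class DeactivationIntent(IntEnum):
    """Hypothetical model of the four intents, least to most severe."""
    PAUSE = 1
    RESTART = 2
    REMOVE_DATA = 3
    REMOVE_NODE = 4


def closes_services(intent: DeactivationIntent) -> bool:
    # Pause leaves services running; every stronger intent moves or closes them.
    return intent > DeactivationIntent.PAUSE


def replicas_expected_back(intent: DeactivationIntent) -> bool:
    # With Pause and Restart the node keeps its state; with RemoveData and
    # RemoveNode the replicas on the node are treated as gone.
    return intent <= DeactivationIntent.RESTART


def checks_seed_node(intent: DeactivationIntent) -> bool:
    # Only RemoveNode adds the check that the node is not a seed node.
    return intent is DeactivationIntent.REMOVE_NODE
```

Calling `closes_services(DeactivationIntent.PAUSE)` returns `False`, while the same call with any of the other three intents returns `True`.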

Now let's talk about when you'd use each.

Pause is most commonly used when you want to debug a given service, process, machine, etc., and would like it not to change (to the degree possible) while you are looking at it. It would be a little awkward if you went to diagnose some behavior of a service only to find that we had just moved it on you.

Restart (the one we see used most often) is for when you want to move all the workloads off the node for some reason. For example, Service Fabric uses this itself when upgrading the Service Fabric bits on a node: first we deactivate the node with intent Restart, then we wait for that to complete (so we know your services are not running) before we shut down and upgrade our own code on that node.

RemoveData is for when you know the node is being deprovisioned and will not be coming back (say the hard drives are going to be swapped out, or the hardware is being completely removed), or you know that if the node does come back it will be empty (say you're reimaging the machine). The difference between Restart and RemoveData is that for Restart we know the node is coming back, so we keep the knowledge of the replicas on that node. For persistent replicas this means we don't have to build the replicas again immediately. For RemoveData we know the replicas are not coming back, so we need to build any spares immediately, before confirming that the node is safe to restart.

RemoveNode builds on top of RemoveData and is an additional indicator that you have no specific plans to bring this node back. Since it's important to keep the SeedNodes up, SF will fail the call if the node to be removed is currently a seed. If you really want to remove that specific node, you can reconfigure the cluster to use a different node as a seed. As an example of when you'd use RemoveNode rather than RemoveData: if you're scaling down a cluster, you'd explicitly call RemoveNode, since you intend for the nodes not to come back and want to make sure you're taking the right ones away so the underlying cluster doesn't collapse.
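The guidance above can be condensed into a rough decision helper. This is a simplified Python sketch and not part of Service Fabric: the function name and its three flags are assumptions made for illustration, and real situations may not reduce to three booleans. The returned strings match the intent names used in this answer.

```python
def choose_intent(*, debugging: bool, data_survives: bool,
                  permanent_removal: bool) -> str:
    """Pick a deactivation intent from three hypothetical yes/no questions."""
    if debugging:
        # Keep everything where it is while someone inspects the node.
        return "Pause"
    if permanent_removal:
        # Scaling down: the node is not expected back, so ask for the
        # extra seed node check as well.
        return "RemoveNode"
    if data_survives:
        # Patching or upgrading: the node returns with its replicas intact.
        return "Restart"
    # Reimaging or swapping drives: whatever returns will be empty.
    return "RemoveData"


# An OS upgrade that keeps the disks, as in the question:
os_upgrade_intent = choose_intent(
    debugging=False, data_survives=True, permanent_removal=False
)
```

For the OS upgrade in the question, where the node comes back with its data, this sketch picks Restart, which corresponds to the "Deactivate (restart)" option in Service Fabric Explorer.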

Once the operation (whatever it is) is done and you want to re-enable the node, the corresponding call is Activate/Enable. Restarting a node doesn't cause it to become re-enabled automatically. So if you are done with the software patch (or whatever caused you to use intent Restart), and you want services to be placed on the node again, you would call Enable/Activate with the appropriate node name.
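The point that a restart does not re-enable the node can be shown with a toy state model. This Python sketch is purely illustrative: `NodeModel` and its methods are invented for this example and do not call or mirror any Service Fabric API.

```python
class NodeModel:
    """Toy stand-in for a cluster node; not a Service Fabric API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.deactivation_intent = None  # None means the node is activated
        self.restart_count = 0

    def deactivate(self, intent: str) -> None:
        self.deactivation_intent = intent

    def restart(self) -> None:
        # Rebooting the machine does not clear the deactivation.
        self.restart_count += 1

    def activate(self) -> None:
        self.deactivation_intent = None

    @property
    def accepts_new_services(self) -> bool:
        return self.deactivation_intent is None


node = NodeModel("Node1")  # hypothetical node name
node.deactivate("Restart")
node.restart()             # e.g. the reboot at the end of an OS patch
still_disabled = not node.accepts_new_services
node.activate()            # the explicit Enable/Activate call
```

In this model `still_disabled` is `True` after the reboot, and the node only accepts services again once `activate()` has been called.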

As an example of the deactivate/disable call, check out the PowerShell API documentation for Disable-ServiceFabricNode.
