一种分布式算法,用于将共享资源分配给多个节点中的一个节点 [英] A distributed algorithm to assign a shared resource to one node out of many

查看:455
本文介绍了一种分布式算法,用于将共享资源分配给多个节点中的一个节点的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我尝试实现多播通信以分配一些资源.我为此使用 jGroups ,所以我拥有可靠的多播和FIFO排序.通过这样做,我想认识到使用分布式解决方案,这意味着没有充当协调器的主节点.

I try to implement a multicast communication to distribute some resources. I’m using jGroups for that, so I have reliable multicast and FIFO-Ordering. By doing that I wanna realise that with a distributed solution, that means without an master node that acts as a coordinator.

每个节点都可以启动分发,因此有可能两个或多个节点同时开始分发.当节点收到分配消息时,它将回答此问题.应答消息和启动器发出的消息之间没有区别.它仅包含有关资源名称(例如resourceA)以及该节点是否能够处理的信息. 当Member1开始分发时,它将发送如下消息:

Every node is able to start a distribution, so it is possible that two or more nodes are starting the distribution at the same time. When a node receives a distribution-message it will answer this. There are no differences between an answer-message and a message from a starter. It only contains information about the resource-name (e.g. resourceA) and if that node is able to handle it. When Member1 starting the distribution it will send a message like:

Member1, resourceA, OK

成员2没有该资源的空间,并发送类似以下内容的答案消息:

Member2 has no space for that resource and send a answer-message like:

Member2, resourceA, NOT_OK

在这种情况下很容易,因为现在Member1知道他可以使用resourceA.当一个以上的节点能够处理资源时,其他属性将决定谁来使用资源(例如,具有最高ID的成员).

In this case it is easy, because now Member1 knows that he can takes resourceA. When more than one node are able to handle the resource other properties will decide who takes the resource (e.g. the member with the highest ID).

我的问题是:当两个或多个节点同时启动有关同一主题(resourceA)的分发时,如何处理?

My problem is: How ta handle it when two or more nodes are starting the distribution about the same topic (resourceA) at the same time?

有人用这种方法看到一些问题吗? Member1和Member2正在同时开始分发.在这一点上,双方都期待彼此的回应.由于在响应消息或启动程序消息中没有区别,所以两者都认为自己刚刚收到的消息是应答者.因此,Member1将启动程序消息发送到多播组中,而Member2将启动程序消息发送到mutlicast组中(在从Member1接收消息之前).现在,Member1正在从Member2接收启动程序消息,并认为这是响应.

Does anybody see some problems doing it this way: Member1 and Member2 are starting the distribution at the same time. At this point both are expecting a response from each other. Because of the fact that there are no differences in a response-message or in a starter-message both are thinking the message they just received is an answerer. So Member1 sends a starter-message into the multicast-group and Member2 sends a starter-message into the mutlicast-group (before receiving the message from Member1). Now Member1 is receiving the starter-message from Member2 and thinks that this is the response.

通过保证每个节点每个主题仅发送一条消息(作为启动程序或带有响应),我想说,即使有两个以上的节点,这样做也没有问题.

By guarantee that every node sends only one message per topic (as a starter or with a response) I would say that there are no problems doing it this way even when there are more than two nodes.

推荐答案

根据您的描述,可以得出以下结论:

From your description, the following conclusions can be drawn:

  • 假定所有成员都从一开始就在运行,一旦该系统运行,就不会添加任何新成员,并且也不会删除任何成员
  • 所有成员都知道系统中的成员总数

如果这些结论之一是错误的(或两者都不正确),那么我看不到您的算法如何工作,因为无法知道何时所有成员都响应了启动程序消息并得出哪个成员的ID最高.

If one of these conclusions is incorrect (or both), then I do not see how your algorithm works, because there is no way to know when all members have responded to a starter message and conclude which member has the highest ID.

如果两个结论都正确,那么我认为算法的功能没有问题,您的方法似乎可行.但是,对于失败或不响应的成员,生成的系统将易于出错.如果一个成员不响应启动消息,那么您可能最终无法确定谁将使用该资源,因为它可能是那个非响应成员.

If both conclusions are correct, then I do not see a problem with the functionality of the algorithm and your approach seems to work. However, the resulting system will be error prone with regard to a failing or non-responding member. If one member does not respond to a starter message, then you could end up with the situation where it is impossible to decide who will take the resource, because it might or might not be that non-responding member.

不幸的是,尽管您没有提供有关正常运行时间要求的任何信息,但其中一个成员 很可能在某个时候不响应.为了避免仅由于一个成员没有响应而导致算法完全崩溃,您将必须在算法中设计预防措施,例如,通过添加超时,如果该成员没有响应,则将其从已知成员列表"中删除及时.

Unfortunately, it is very likely that one of the members will not respond at some point -- although you did not give any information about uptime requirements. To avoid a total breakdown of the algorithm just because one member is not responding, you will have to design precautions into the algorithm, for example by adding a time-out and remove a member from the "known members list" if it does not respond in time.

但是,即使具有这种内置的容错能力,您也应该认识到,按照定义,没有某种协调主服务器的完全分布式解决方案仍然会遇到难以解决的情况.例如,在分布式环境中,网络问题可能会导致网络的一半看不到另一半的情况.由于没有协调大师来得出任何最终结论,因此网络的两半都认为他们了解世界",并将继续做自己的事情.为了做出解决方案的决策,您必须更加清楚自己的要求,并更好地描述可能的故障情况...

But even with such built-in fault tolerance, you should realize that a completely distributed solution without some kind of coordinating master will, by definition, have situations that are difficult to deal with. For example, in a distributed environment, a network problem could lead to a situation where one half of the network does not see the other half. Since there is no coordinating master to draw any final conclusions, both halves of the network think "they know the world" and will continue to do their thing. In order to make decisions about how to resolve that, you will have to be more clear about your requirements and give a better picture of possible fault situations...

这篇关于一种分布式算法,用于将共享资源分配给多个节点中的一个节点的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆