只使用原子读取和数目不详的流程和强大的处理堕胎写互斥算法 [英] Mutual exclusion algorithm using only atomic reads and writes for unknown number of processes and robust to process abortion

查看:342
本文介绍了只使用原子读取和数目不详的流程和强大的处理堕胎写互斥算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想拿出一个互斥算法只基于原子的读取和共享内存写入原子(即没有比较和掉期或类似)。

I am trying to come up with a mutual exclusion algorithm that is based only on atomic reads and atomic writes of shared memory (i.e. no compare-and-swap or similar).

除了相互排斥和死锁自由,它需要满足以下性质:

Apart from mutual exclusion and deadlock freedom, it needs to fulfill the following properties:

  • 它需要在过程中死在面前的任何
  • 强劲
  • 这需要处理事前数目不详的进程竞争锁
  • 每隔几秒钟,我需要的进程进入临界区中的一个(我不关心它)
  • 这应该是内存效率地
  • 在我所有的进程共享一个合理的同步时钟源(亚秒级精度)
  • It needs to be robust in the face of processes dying at any point
  • It needs to handle an ex-ante unknown number of processes competing for the lock
  • Every few seconds I need one of the processes to enter the critical section (I don't care which)
  • It should be as memory-efficient as possible
  • All my processes share a reasonably synchronized clock-source (sub-second accuracy)

我想出了一个办法,看起来对我来说,它可以工作,但我想通过你们来运行它,看看我的地方犯了一个错误,或者如果有一个更优雅的解决方案。

I have come up with a way that looks to me like it could work, but I wanted to run it by you guys to see if I made a mistake somewhere or if there is a more elegant solution.

我们的想法是 Szymanski的的三位的修改后的版本结合线性等待算法(图1的互斥再探第4页)基于松散的过程命名-方案的莫伊尔和安德森的无等待算法快,寿命长重命名的(M&安培; A)。

The idea is to combine a modified version of Szymanski's Three-Bit Linear Wait Algorithm (Figure 1 on page 4 of "Mutual Exclusion Revisited") with a process-naming-scheme based loosely on Moir and Anderson's "Wait-Free Algorithms for Fast, Long-Lived Renaming" (M&A).

作为第一步,我被分配给它们就像一个ID定义一个秩序传入进程的 M&放大器; A 的:

As a first step, I define an "order" for incoming processes by assigning them an ID like M&A:

M&放大器; A 的使用下面的互斥原语(见图2第5页):

哪里出的 N 输入模块处理,在大多数 1 将停在那里,最多的 N'thinsp; -   1 动右的,并且在大多数 N'thinsp; -   1 下移

M&A use the following mutex primitive (see Figure 2 on page 5):
      
where out of n processes entering the block, at most 1 will stop there, at most n - 1 will move right, and at most n - 1 will move down.

这些原语可以连接在一起成一个网格,在那里我决定从 M&放不同编号的箱; A 的,让我来计算箱号不知道箱子总数:

盒子数可以使用箱=计算(M *(M-1))/ 2 + R ,其中<强> M 是总数量移动发(包括移动到左上角的框,即 M &thinsp;&GE;&thinsp 1)和研究是移动到右侧的数字(<强>研究&thinsp;&GE;&thinsp; 0)

These primitives can be linked together into a grid, where I decided to number the boxes differently from M&A, to allow me to calculate the box number without knowing the total number of boxes:
      
The box number can be calculated using box = (m*(m-1))/2 + r, where m is the number of total moves made (including the move into the top-left box, i.e. m ≥ 1) and r is the number of moves to the right (r ≥ 0).

任何新的进程将首先被分配一个大的,全球唯一的ID(如时间戳+庞大的随机数),它将作为价值互斥原始算法中使用的 P 。它然后将启动遍历网格以下的原语的规则,开始在左上角,直到它在某些框停止。在其中它停止现在将其新的进程ID的框的数目

Any new process will initially be assigned a large, globally unique ID (e.g. timestamp + huge random number), which it will use as the value for p in the mutex primitive algorithm above. It will then start traversing the grid following the rules of the primitive, starting in the top-left corner, until it stops in some box. The number of the box in which it stopped will now be its new process ID.

要保持进程ID小的数目(进程可能消失;还有:箱子可以锁定永远不会被使用,如果全部进入流程移动或者向右或向下,并没有停在那里),我将代替布尔变量<强>是在原始以上时间戳。

To keep the number of process IDs smaller (processes might disappear; and also: boxes can be locked forever without being used if all entering processes move either right or down and none stop there), I will replace the boolean variable Y in the primitive above with a time-stamp.

Y:= TRUE 将被替换为 Y:=现在,其中现在是当前时间。同样,条件如果Y然后... 将被替换如果(现在 - Y'LT;超时),那么... ,其中超时是在这之后,框将被返回到可用池的时间。

The line Y := true will then be replaced with Y := now, where now is the current time. Similarly, the condition if Y then ... will be replaced with if (now - Y < timeout) then ..., where timeout is the time after which a box will be returned to the available pool.

虽然进程仍在运行,它需要保持在 - 值的框设置为现在以绝不允许其ID到期。虽然超时值越小将导致更小的ID和更少的内存使用,它理论上可以任意选择大,因此应该总是能够找到一个值超时的将保证过程永远不会失去自己的ID,除非他们真的退出或死亡。的几分钟,几小时,甚至几天的时间值应该做的伎俩。

While a process is still running, it needs to keep setting the Y-value of its box to now to never allow its ID to expire. While smaller values of timeout will lead to smaller IDs and less memory use, it can theoretically be chosen arbitrarily large and thus it should always be possible to find a value for timeout that will guarantee that processes will never lose their ID unless they actually quit or died. Values of minutes, hours, or even days should do the trick.

现在,我们可以利用这个过程中为了实现Szymanski算法:

communication variables:: a, w, s: boolean = false
private variables:: j: 0..n

p1      ai=true;
p2      for(j=0;j<n;j++)
            while(sj);
p3      wi=true;
        ai=false;
p4      while(!si) {
p5          for (j=0;j<n & !aj;j++);
p6          if (j==n) {
                si=true;
p6.1            for (j=0;j<n & !aj;j++);
p6.2            if (j<n)
                    si=false;
p6.3            else {
                    wi=false;
p6.4                for(j=0;j<n;j++)
                        while(wj);
                }
            }
p7          if (j<n)
                for (j=0;j<n & (wj | !sj);j++);
p8          if (j!=i & j<n) {
p8.1            si=true;
                wi=false;
            }
        }
p9      for(j=0;j<i;j++)
            while(wj | sj);

Critical Section

e1      si=false;

这个算法背后的想法是有点参与和简洁(如果不是已经是一个失败的事业在这一点),我不会尝试把它套用在这里再次(的维基百科还讨论了该算法的变体)。

要根据需要进行这个算法的工作(即流程的脸中止和程序未知总数)以下的更改可以:

To make this algorithm work as needed (i.e. in the face of process aborts and an unknown total number of processes) the following changes can be made:

就像部分,布尔值 <子>我 是W <子>我 ,和取值<子>我 将被替换时间戳。除了,这一次超时需要足够短,仍允许临界区将根据需要经常输入的,在我的情况下,每隔几秒钟,所以5秒的超时可能会奏效。在同一时间,超时需要足够长,以便它永不过期除非一个进程死亡,即它需要长于它过程来不断更新的时间戳为需要保持置标志的最坏情况下的时间。

Just like Y above, the booleans ai, wi, and si will be replaced with timestamps. Except, this time the timeouts need to be short enough to still allow the critical section to be entered as often as needed, in my case every few seconds, so a timeout of 5s might work. At the same time, the timeout needs to be long enough so that it never expires unless a process dies, i.e. it needs to be longer than the worst-case time it takes processes to keep updating the timestamps for flags that need to remain set.

在流程的循环(如为(J = 0; J&n种; J ++))将变成进程ID电网遍历从上面的为了在盒编号。每个箱子可以有3种状态:从未涉足过任何人,这个过程已经停止,或者所有的进程前进,盒子目前被阻塞。不区分需要在后两者之间进行,因为阻止框将简单地对应于当前不是要进入临界区的方法(即 <子>我 是W <子>我 取值<子>我 /过期)。该循环开始于框0。如果遇到已访问过一个盒子(即有 X - 值集),它增加了相邻的盒子其以检查清单(即:如果箱8被看见,箱12和13将被添加到列表中)。循环结束时,以检查清单已处理完毕(当然,除非,另外还有一个终止条件present将触发第一个(如 J&LT;我&安培;!AJ ))

The loops over processes (e.g. for(j=0;j<n;j++)) will be turned into traversals of the process-ID grid from above in the order of the box-numbers. Each box can have 3 states: Never visited by anyone, a process has stopped in it, or all processes moved on and the box is currently blocked. No distinction needs to be made between the latter two, since a blocked box will simply correspond to a process that is currently not trying to enter the critical section (i.e. ai, wi, and si are false/expired). The loop starts in box 0. If it comes across a box that has been visited (i.e. has its X-value set), it adds the adjacent boxes to its "to-check-list" (i.e. if box 8 was seen, box 12 and 13 will be added to the list). The loop terminates when the "to-check-list" has been processed completely (unless, of course, there is another stopping condition present that fires first (e.g. j < i or & !aj)).

现在我的问题:

  1. 这看起来像它会可靠地工作?
    我从来没有做过这样的框架的正式证明。我的直觉(和激烈琢磨)告诉我,这应该是坚实的,但我不会介意的东西有点比这更严格。我最担心的一个过程P3和Szymanski算法的E1和标志到期随机之间奄奄一息,而其他进程都在的code这部分的影响。
  2. 有没有更好的解决方案(简单/更少的资源利用)?
    一个攻击点进行优化的可能是事实,我不需要所有工艺,最终使它的关键部分,我只需要一个(我不在乎哪一个其一)。
  1. Does this look like it would work reliably?
    I've never done a formal proof of a framework like this. My gut (and intense pondering) tell me that it should be solid, but I wouldn't mind something a bit more rigorous than that. I'm most worried about the effects of a process dying between p3 and e1 of Szymanski's algorithm and the flags expiring randomly while other processes are in that part of the code.
  2. Is there a better solution (simpler/less resource use)?
    One point of attack for an optimization might be the fact that I don't need all processes to eventually make it to the critical section, I just need one (and I don't care which one either).

对不起,这篇文章的长度,并感谢你的耐力读它!建议缩短欢迎...:)

Sorry for the length of this post and thank you for your endurance reading it! Suggestions for shortening welcome... :)

PS:我也贴在<一个这个问题href="http://cs.stackexchange.com/questions/48466/mutual-exclusion-algorithm-using-only-atomic-reads-and-writes-for-unknown-number">CS堆叠交换。

推荐答案

因为我真的不关心公平性或饥饿(无所谓哪个进程进入临界区),我并不真的需要使用算法的复杂Szymanski的公司。

Since I don't really care about fairness or starvation (it doesn't matter which process enters the critical section), I don't really need to use an algorithm as complex as Szymanski's.

我发现了一个很漂亮的替代方案:一种算法​​伯恩斯与林奇:

I found a quite beautiful alternative: An algorithm by Burns and Lynch:

program Process_i;
    type flag = (down, up);
    shared var F : array [1..N] of flag;
    var j : 1..N;
begin
    while true do begin
        1: F[i] := down;
        2: remainder;     (* remainder region *)
        3: F[i] := down;
        4: for j := 1 to i-1 do
               if F[j] = up then goto 3;
        5: F[i] := up;
        6: for j := 1 to i-1 do
               if F[j] = up then goto 3;
        7: for j := i+1 to N do
               if F[j] = up then goto 7;
        8: critical;      (* critical region *)
    end
end.

您可以找到他们的论文第4页(836)上互斥使用不可分割的读取和写入

You can find it on page 4 (836) of their paper "Mutual Exclusion Using Indivisible Reads and Writes".

它有几个主要的优点:

  • 简单
  • 在它使用较少的内存(事实上,他们指出,他们的算法是最优的在这方面)
  • 所有共享内存(当然,但多个读者)只有一个作家
  • 这是很容易打开忙等待到延迟重试(见<一href="http://stackoverflow.com/questions/33270575/implementing-mutual-exclusion-algorithm-by-burns-and-lynch-in-java-could-there">here)
  • It is much simpler
  • It uses less memory (In fact, they state that their algorithm is optimal in that respect)
  • All shared memory has only a single writer (but multiple readers, of course)
  • It is easy to turn the busy-waits into delayed retries (see here)

旁白:这种算法的思路可以显示如下:

Aside: The idea of this algorithm can be visualized as follows:

我是一个过程,我在一排与其他人(所有其他进程)站在某处
当我想进入临界区,我做了以下内容:

  • 3/4:我期待我的左边,让我的手下来,只要我看到有人用自己的手
  • 5:如果没有人给我留下了自己的手,我把我的了
  • 6:我再次检查,如果有人给我的左边也同时把自己的手。如果是这样,我把我的背下来,并重新开始。否则,我把我的手。
  • 7:每个人在我右边先走,让我看看我的权利,等到我看不出有任何的手
  • 8:当所有的手在我右边是下来,我可以进入临界区
  • 1:当我做了,我把我的手回落
  • 3/4: I look to my left and keep my hand down as long as I see someone with their hand up.
  • 5: If no-one to my left has their hand up, I put mine up.
  • 6: I check again if anyone to my left has meanwhile put their hand up. If so, I put mine back down and start over. Otherwise, I keep my hand up.
  • 7: Everyone to my right goes first, so I look to my right and wait until I don't see any hands up.
  • 8: Once all hands to my right are down, I can enter the critical section.
  • 1: When I'm done, I put my hand back down.

我实现了这个念头从以上(含并购放大器; A)用这个算法,我目前测试的挫折感出它到目前为止似乎很稳定

I implemented the whole idea from above (including M&A) with this algorithm and am currently testing the heck out of it and so far it seems very stable.

的实施是非常简单的。有真的只有两个附加的告诫我不得不考虑(当然,除非,我错过了一些东西(指针欢迎)):

The implementation is very straight forward. There were really only two additional caveats I had to consider (unless, of course, I missed something (pointers welcome)):

  • 如果一个过程使得它的算法在第7行,它可能最终击中了 GOTO ,在这种情况下,在我的实现(见<一href="http://stackoverflow.com/questions/33270575/implementing-mutual-exclusion-algorithm-by-burns-and-lynch-in-java-could-there">here),在进入临界区将被拒绝,过程重试之后,当重试情况(可能有一个大的延迟),我需要刷新进程的标志,以这样的方式,它没有连最微小的时刻在此期间,该标志是过期的,或者,如果它这样做,我需要检测和跳回到第3行的算法,而不是继续在7行
  • 我添加了一个检查临界区是否需要在所有输入(即限制了临界区进入率的语句)。这是最有效的方式进行这样的检查权之前,进程甚至试图进入临界区。为了100%肯定没有任何线程都不能超过限速,不过,第二次检查将需要执行当一个进程顺利进入临界区。
  • If a process makes it to line 7 in the algorithm, it might end up hitting the goto, in which case, in my implementation (see here), the entry to the critical section is denied and the process tries again later. When the retry happens (possibly with a big delay), I need to refresh the process's flag in such a way that it didn't have even the tiniest moment during which the flag was expired, or, if it did, I need to detect that and jump back to line 3 in the algorithm instead of continuing at line 7.
  • I added a check whether the critical section needs to be entered at all (i.e. a statement that limits the rate at which the critical section is entered). It is most efficient to perform this check right before a process even tries to enter the critical section. To make 100% sure that no thread can ever exceed the rate limit, though, a second check will need to be performed when a process successfully entered the critical section.

由于共享的数据结构,我的code使用,它目前没有真正的状态,使得它有意义,张贴在这里,但随着工作的一点点,我可以把一个自包含的版本。如果有人有兴趣,请发表评论,我将这样做。

Due to the shared data-structures that my code uses, it's currently not really in a state that makes it meaningful to post here, but with a little bit of work, I could put together a self-contained version. If anyone is interested, please leave a comment and I will do so.

这篇关于只使用原子读取和数目不详的流程和强大的处理堕胎写互斥算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆