无锁队列的实现最终有在压力下循环 [英] Lockless queue implementation ends up having a loop under stress

查看:235
本文介绍了无锁队列的实现最终有在压力下循环的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在包含从多个线程发布到与在单个线程中处理的请求的链接列表的形式写在C无锁队列。压力的几个小时后,我最终的最后一个请求的下一个指针指向自己,创造无限循环并锁定了处理线程。

I have lockless queues written in C in form of a linked list that contains requests from several threads posted to and handled in a single thread. After a few hours of stress I end up having the last request's next pointer pointing to itself, which creates an endless loop and locks up the handling thread.

运行应用程序(和失败),在Linux和Windows。我调试在Windows上,在我的 COMPARE_EXCHANGE_PTR 映射到的 InterlockedCompareExchangePointer

The application runs (and fails) on both Linux and Windows. I'm debugging on Windows, where my COMPARE_EXCHANGE_PTR maps to InterlockedCompareExchangePointer.

这是code,它推到列表中的头一个请求,并从多个线程名为:

This is the code that pushes a request to the head of the list, and is called from several threads:

void push_request(struct request * volatile * root, struct request * request)
{
    assert(request);

    do {
        request->next = *root;
    } while(COMPARE_EXCHANGE_PTR(root, request, request->next) != request->next);
}

这是code,获取从列表的末尾的请求,并且仅由一个线程正在处理它们称为:

This is the code that gets a request from the end of the list, and is only called by a single thread that is handling them:

struct request * pop_request(struct request * volatile * root)
{
    struct request * volatile * p;
    struct request * request;

    do {
        p = root;
        while(*p && (*p)->next) p = &(*p)->next; // <- loops here
        request = *p;
    } while(COMPARE_EXCHANGE_PTR(p, NULL, request) != request);

     assert(request->next == NULL);

     return request;
}

请注意,我不是用一个尾指针因为我想以避免应对 push_request 尾指针的复杂性。不过,我怀疑问题可能是我找了列表的末尾的方式。

Note that I'm not using a tail pointer because I wanted to avoid the complication of having to deal with the tail pointer in push_request. However I suspect that the problem might be in the way I find the end of the list.

有几个地方是推一个请求到队列中,但是他们看起来都generaly是这样的:

There are several places that push a request into the queue, but they all look generaly like this:

// device->requests is defined as struct request * volatile requests;
struct request * request = malloc(sizeof(struct request));
if(request) {
    // fill out request fields
    push_request(&device->requests, request);
    sem_post(device->request_sem);
}

在code处理该请求是做多的是,但在本质上做到这一点在一个循环:

The code that handles the request is doing more than that, but in essence does this in a loop:

if(sem_wait_timeout(device->request_sem, timeout) == sem_success) {
    struct request * request = pop_request(&device->requests);
    // handle request
    free(request);
}

我也只是补充说,之前和每次操作后检查列表复制功能,但恐怕这个检查将改变定时,这样我将永远不会遇到它失败的地步。 (我在等待它打破因为我写的。)

I also just added a function that is checking the list for duplicates before and after each operation, but I'm afraid that this check will change the timing so that I will never encounter the point where it fails. (I'm waiting for it to break as I'm writing this.)

当我打破挂程序 pop_request 处理器线程循环在标记的位置。我有一个或更多的请求,最后一个的下一个指针指向自身的一个有效列表。请求队列通常都很短,我从来没有见过超过10,只有1和3的两倍,我可以看看这种故障在调试器。

When I break the hanging program the handler thread loops in pop_request at the marked position. I have a valid list of one or more requests and the last one's next pointer points to itself. The request queues are usually short, I've never seen more then 10, and only 1 and 3 the two times I could take a look at this failure in the debugger.

我认为这是通过尽我所能,我得出的结论,我应该永远不能在我的名单一个循环结束了,除非我把相同的请求两次。我敢肯定,这永远不会发生。我也相当肯定(虽然不是完全),它是不是 ABA问题

I thought this through as much as I could and I came to the conclusion that I should never be able to end up with a loop in my list unless I push the same request twice. I'm quite sure that this never happens. I'm also fairly sure (although not completely) that it's not the ABA problem.

我知道我可能会弹出在同一时间多个请求,但我相信这是无关紧要的这里,而且我从来没有见过它的发生。 (我会解决这个问题,以及)

I know that I might pop more than one request at the same time, but I believe this is irrelevant here, and I've never seen it happening. (I'll fix this as well)

我觉得漫长而艰难的我怎么能打破我的功能,但我看不到的方式与一个循环结束了。

I thought long and hard about how I can break my function, but I don't see a way to end up with a loop.

所以,问题是:谁能明白的方式如何能破?有人可以证明这不能?

So the question is: Can someone see a way how this can break? Can someone prove that this can not?

最后,我会解决这个问题(可能通过使用尾指针或一些其他的解决方案 - 锁定将是一个问题,因为这张贴线程应该不会被锁定,我手头虽然RW锁),但我想确保改变列表实际上解决我的问题(而不是使它只是不太可能因为不同的定时的)。

Eventually I will solve this (maybe by using a tail pointer or some other solution - locking would be a problem because the threads that post should not be locked, I do have a RW lock at hand though) but I would like to make sure that changing the list actually solves my problem (as opposed to makes it just less likely because of different timing).

推荐答案

这是微妙的,但你有一个竞争条件存在。

It's subtle but you do have a race condition there.

开始与列表中已有一个元素, REQ1 。因此,我们有:

Start with a list with one element in it, req1. So we have:

device->requests == req1;
req1->next == NULL;

现在,我们推了一个新元素 REQ2 ,同时尽量弹出队列。为 REQ2 推先启动。 while循环体中运行,所以我们现在有:

Now, we push a new element req2, and simultaneously try to pop the queue. The push for req2 starts first. The while loop body runs, so we now have:

device->requests == req1;
req1->next == NULL;
req2->next == req1;

然后 COMPARE_EXCHANGE_PTR 运行,因此我们有:

device->requests == req2;
req1->next == NULL;
req2->next == req1;

...和 COMPARE_EXCHANGE_PTR()收益 REQ1 。现在,在这一点上,<​​STRONG>在,而条件比较之前,推被中断,并在弹出开始运行。

...and the COMPARE_EXCHANGE_PTR() returns req1. Now, at this point, before the comparison in the while condition, the push gets interrupted and the pop starts running.

弹出正常运行到结束,突然离开 REQ1 - 这意味着我们有:

The pop runs correctly to completion, popping off req1 - which means that we have:

device->requests == req2;
req2->next == NULL;

推重启。现在获取请求 - &gt;接下来做比较 - 它取了的新的 req2-&GT的价值;接下来,这是 NULL 。它比较 REQ1 NULL ,比较成功,while循环再次运行,现在我们有:

The push restarts. It now fetches request->next to do the comparison - and it fetches the new value of req2->next, which is NULL. It compares req1 with NULL, the comparison succeeds, the while loop runs again, and now we have:

device->requests == req2;
req2->next == req2;

这一次测试失败,while循环退出时,你有你的循环。

This time the test fails, the while loop exits, and you have your loop.

这应该修复它:

void push_request(struct request * volatile * root, struct request * request)
{
    struct request *oldroot;

    assert(request);

    do {
        request->next = oldroot = *root;
    } while(COMPARE_EXCHANGE_PTR(root, request, oldroot) != oldroot);
}

这篇关于无锁队列的实现最终有在压力下循环的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆