Ubuntu: sem_timedwait not waking (C)

Question

I have three processes that need to be synchronized. Process one does something, then wakes process two and sleeps; process two does something, then wakes process three and sleeps; process three does something, then wakes process one and sleeps. The whole loop is timed to run at around 25 Hz (in my "real" application this comes from an external sync into process one before it triggers process two). I use sem_post() to trigger (wake) each process and sem_timedwait() to wait for the trigger.

This all works successfully for several hours. However, at some random time (usually somewhere between two and four hours in), one of the processes starts timing out in sem_timedwait(), even though I am sure the semaphore is being triggered with sem_post(). To prove this, I even call sem_getvalue() immediately after the timeout, and the value is 1, so the timed wait should have been triggered.

See the following code:

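#include <stdio.h>
#include <stdint.h>     /* intptr_t */
#include <time.h>
#include <string.h>
#include <errno.h>
#include <semaphore.h>
#include <pthread.h>    /* pthread_create, pthread_join */
#include <sys/time.h>   /* gettimeofday */
#include <unistd.h>     /* usleep */

sem_t trigger_sem1, trigger_sem2, trigger_sem3;

// The main thread process.  Called three times with a different num arg - 1, 2 or 3.
void *thread(void *arg)
{
  int num = (int) (intptr_t) arg;   // arg carries the thread number, not a real pointer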
  sem_t *wait, *trigger;
  int val, retval;
  struct timespec ts;
  struct timeval tv;

  switch (num)
    {
      case 1:
        wait = &trigger_sem1;
        trigger = &trigger_sem2;
        break;
      case 2:
        wait = &trigger_sem2;
        trigger = &trigger_sem3;
        break;
      case 3:
        wait = &trigger_sem3;
        trigger = &trigger_sem1;
        break;
      default:
        return NULL;   // only 1, 2 and 3 are valid thread numbers
    }

  while (1)
    {
      // The first thread delays by 40ms to time the whole loop.  
      // This is an external sync in the real app.
      if (num == 1)   
        usleep(40000);

      // print sem value before we wait.  If this is 1, sem_timedwait() will
      // return immediately, otherwise it will block until sem_post() is called on this sem. 
      sem_getvalue(wait, &val);
      printf("sem%d wait sync sem%d. val before %d\n", num, num, val);

      // get current time and add half a second for timeout.
      gettimeofday(&tv, NULL);
      ts.tv_sec = tv.tv_sec;
      ts.tv_nsec = (tv.tv_usec + 500000);    // add half a second
      if (ts.tv_nsec > 1000000)
        {
          ts.tv_sec++;
          ts.tv_nsec -= 1000000;
        }
      ts.tv_nsec *= 1000;    /* convert to nanosecs */

      retval = sem_timedwait(wait, &ts);
      if (retval == -1)
        {
          // timed out.  Print value of sem now.  This should be 0, otherwise sem_timedwait
          // would have woken before timeout (unless the sem_post happened between the 
          // timeout and this call to sem_getvalue).
          sem_getvalue(wait, &val);
          printf("!!!!!!    sem%d sem_timedwait failed: %s, val now %d\n", 
            num, strerror(errno), val);
        }
      else
        printf("sem%d wakeup.\n", num);

      // get value of semaphore to trigger.  If it's 1, don't post as it has already been 
      // triggered and sem_timedwait on this sem *should* not block.
      sem_getvalue(trigger, &val);
      if (val <= 0)
        {
          printf("sem%d send sync sem%d. val before %d\n", num, (num == 3 ? 1 : num+1), val);
          sem_post(trigger);
        }
      else
        printf("!! sem%d not sending sync, val %d\n", num, val);
    }
}



int main(int argc, char *argv[])
{
  pthread_t t1, t2, t3;

  // create semaphores.  val of sem1 is 1 to trigger straight away and start the whole ball rolling.
  if (sem_init(&trigger_sem1, 0, 1) == -1)
    perror("Error creating trigger_listman semaphore");
  if (sem_init(&trigger_sem2, 0, 0) == -1)
    perror("Error creating trigger_comms semaphore");
  if (sem_init(&trigger_sem3, 0, 0) == -1)
    perror("Error creating trigger_vws semaphore");

  pthread_create(&t1, NULL, thread, (void *) 1);
  pthread_create(&t2, NULL, thread, (void *) 2);
  pthread_create(&t3, NULL, thread, (void *) 3);

  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  pthread_join(t3, NULL);
}
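The loop's comment assumes that a -1 return from sem_timedwait() always means a timeout, but the call can also fail with EINVAL (invalid semaphore or deadline) or EINTR (interrupted by a signal handler). A minimal sketch of a wrapper that tells these cases apart might look like this; wait_result and timed_wait_checked are hypothetical names, and retrying on EINTR is one reasonable policy rather than the only one.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical result codes for a checked sem_timedwait() call. */
enum wait_result {
    WAIT_OK,        /* semaphore acquired */
    WAIT_TIMEOUT,   /* deadline passed (ETIMEDOUT) */
    WAIT_BAD_ARG,   /* invalid semaphore or deadline (EINVAL) */
    WAIT_ERROR      /* any other errno */
};

/* Waits on sem until the absolute CLOCK_REALTIME deadline, retrying if a
 * signal interrupts the wait, and reports why the wait ended. */
static enum wait_result timed_wait_checked(sem_t *sem,
                                           const struct timespec *deadline)
{
    for (;;) {
        if (sem_timedwait(sem, deadline) == 0)
            return WAIT_OK;
        switch (errno) {
        case EINTR:
            continue;              /* interrupted by a signal: wait again */
        case ETIMEDOUT:
            return WAIT_TIMEOUT;
        case EINVAL:
            return WAIT_BAD_ARG;   /* e.g. tv_nsec outside [0, 999999999] */
        default:
            return WAIT_ERROR;
        }
    }
}
```

Because the deadline is absolute, retrying after EINTR does not stretch the overall wait. In the question's loop, logging WAIT_BAD_ARG separately from WAIT_TIMEOUT would make an EINVAL failure stand out instead of being read as a timeout.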

The following output is printed when the program is running correctly (at the start and for a random but long time afterwards). The value of sem1 is always 1 before thread 1 waits, because thread 1 sleeps for 40 ms, by which time thread 3 (sem3 in the output) has already posted it, so it wakes straight away. The other two threads wait until the semaphore is posted by the previous thread.

[...]
sem1 wait sync sem1. val before 1
sem1 wakeup.
sem1 send sync sem2. val before 0
sem2 wakeup.
sem2 send sync sem3. val before 0
sem2 wait sync sem2. val before 0
sem3 wakeup.
sem3 send sync sem1. val before 0
sem3 wait sync sem3. val before 0
sem1 wait sync sem1. val before 1
sem1 wakeup.
sem1 send sync sem2. val before 0
[...]

However, after a few hours, one of the threads begins to time out. I can see from the output that the semaphore is being triggered, and when I print the value after the timeout, it is 1. So sem_timedwait should have woken up well before the timeout. I would never expect the value of the semaphore to be 1 after the timeout, except in the very rare case (almost certainly never, but possible) where the trigger happens after the timeout but before I call sem_getvalue.

Also, once it begins to fail, every subsequent sem_timedwait() on that semaphore fails in the same way. See the following output, which I've line-numbered:

01  sem3 wait sync sem3. val before 0
02  sem1 wakeup.
03  sem1 send sync sem2. val before 0
04  sem2 wakeup.
05  sem2 send sync sem3. val before 0
06  sem2 wait sync sem2. val before 0
07  sem1 wait sync sem1. val before 0
08  !!!!!!    sem3 sem_timedwait failed: Connection timed out, val now 1
09  sem3 send sync sem1. val before 0
10  sem3 wait sync sem3. val before 1
11  sem3 wakeup.
12  !! sem3 not sending sync, val 1
13  sem3 wait sync sem3. val before 0
14  sem1 wakeup.
[...]

On line 1, thread 3 (which I have confusingly called sem3 in the printf) waits for sem3 to be triggered. On line 5, thread 2 calls sem_post for sem3. However, line 8 shows sem3 timing out, yet the value of the semaphore is 1. Thread 3 then triggers sem1 and waits again (line 10). Because the value is already 1, it wakes straight away. It doesn't send sem1 again, because all of this happens before control passes to thread 1; it then waits again (val is now 0), and sem1 wakes up. This now repeats forever, with sem3 always timing out and showing a value of 1.

So my question is: why does sem3 time out even though the semaphore has been triggered and its value is clearly 1? I would never expect to see line 08 in the output. If it times out (because, say, thread 2 has crashed or is taking too long), the value should be 0. And why does it work fine for 3 or 4 hours before getting into this state?

I have tried a similar test using three separate programs communicating over shared memory, rather than three threads in the same program. This more closely resembles my real-world application. The results and the output were the same. The problem does appear to be in the semaphore (particularly the sem_timedwait call) rather than anything to do with pthreads.
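The shared-memory setup isn't shown in the question. For unnamed semaphores to work across processes, they have to live in memory that every process maps and be initialised with a non-zero pshared argument to sem_init(). A minimal sketch, assuming an anonymous MAP_SHARED mapping inherited across fork() (separate, unrelated programs would use shm_open() or named semaphores via sem_open() instead); struct ring and ring_create are hypothetical names.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <semaphore.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical shared block holding the three trigger semaphores. */
struct ring {
    sem_t trigger[3];
};

/* Maps a struct ring into anonymous shared memory, which stays shared
 * between parent and children after fork(), and initialises each
 * semaphore with pshared = 1. trigger[0] starts at 1 so the first
 * process runs immediately. Returns NULL on failure. */
static struct ring *ring_create(void)
{
    struct ring *r = mmap(NULL, sizeof *r, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (r == MAP_FAILED)
        return NULL;
    for (int i = 0; i < 3; i++) {
        if (sem_init(&r->trigger[i], 1, i == 0 ? 1 : 0) == -1) {
            while (i-- > 0)              /* undo the ones already set up */
                sem_destroy(&r->trigger[i]);
            munmap(r, sizeof *r);
            return NULL;
        }
    }
    return r;
}
```

After ring_create(), the parent would fork() the three workers, and each child would then use sem_timedwait() and sem_post() on r->trigger[...] exactly as the threads in the question use trigger_sem1 to trigger_sem3.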

I have also tried shorter and longer delays, as well as removing the delay completely, with results similar to those described above. With no delay at all, the error can sometimes appear after minutes rather than hours, which of course means the problem can be reproduced a lot more quickly.

This is using Ubuntu 9.04 with kernel 2.6.28. The same procedure has been working properly on Red Hat and Fedora, but I'm now trying to port to Ubuntu. I have also tried Ubuntu 9.10, which made no difference.

Thanks for any advice, Giles

Answer

The problem seems to come from passing an invalid timeout argument.

At least on my machine, the first failure is not ETIMEDOUT but:

!!!!!! sem2 sem_timedwait failed: Invalid argument, val now 0
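The EINVAL can be traced to one boundary case in the question's deadline arithmetic. When gettimeofday() returns a tv_usec of exactly 500000, the sum is exactly 1000000, the ">" test does not carry it into tv_sec, and after the multiplication tv_nsec is 1000000000. POSIX requires sem_timedwait() to fail with EINVAL when it would block and tv_nsec is negative or at least 1000 million. The sketch below reproduces the question's arithmetic as a pure function so the case can be exercised directly; question_deadline and deadline_is_valid are hypothetical names.

```c
#include <time.h>

/* Reproduces the question's timeout computation (now + 0.5 s), which
 * carries with "> 1000000" instead of ">= 1000000". */
static struct timespec question_deadline(time_t sec, long usec)
{
    struct timespec ts;
    ts.tv_sec = sec;
    ts.tv_nsec = usec + 500000;        /* still microseconds here */
    if (ts.tv_nsec > 1000000) {        /* misses the == 1000000 case */
        ts.tv_sec++;
        ts.tv_nsec -= 1000000;
    }
    ts.tv_nsec *= 1000;                /* convert to nanoseconds */
    return ts;
}

/* A deadline is acceptable to sem_timedwait() only if 0 <= tv_nsec < 1e9. */
static int deadline_is_valid(const struct timespec *ts)
{
    return ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L;
}
```

For usec = 499999 the result is 999999000 ns (valid), for usec = 500001 it carries to the next second with 1000 ns (valid), but for usec = 500000 it is 1000000000 ns (invalid). Assuming tv_usec values are roughly uniform, each call has about a one-in-a-million chance of hitting that value. Three threads each waiting about 25 times a second make roughly 75 calls per second, so the first bad deadline would be expected after around 10^6 / 75 ≈ 13,000 s, or about 3.7 hours, which fits the two to four hours reported in the question; with the 40 ms delay removed, the loop runs much faster and reaches the case within minutes.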

Now, if I write:

  if (ts.tv_nsec >= 1000000)

(note the added "="), then it works fine. It's another question why the state of the semaphore then gets (presumably) effed up, so that it times out on subsequent attempts or simply blocks forever on a plain sem_wait. That looks like a bug in libc or the kernel.
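A deadline helper that cannot produce an out-of-range tv_nsec might look like the sketch below. It is an illustration under stated assumptions, not the answerer's code: it reads the clock with clock_gettime(CLOCK_REALTIME), the clock POSIX says sem_timedwait() measures its absolute timeout against, instead of gettimeofday(), which removes the microsecond-to-nanosecond step, and it carries with ">=" on nanoseconds. timespec_add_ms and deadline_after_ms are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Returns base + ms milliseconds with tv_nsec normalised into [0, 1e9).
 * Assumes base is already normalised and ms is non-negative, so a
 * single carry is always enough. */
static struct timespec timespec_add_ms(struct timespec base, long ms)
{
    assert(ms >= 0);
    base.tv_sec += ms / 1000;
    base.tv_nsec += (ms % 1000) * 1000000L;
    if (base.tv_nsec >= NSEC_PER_SEC) {   /* ">=": exactly 1e9 must carry too */
        base.tv_sec++;
        base.tv_nsec -= NSEC_PER_SEC;
    }
    return base;
}

/* Stores an absolute CLOCK_REALTIME deadline ms milliseconds from now in
 * *out, suitable for sem_timedwait(). Returns 0 on success, -1 if the
 * clock cannot be read. */
static int deadline_after_ms(long ms, struct timespec *out)
{
    struct timespec now;
    if (clock_gettime(CLOCK_REALTIME, &now) == -1)
        return -1;
    *out = timespec_add_ms(now, ms);
    return 0;
}
```

In the question's loop, the gettimeofday() block could then shrink to "if (deadline_after_ms(500, &ts) == -1) perror("clock_gettime");" ahead of sem_timedwait(wait, &ts).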
