Deadlock with threads calling down on kernel semaphore through sysfs

Problem description

Originating from this question (and my solution to it), I have come to realize there is a possible deadlock, but I can't understand why it happens or how to avoid it.

In short, there is a semaphore in kernel space that kernel modules (which are really applications running in kernel space) can take, but user-space applications also need to take the same semaphore to protect a globally shared memory region.

I have done this by exposing a sysfs file that, given the correct character, calls down or up on the semaphore in kernel space. The user-space applications simply keep this file open and write the appropriate character to take or release the lock.
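The user-space side of this protocol is just a one-byte write. What follows is a minimal sketch of lock/unlock helpers. It assumes the '0' = down / '1' = up convention of the module below, and sysfs_sem_cmd, sysfs_sem_down and sysfs_sem_up are hypothetical names. Unlike the bare write() calls in the test program further down, these helpers check the result of write(). That way, any error the store callback reports reaches the caller instead of being ignored.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write one command byte to an already-open sysfs
 * file descriptor. Returns 0 on success, -1 with errno set on failure. */
static int sysfs_sem_cmd(int fd, char cmd)
{
    ssize_t n = write(fd, &cmd, 1);

    if (n == 1)
        return 0;
    if (n >= 0)        /* nothing was written: report it as an I/O error */
        errno = EIO;
    return -1;
}

/* '0' asks the module to down() the semaphore (may block) */
static int sysfs_sem_down(int fd)
{
    return sysfs_sem_cmd(fd, '0');
}

/* '1' asks the module to up() the semaphore */
static int sysfs_sem_up(int fd)
{
    return sysfs_sem_cmd(fd, '1');
}
```

A thread would then bracket its access to the shared memory with sysfs_sem_down(fd) and sysfs_sem_up(fd). If the down fails, it should skip both the access and the up.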

Here's a sample kernel module for demonstration:

#include <linux/module.h>
#include <linux/semaphore.h>
#include <linux/sysfs.h>
#include <linux/kobject.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Shahbaz Youssefi");
MODULE_DESCRIPTION("Test module");

static struct kobject *_kobj = NULL;
static struct semaphore sem;

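static ssize_t _lock_op(struct kobject *kobj, struct kobj_attribute *attr, const char *buf, size_t count)
{
    switch (buf[0])
    {
    case '0':
        /* Take the semaphore; may sleep until the holder writes '1' */
        printk(KERN_INFO "down (%u)\n", sem.count);
        if (down_interruptible(&sem))
        {
            printk(KERN_WARNING "error: sem wait interrupted\n");
            /* Report the failure so user space knows it does NOT hold the lock */
            return -ERESTARTSYS;
        }
        break;
    case '1':
        /* Release the semaphore */
        printk(KERN_INFO "up (%u)\n", sem.count);
        up(&sem);
        break;
    default:
        printk(KERN_WARNING "error: invalid request %d\n", buf[0]);
        return -EINVAL;
    }
    return count;
}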

static struct kobj_attribute _lock_attr = __ATTR(test, 0222, NULL, _lock_op);

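static int __init _main_init(void)
{
    int err;

    sema_init(&sem, 1);

    /* Creates /sys/test */
    _kobj = kobject_create_and_add("test", NULL);
    if (!_kobj)
    {
        printk(KERN_ERR "error: failed to create /sys directory for test\n");
        return -ENOMEM;
    }

    /* Creates /sys/test/test, whose writes are handled by _lock_op */
    err = sysfs_create_file(_kobj, &_lock_attr.attr);
    if (err)
    {
        printk(KERN_ERR "error: could not create /sys file\n");
        kobject_put(_kobj);
        _kobj = NULL;
        return err;
    }

    printk(KERN_INFO "loaded\n");
    return 0;
}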

static void __exit _main_exit(void)
{
    if (_kobj)
        kobject_put(_kobj);
    _kobj = NULL;

    printk("unloaded\n");
}

module_init(_main_init);
module_exit(_main_exit);

This works great in general. The user-space applications can write '0' or '1' to the sysfs file and they achieve mutual exclusion without a problem.

However, there is one case where this locks up the process: when multiple threads of the same process try to acquire the lock.

Essentially, this is what happens:

       Thread 1               Thread 2

       write '0'
      system call
       _lock_op
   down_interruptible
   return from syscall
                              write '0'
                             system call
                              _lock_op
                           down_interruptible (blocked)

*go on to release the lock*

                            *return from syscall*
                         *go on to release the lock*

The problem is that when the second down happens before the first thread has released the lock, the whole process gets blocked instead of just the second thread. That is, the steps marked with * never happen.

Here's a user-space application that can trigger this when the above kernel module is inserted:

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

static int fid;
static volatile sig_atomic_t interrupted = 0;

static void sig_handler(int signum)
{
    interrupted = 1;
}

static void *func(void *arg)
{
    while (!interrupted)
    {
        write(fid, "0", 1);
        write(fid, "1", 1);
        usleep(1000);
    }

    return NULL;
}

int main(void)
{
    pthread_t tid;

    struct sigaction sa = {
        .sa_handler = sig_handler,
    };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGHUP, &sa, NULL);
    sigaction(SIGTERM, &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);
    sigaction(SIGUSR1, &sa, NULL);
    sigaction(SIGUSR2, &sa, NULL);

    fid = open("/sys/test/test", O_WRONLY);
    if (fid < 0)
        return EXIT_FAILURE;

    pthread_create(&tid, NULL, func, NULL);

    while (!interrupted)
    {
        write(fid, "0", 1);
        write(fid, "1", 1);
        usleep(793);
    }

    pthread_join(tid, NULL);

    close(fid);

    return 0;
}

Note: do echo 1 > /sys/test/test to unblock yourself ;)

My question is, why does Linux block the whole process on down rather than just the calling thread? And what can I do about it?

Note: tested on x86 with kernel 3.8 patched with RTAI. I will try a newer, vanilla kernel later just to be sure, but I suspect it is unrelated to RTAI.

Recommended answer

I have actually figured out a way to work around this problem, but I still think there should be a proper explanation and a solution.

My workaround is as follows:

Add a pthread mutex to the application. In each thread, instead of:

write(fid, "0", 1);
/* access */
write(fid, "1", 1);

do:

pthread_mutex_lock(&mutex);
write(fid, "0", 1);
/* access */
write(fid, "1", 1);
pthread_mutex_unlock(&mutex);

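This makes all of the process's accesses to the sysfs file mutually exclusive. A second thread therefore waits on the pthread mutex in user space instead of inside the write to the sysfs file, so the thread holding the semaphore can always get its '1' through. The sysfs file still ensures that accesses are mutually exclusive among processes and kernel modules.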
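The workaround can be packaged as a pair of functions so that no thread touches the sysfs file without holding the process-local mutex. This is a sketch only: app_mutex, app_lock and app_unlock are hypothetical names, and the descriptor is passed in. The pthread mutex is taken before writing '0' and released only after '1' has been written, matching the order in the snippet above.

```c
#include <pthread.h>
#include <unistd.h>

/* Process-local mutex: at most one thread of this process may be inside
 * the sysfs critical section (or waiting in down) at a time. */
static pthread_mutex_t app_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: take the process lock, then the sysfs semaphore.
 * Returns 0 on success; on failure returns -1 and holds nothing. */
static int app_lock(int fd)
{
    pthread_mutex_lock(&app_mutex);
    if (write(fd, "0", 1) != 1) {
        pthread_mutex_unlock(&app_mutex);
        return -1;
    }
    return 0;
}

/* Hypothetical helper: release the sysfs semaphore, then the process lock.
 * The process lock is released even if the write of '1' fails. */
static int app_unlock(int fd)
{
    int rc = (write(fd, "1", 1) == 1) ? 0 : -1;

    pthread_mutex_unlock(&app_mutex);
    return rc;
}
```

Each thread would then bracket its shared-memory access with app_lock(fid) and app_unlock(fid).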
