kthread_stop crashes the kernel


Question

I am trying to learn about spinlocks and kernel threads, and I wrote a small module to test my understanding of the kernel code. The code is:

static int kernel_test_thread(void *__unused) {
    int work;
    int x;
    allow_signal(SIGKILL);
    spin_lock(&kernel_test_device_lock);
    while(current->vfork_done == NULL || !kthread_should_stop())
    {
        if(signal_pending( current ))
            break;

        spin_unlock(&kernel_test_device_lock);
        msleep_interruptible(100);
        spin_lock(&kernel_test_device_lock);

        //do some work here
        for(work=0;work<=10000;++work)
        {
            x = work<<1;
        }
    }
    spin_unlock(&kernel_test_device_lock);
    return 0;
}

static int __init start_kernel_test(void) 
{
struct task_struct * ptask;
ptask = kthread_run(kernel_test_thread, NULL, "kernel_test_thread");
if(IS_ERR(ptask))
    return -1;

kernel_test_task = ptask;
return 0;
}

static void stop_kernel_test(void)
{

if(kernel_test_task)
    kthread_stop(kernel_test_task);
kernel_test_task=0;
}

static int __init init_test_kernel(void)
{
int rv;


spin_lock_init(&kernel_test_device_lock);
rv = start_kernel_test();
if(rv)
    return rv;

printk(KERN_INFO "kernel_test: Kernel Test module started\n");
return 0;
}

// Cleanup module
static void __exit cleanup_test_kernel(void)
{
spin_lock_bh(&kernel_test_device_lock);
stop_kernel_test();
spin_unlock_bh(&kernel_test_device_lock);
printk(KERN_INFO "kernel_test: Kernel Test module stopped\n");
}

module_init(init_test_kernel);
module_exit(cleanup_test_kernel);

When I try to remove the module, I get the following stack dump in /var/log/syslog:

Jun 15 10:42:31 manik kernel: [  595.162463] BUG: scheduling while atomic:   rmmod/1719/0x00000200
Jun 15 10:42:31 manik kernel: [  595.162470] Modules linked in:  kernel_test(OE-) intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp  kvm_intel kvm irqbypass crc32_pclmul snd_usb_audio lpc_ich snd_usbmidi_lib input_leds joydev hid_multitouch snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep ie31200_edac shpchp edac_core 8250_fintek snd_soc_rt5640 snd_soc_rl6231 snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer dw_dmac snd dw_dmac_core elan_i2c snd_soc_sst_acpi spi_pxa2xx_platform soundcore 8250_dw i2c_designware_platform i2c_designware_core soc_button_array mac_hid parport_pc ppdev lp parport autofs4 nouveau i915 mxm_wmi wmi ttm i2c_algo_bit drm_kms_helper e1000e syscopyarea ptp ahci sysfillrect libahci sysimgblt fb_sys_fops pps_core drm sdhci_acpi video sdhci i2c_hid fjes hid_generic usbhid hid
Jun 15 10:42:31 manik kernel: [  595.162568] CPU: 3 PID: 1719 Comm: rmmod Tainted: G          IOE   4.4.0-22-generic #40
Jun 15 10:42:31 manik kernel: [  595.162572] Hardware name: ADLINK Technology Inc. Express-HL./SHARKBAY, BIOS 1.14 01/01/2013
Jun 15 10:42:31 manik kernel: [  595.162575]  c1ac1967 b70efc6e 00000286 eeebde14 c139dccf e7d44dc0 c1c64dc0 eeebde2c
Jun 15 10:42:31 manik kernel: [  595.162584]  c1090627 c19b6c68 e5343ff0 000006b7 00000200 eeebde68 c17a4518 ffffffff
Jun 15 10:42:31 manik kernel: [  595.162593]  e7d0be00 b70efc6e e7d0be00 00000003 e7d0be00 00000000 c10a4760 e7d44dc0
Jun 15 10:42:31 manik kernel: [  595.162601] Call Trace:
Jun 15 10:42:31 manik kernel: [  595.162615]  [<c139dccf>] dump_stack+0x58/0x79
Jun 15 10:42:31 manik kernel: [  595.162623]  [<c1090627>] __schedule_bug+0x57/0x70
Jun 15 10:42:31 manik kernel: [  595.162630]  [<c17a4518>] __schedule+0x5e8/0x770
Jun 15 10:42:31 manik kernel: [  595.162637]  [<c10a4760>] ? enqueue_task_fair+0x90/0xd40
Jun 15 10:42:31 manik kernel: [  595.162642]  [<c17a46cd>] schedule+0x2d/0x80
Jun 15 10:42:31 manik kernel: [  595.162648]  [<c17a7085>] schedule_timeout+0x185/0x210
Jun 15 10:42:31 manik kernel: [  595.162655]  [<c124c8b1>] ? sysfs_kf_seq_show+0xb1/0x150
Jun 15 10:42:31 manik kernel: [  595.162661]  [<c1095d0d>] ? check_preempt_curr+0x4d/0x90
Jun 15 10:42:31 manik kernel: [  595.162666]  [<c1095d67>] ? ttwu_do_wakeup+0x17/0x110
Jun 15 10:42:31 manik kernel: [  595.162672]  [<c17a4fd2>] wait_for_completion+0x92/0xf0
Jun 15 10:42:31 manik kernel: [  595.162678]  [<c1096c00>] ? wake_up_q+0x70/0x70
Jun 15 10:42:31 manik kernel: [  595.162684]  [<c108b771>] kthread_stop+0x41/0xf0
Jun 15 10:42:31 manik kernel: [  595.162691]  [<f06c109b>] cleanup_test_kernel+0x1b/0xf80 [kernel_test]
Jun 15 10:42:31 manik kernel: [  595.162698]  [<c10f4a0c>] SyS_delete_module+0x1ac/0x200
Jun 15 10:42:31 manik kernel: [  595.162704]  [<c11dc7bd>] ? ____fput+0xd/0x10
Jun 15 10:42:31 manik kernel: [  595.162709]  [<c1089a64>] ? task_work_run+0x84/0xa0
Jun 15 10:42:31 manik kernel: [  595.162715]  [<c10030f6>] ? exit_to_usermode_loop+0xb6/0xe0
Jun 15 10:42:31 manik kernel: [  595.162721]  [<c100393d>] do_fast_syscall_32+0x8d/0x150
Jun 15 10:42:31 manik kernel: [  595.162728]  [<c17a8098>] sysenter_past_esp+0x3d/0x61

Could you please help me understand what exactly is going on here?

Thanks

Recommended answer

spin_lock_bh() begins an atomic section, in which it is forbidden to call any function that may sleep. But kthread_stop() waits for the thread to exit, so calling it while holding the lock triggers the "scheduling while atomic" BUG shown in the trace (kthread_stop, then wait_for_completion, then schedule).
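The point above can be sketched as a minimal module in which kthread_stop() is called with no spinlock held. This is only an illustration under assumptions: it reuses the names from the question, it simplifies the thread body to a plain kthread_should_stop() loop, and it drops the signal handling and the spinlock entirely, so it is not a drop-in replacement for the original module. It builds only as a kernel module, not as a userspace program.

```c
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/err.h>

static struct task_struct *kernel_test_task;

static int kernel_test_thread(void *unused)
{
	/* Keep running until kthread_stop() sets the stop flag. */
	while (!kthread_should_stop())
		msleep_interruptible(100);
	return 0;
}

static int __init init_test_kernel(void)
{
	kernel_test_task = kthread_run(kernel_test_thread, NULL,
				       "kernel_test_thread");
	if (IS_ERR(kernel_test_task))
		return PTR_ERR(kernel_test_task);
	return 0;
}

static void __exit cleanup_test_kernel(void)
{
	/*
	 * No spinlock is held here: kthread_stop() sleeps until the
	 * thread has exited, which is not allowed in atomic context.
	 */
	kthread_stop(kernel_test_task);
}

module_init(init_test_kernel);
module_exit(cleanup_test_kernel);
MODULE_LICENSE("GPL");
```

If shared data really does need the spinlock, the lock would be taken and released around the accesses to that data only, never around the call to kthread_stop().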

Exiting from a kthread destroys its thread structure unless someone has incremented the thread's usage counter beforehand. For that reason, when kthread_stop() is called, it:

  1. Increments the usage counter of the kthread.
  2. Sets the "stop" flag for the kthread.
  3. Waits for the kthread to finish.
  4. Decrements the usage counter of the kthread.

This algorithm guarantees that exiting from a kthread after finding kthread_should_stop() to be non-zero is always safe.
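The four steps above can be imitated in userspace to see why the reference taken in step 1 matters. The following is a toy model built on pthreads and C11 atomics, not the kernel implementation: toy_task, toy_run, toy_stop, toy_get, toy_put and toy_tasks_freed are all hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel task; not a real kernel type. */
struct toy_task {
	pthread_t thread;
	atomic_int usage;        /* references keeping the struct alive */
	atomic_bool should_stop; /* the "stop" flag */
};

/* Counts how many toy tasks have been released, for inspection. */
static atomic_int toy_tasks_freed;

static void toy_get(struct toy_task *t)
{
	atomic_fetch_add(&t->usage, 1);
}

static void toy_put(struct toy_task *t)
{
	/* The caller that drops the last reference frees the struct. */
	if (atomic_fetch_sub(&t->usage, 1) == 1) {
		atomic_fetch_add(&toy_tasks_freed, 1);
		free(t);
	}
}

static void *toy_thread_fn(void *arg)
{
	struct toy_task *t = arg;

	/* Analogue of looping on kthread_should_stop(). */
	while (!atomic_load(&t->should_stop))
		sched_yield();
	toy_put(t); /* exiting drops the thread's own reference */
	return NULL;
}

static struct toy_task *toy_run(void)
{
	struct toy_task *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	atomic_init(&t->usage, 1); /* reference owned by the thread itself */
	atomic_init(&t->should_stop, false);
	if (pthread_create(&t->thread, NULL, toy_thread_fn, t) != 0) {
		free(t);
		return NULL;
	}
	return t;
}

static void toy_stop(struct toy_task *t)
{
	toy_get(t);                          /* 1. increment the usage counter */
	atomic_store(&t->should_stop, true); /* 2. set the "stop" flag */
	pthread_join(t->thread, NULL);       /* 3. wait for the thread to finish */
	toy_put(t);                          /* 4. decrement the usage counter */
}
```

Calling toy_run() and then toy_stop() on the returned pointer releases the struct exactly once, in step 4. Without the extra reference from step 1, the exiting thread would free the struct while toy_stop() is still using it.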

But if the kthread may exit without checking kthread_should_stop(), as the thread in the question does when a signal is pending, additional steps are needed to guarantee that kthread_stop() never sees a kthread that has already been destroyed. One possible way of doing this:

struct task_struct *kernel_test_task;

int module_init(void)
{
    // Create the kthread, but don't start it yet.
    kernel_test_task = kthread_create(...);
    // kthread_create() returns an ERR_PTR value on failure.
    if (IS_ERR(kernel_test_task))
        return PTR_ERR(kernel_test_task);
    // Increment the usage counter.
    get_task_struct(kernel_test_task);
    // Now it is safe to start the kthread: exiting from it doesn't destroy its struct.
    wake_up_process(kernel_test_task);
    return 0;
}

void module_cleanup(void)
{
    // The thread may already have finished, but its structure is guaranteed to be alive.
    kthread_stop(kernel_test_task);
    // This decrements the usage counter that was incremented in module_init.
    put_task_struct(kernel_test_task);
    // Now the thread is guaranteed to be finished and its struct destroyed.
}
