"Unable to handle kernel NULL pointer dereference at Virtual Address." - On signalling the Kernel Module | OOPS
Problem description
I was learning some basics of kernel modules and threads, so I tried to write an example module and test it. It loads successfully.
Module code:
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/version.h>

static struct task_struct *thread_st;

// Function called by thread
static int thread_fun(void *unused)
{
    allow_signal(SIGKILL);
    while(!kthread_should_stop())
    {
        printk(KERN_INFO "Thread Running\n");
        ssleep(5);
        if(signal_pending(current))
            break;
    }
    printk(KERN_INFO "Thread Stopping\n");
    do_exit(0);
    return 0;
}

// Module initialisation
static int __init init_thread(void)
{
    printk(KERN_INFO "Creating Thread\n");
    thread_st = kthread_run(thread_fun, NULL, "mythread");
    if(thread_st)
        printk(KERN_INFO "Thread created successfully\n");
    else
        printk(KERN_INFO "Thread creation failed\n");
    return 0;
}

// Module exit
static void __exit cleanup_thread(void)
{
    printk(KERN_INFO "Cleaning up\n");
    if(thread_st)
    {
        kthread_stop(current);
        printk(KERN_INFO "Thread Stopped\n");
    }
}

module_init(init_thread);
module_exit(cleanup_thread);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Pinkesh Badjatiya");
MODULE_DESCRIPTION("Simple Kernel Module");
Once the module is loaded, this is the procedure I follow to unload it:
- Send a SIGKILL signal: sudo kill -9 [PID]
- Wait for dmesg to show 'Thread Stopping', which simply means that kthread_should_stop() has returned true.
- Remove the module: sudo rmmod [MODULE_NAME]
dmesg log:
[ 492.979030] Creating Thread
[ 492.979753] Thread created successfully
[ 492.979776] Thread Running
[ 497.985420] Thread Running
[ 502.992223] Thread Running
[ 507.999007] Thread Running
[ 513.005837] Thread Running
[ 518.012585] Thread Running
[ 523.019354] Thread Running
[ 528.026195] Thread Running
[ 533.032919] Thread Running
[ 538.039795] Thread Running
[ 543.046588] Thread Running
[ 548.053383] Thread Stopping
[ 556.317200] Cleaning up
[ 556.317212] Thread Stopped
However, when I replace the variable current with the struct pointer thread_st that I originally used (i.e. call kthread_stop(thread_st) in cleanup_thread()), then load the module and follow the same procedure as above to remove it, the kernel produces an Oops and fills up the dmesg log.
I also get a "Report Error" popup on Ubuntu.
dmesg log:
[ 1269.832922] Creating Thread
[ 1269.833888] Thread created successfully
[ 1269.834217] Thread Running
[ 1274.839425] Thread Running
[ 1279.846211] Thread Running
[ 1284.853017] Thread Running
[ 1289.859819] Thread Running
[ 1294.866589] Thread Running
[ 1299.873353] Thread Stopping
[ 1305.758783] Cleaning up
[ 1305.758853] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1305.762603] IP: [<ffffffff81096d6b>] exit_creds+0x1b/0x70
[ 1305.766266] PGD 0
[ 1305.769967] Oops: 0000 [#3] SMP
[ 1305.774675] Modules linked in: kernel_thread_example(OE-) vmnet(OE) vmw_vsock_vmci_transport vsock vmw_vmci vmmon(OE) cmac rmd160 crypto_null camellia_generic camellia_x86_64 cast6_avx_x86_64 cast6_generic cast5_avx_x86_64 cast5_generic cast_common deflate cts ctr gcm ccm serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common twofish_generic twofish_avx_x86_64 twofish_x86_64_3way xts twofish_x86_64 twofish_common xcbc sha256_ssse3 sha512_ssse3 des_generic aes_x86_64 lrw gf128mul glue_helper ablk_helper xfrm_user ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel tunnel6 xfrm_ipcomp af_key xfrm_algo bnep rfcomm bluetooth 6lowpan_iphc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev media snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi arc4 snd_seq intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ath9k ath9k_common ath9k_hw crct10dif_pclmul snd_seq_device crc32_pclmul snd_timer ath ghash_clmulni_intel cryptd mac80211 joydev serio_raw snd cfg80211 i915 lpc_ich shpchp soundcore drm_kms_helper drm mei_me mei i2c_algo_bit mac_hid video wmi parport_pc ppdev lp parport hid_generic usbhid hid psmouse ahci libahci atl1c [last unloaded: kernel_thread_example]
[ 1305.817666] CPU: 3 PID: 4038 Comm: rmmod Tainted: G D OE 3.16.0-50-generic #66~14.04.1-Ubuntu
[ 1305.822078] Hardware name: HCL Infosystems Limited HCL ME LAPTOP/HCL Infosystems Limited, BIOS 203.T01 03/19/2011
[ 1305.826447] task: ffff8800a6221e90 ti: ffff880119700000 task.ti: ffff880119700000
[ 1305.830740] RIP: 0010:[<ffffffff81096d6b>] [<ffffffff81096d6b>] exit_creds+0x1b/0x70
[ 1305.834968] RSP: 0018:ffff880119703e90 EFLAGS: 00010246
[ 1305.839081] RAX: 0000000000000000 RBX: ffff8800b6e065e0 RCX: 0000000000000000
[ 1305.843133] RDX: ffffffff81c8ea00 RSI: ffff8800b6e065e0 RDI: 0000000000000000
[ 1305.847062] RBP: ffff880119703e98 R08: 0000000000000086 R09: 0000000000000431
[ 1305.850897] R10: 0000000000000000 R11: ffff880119703c0e R12: ffff8800b6e065e0
[ 1305.854697] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f0325bb6240
[ 1305.858456] FS: 00007f0325595740(0000) GS:ffff88011fa60000(0000) knlGS:0000000000000000
[ 1305.862225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1305.866197] CR2: 0000000000000000 CR3: 00000000b6e23000 CR4: 00000000000407e0
[ 1305.866199] Stack:
[ 1305.866206] ffff8800b6e065e0 ffff880119703eb8 ffffffff8106abf2 0000000000000000
[ 1305.866211] ffff8800b6e065e0 ffff880119703ee0 ffffffff81091868 0000000000000000
[ 1305.866216] ffffffffc0a61000 0000000000000800 ffff880119703ef0 ffffffffc0a5f086
[ 1305.866217] Call Trace:
[ 1305.866232] [<ffffffff8106abf2>] __put_task_struct+0x52/0x140
[ 1305.866241] [<ffffffff81091868>] kthread_stop+0xd8/0xe0
[ 1305.866249] [<ffffffffc0a5f086>] cleanup_thread+0x23/0xf9d [kernel_thread_example]
[ 1305.866259] [<ffffffff810ebbb2>] SyS_delete_module+0x162/0x200
[ 1305.866268] [<ffffffff8176edcd>] system_call_fastpath+0x1a/0x1f
[ 1305.866318] Code: ff ff 85 c0 0f 84 33 fe ff ff e9 0c fe ff ff 90 66 66 66 66 90 55 48 89 e5 53 48 8b 87 c0 05 00 00 48 89 fb 48 8b bf b8 05 00 00 <8b> 00 48 c7 83 b8 05 00 00 00 00 00 00 f0 ff 0f 74 23 48 8b bb
[ 1305.866324] RIP [<ffffffff81096d6b>] exit_creds+0x1b/0x70
[ 1305.866326] RSP <ffff880119703e90>
[ 1305.866328] CR2: 0000000000000000
[ 1305.866378] ---[ end trace 0bd516c6629996c7 ]---
I can't figure out why this is happening. I searched online but could not find a reason.
Also, is the variable current already declared in one of the headers above, and what is the problem with using the thread_st I created?
Recommended answer
From the description of the kthread_stop() function:
If threadfn() may call do_exit() itself, the caller must ensure task_struct can't go away.
This means that you cannot simply exit from a kthread that is stopped with kthread_stop() elsewhere. You should either exit only once kthread_should_stop() returns true, or grab a reference to the task_struct (for example with get_task_struct()) before the thread exits.
The question says: "Wait for dmesg to show 'Thread Stopping', which simply means that kthread_should_stop() has returned true."
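That would be true only without the allow_signal() call: kernel threads ignore signals unless they call allow_signal(), so without it the loop could end only through kthread_should_stop(). kthread_should_stop() returns true only after someone calls kthread_stop() for the given thread. When a signal is explicitly sent from user space (which is possible only because of allow_signal()), signal_pending(current) becomes true, but that does not reflect the kthread_should_stop() state.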
So both of your implementations are incorrect, because both exit the thread when a signal is explicitly sent from user space.
Additionally, using thread_st inside the thread function introduces a race condition: the thread function may start running before kthread_run() returns and its result is assigned to thread_st.
Update:
Instead of exiting right after printing "Thread Stopping", the thread can wait until kthread_stop() is actually called:
static int thread_fun(void *unused)
{
    allow_signal(SIGKILL);
    while(!kthread_should_stop())
    {
        printk(KERN_INFO "Thread Running\n");
        ssleep(5);
        if(signal_pending(current))
            break;
    }
    printk(KERN_INFO "Thread Stopping\n");
    // Wait until kthread will be actually stopped.
    while(!kthread_should_stop())
    {
        /*
         * Flush any pending signal.
         *
         * Otherwise interruptible wait will not wait actually.
         */
        flush_signals(current);
        /* Stopping thread is some sort of interrupt. That's why we need interruptible wait. */
        set_current_state(TASK_INTERRUPTIBLE);
        if(!kthread_should_stop()) schedule();
        set_current_state(TASK_RUNNING);
    }
    return 0;
}