Kernel freeze: How to debug it?

Question

I have an embedded board with a kernel module of thousands of lines that freezes in random, complex use cases after a random amount of time. What options do I have for debugging it?

I have already tried the magic SysRq key, but it does not work. I guess the explanation is that I am stuck in a loop or a deadlock in code where hardware interrupts are disabled?

Thanks, Eva.

Recommended answer

Typically, embedded boards have a watchdog. You should enable this timer and use the watchdog user process to kick the watchdog hardware. Use nice on the watchdog process so that it only runs when higher-priority tasks relinquish the CPU. If they never do, the watchdog is not kicked and the board resets. This gives clues as to the issue. If the device does not reset with the watchdog active, then it may be that only the network or serial port has stopped communicating. In that case the kernel has not locked up, and the real problem is simply that there is no user-visible activity. The watchdog is also useful if and when this type of issue occurs in the field.
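Here is a minimal sketch of the low-priority kicker described above. It assumes the board exposes the generic Linux watchdog device, usually `/dev/watchdog`. The function names `kick_watchdog` and `run_low_priority_kicker` are hypothetical. Each `write()` to the device counts as a keepalive. Writing `'V'` just before closing asks drivers that support "magic close" to disarm the timer, unless the kernel was built with `CONFIG_WATCHDOG_NOWAYOUT`.

```c
#define _XOPEN_SOURCE 700  /* for nice() */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: kick the watchdog at `path` every `interval_s`
 * seconds. count < 0 means kick forever; otherwise do exactly `count`
 * kicks, then disarm with the magic 'V' and close. */
int kick_watchdog(const char *path, unsigned int interval_s, int count)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0) {
        perror("open watchdog");
        return -1;
    }

    while (count < 0 || count-- > 0) {
        /* Any write is a keepalive. If this process is starved of CPU,
         * the kicks stop and the hardware resets the board. */
        if (write(fd, "k", 1) != 1) {
            perror("keepalive");
            close(fd);
            return -1;
        }
        sleep(interval_s);
    }

    /* Magic close: ask the driver to stop the timer when we close. */
    if (write(fd, "V", 1) != 1)
        perror("magic close");
    close(fd);
    return 0;
}

/* Hypothetical wrapper: drop to the lowest priority (nice 19), then kick
 * until killed, so the kicker only runs when other tasks yield the CPU. */
int run_low_priority_kicker(const char *path, unsigned int interval_s)
{
    errno = 0;
    if (nice(19) == -1 && errno != 0) {
        perror("nice");
        return -1;
    }
    return kick_watchdog(path, interval_s, -1);
}
```

On the board you would call `run_low_priority_kicker("/dev/watchdog", interval)` with an interval comfortably shorter than the hardware timeout. Passing a regular file instead of the device is a cheap way to exercise the loop off-target.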

For a kernel lockup case, the kernel's lockup watchdog features (the soft-lockup and hard-lockup detectors) may be useful. They will work if you have an infinite loop or deadlock, as you speculated. However, if this is custom hardware, it is also possible that the SDRAM or a peripheral device latches up and causes abnormal bus activity. This stops the CPU from fetching proper code, and it is hard for Linux to recover from that.
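On an unattended board it can help to make those detectors panic rather than only log a warning. Combined with `panic=-1` on the command line, a detected lockup then reboots the board. The sketch below assumes the standard sysctls `/proc/sys/kernel/softlockup_panic` and `/proc/sys/kernel/hardlockup_panic`, which exist only when the kernel was built with the corresponding detector. A CPU stuck with interrupts disabled, as suspected in the question, is caught only by the hard-lockup detector. That detector depends on architecture support that not every embedded SoC provides, which is another reason to keep the hardware watchdog. `write_sysctl` and `enable_lockup_panics` are hypothetical helper names.

```c
#include <stdio.h>

/* Hypothetical helper: write `value` to a sysctl file.
 * Returns 0 on success, -1 on failure. */
int write_sysctl(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);
        return -1;
    }
    int ok = fputs(value, f) >= 0;
    /* stdio buffers the data; the actual write (and the kernel's
     * validation of the value) happens when fclose() flushes it. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Ask both lockup detectors to panic instead of just warning.
 * Requires root; the hardlockup file is missing on kernels without
 * hard-lockup detector support. */
int enable_lockup_panics(void)
{
    int rc = 0;
    /* CPU looping in kernel mode without scheduling. */
    rc |= write_sysctl("/proc/sys/kernel/softlockup_panic", "1\n");
    /* CPU stuck with interrupts disabled. */
    rc |= write_sysctl("/proc/sys/kernel/hardlockup_panic", "1\n");
    return rc;
}
```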

You can combine the watchdog with some fallow (otherwise unused) memory that serves as a trace buffer. The memmap= and mem= kernel parameters can limit the memory the kernel uses, leaving that region untouched. You can then write a driver/device that uses this memory to save trace points that survive a reboot. When a watchdog reset is detected at kernel boot, the ring buffer in the fallow memory is dumped.
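The fallow-memory trace buffer can be sketched as a fixed-size ring with a magic marker, so that code running after the reset can tell a surviving buffer from uninitialised RAM. This is plain C so the logic can be exercised anywhere. The sketch makes several assumptions:

- In a real driver, `tb` would point at the reserved region, for example via `memremap()`.
- On parts with write-back caches, that mapping should be uncached or write-combining so the last entries are not stranded in the cache when the watchdog fires.
- Writers on several CPUs would need a lock.
- The reset preserves DRAM and the bootloader does not overwrite the region.

`TRACE_MAGIC`, `trace_attach`, `trace_point` and `trace_dump` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRACE_MAGIC   0x54524345u  /* hypothetical "valid buffer" marker */
#define TRACE_ENTRIES 64u          /* ring size, in entries */

struct trace_entry {
    uint32_t id;   /* caller-chosen trace point number */
    uint32_t arg;  /* one word of context, e.g. a state value */
};

struct trace_buf {
    uint32_t magic;
    uint32_t head;  /* next slot to write */
    uint32_t used;  /* valid entries, saturates at TRACE_ENTRIES */
    struct trace_entry e[TRACE_ENTRIES];
};

/* Attach to the reserved region. Returns 1 if it already holds a
 * plausible buffer from before the reset, 0 if it was (re)initialised.
 * The header sanity checks keep a garbage header from causing
 * out-of-bounds writes later. */
int trace_attach(struct trace_buf *tb)
{
    if (tb->magic == TRACE_MAGIC && tb->head < TRACE_ENTRIES &&
        tb->used <= TRACE_ENTRIES)
        return 1;
    memset(tb, 0, sizeof *tb);
    tb->magic = TRACE_MAGIC;
    return 0;
}

/* Record one event, overwriting the oldest once the ring is full. */
void trace_point(struct trace_buf *tb, uint32_t id, uint32_t arg)
{
    tb->e[tb->head].id = id;
    tb->e[tb->head].arg = arg;
    tb->head = (tb->head + 1) % TRACE_ENTRIES;
    if (tb->used < TRACE_ENTRIES)
        tb->used++;
}

/* Copy surviving entries into out[] (room for TRACE_ENTRIES), oldest
 * first; returns how many were copied. Call only after trace_attach(). */
size_t trace_dump(const struct trace_buf *tb, struct trace_entry *out)
{
    uint32_t start = (tb->head + TRACE_ENTRIES - tb->used) % TRACE_ENTRIES;
    for (uint32_t i = 0; i < tb->used; i++)
        out[i] = tb->e[(start + i) % TRACE_ENTRIES];
    return tb->used;
}
```

At boot, the driver would call `trace_attach()` on the reserved region. If it returns 1 and a watchdog reset was detected, the driver would print the entries from `trace_dump()` before recording new ones.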

If the issue is repeatable, or to discover how to make it repeatable, it is also useful to register thread notifiers that printk on context switches. Once you determine a sequence of events that leads to the lockup, you can use an oscilloscope or logic analyzer for the final diagnosis. Or, at this point, it may already be evident which peripheral is the issue.

You may also set panic=-1 and reboot=... on the kernel command line. The kdump facilities are useful if you only have a code (software) problem.

Related: kernel trap (on the Web Archive). This link may no longer be available, but it is not important to this answer.
