Debugging core files generated on a Customer's box
We get core files from running our software on a Customer's box. Unfortunately, because we've always compiled with -O2 without debugging symbols, this has led to situations where we could not figure out why it was crashing. We've modified the builds so that they now generate -g and -O2 together. We then advise the Customer to run a -g binary so it becomes easier to debug.
I have a few questions:
- What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?
- Are there any good books for debugging on Linux, or Solaris? Something example oriented would be great. I am looking for real-life examples of figuring out why a routine crashed and how the author arrived at a solution. Something more on the intermediate to advanced level would be good, as I have been doing this for a while now. Some assembly would be good as well.
Here's an example of a crash that requires us to tell the Customer to get a -g version of the binary:
Program terminated with signal 11, Segmentation fault.
#0 0xffffe410 in __kernel_vsyscall ()
(gdb) where
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x00454ff1 in select () from /lib/libc.so.6
...
<omitted frames>
Ideally I'd like to find out exactly why the app crashed. I suspect it's memory corruption, but I am not 100% sure.
Remote debugging is strictly not allowed.
Thanks
What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?
If the executable is dynamically linked, as yours is, the stack trace GDB produces will (most likely) not be meaningful.
The reason: GDB knows that your executable crashed by calling something in libc.so.6 at address 0x00454ff1, but it doesn't know what code was at that address. So it looks into your copy of libc.so.6 and discovers that this is in select, so it prints that.
But the chances that 0x00454ff1 is also in select in your customer's copy of libc.so.6 are quite small. Most likely the customer had some other procedure at that address, perhaps abort.
You can use disas select and observe that 0x00454ff1 is either in the middle of an instruction, or that the previous instruction is not a CALL. If either of these holds, your stack trace is meaningless.
You can, however, help yourself: you just need to get a copy of all the libraries listed in (gdb) info shared from the customer system. Have the customer tar them up with, e.g.:
cd /
tar cvzf to-you.tar.gz lib/libc.so.6 lib/ld-linux.so.2 ...
Then, on your system:
mkdir /tmp/from-customer
tar xzf to-you.tar.gz -C /tmp/from-customer
gdb /path/to/binary
(gdb) set solib-absolute-prefix /tmp/from-customer
(gdb) core core # Note: very important to set solib-... before loading core
(gdb) where # Get meaningful stack trace!
We then advise the Customer to run a -g binary so it becomes easier to debug.
A much better approach is:
- build with -g -O2 -o myexe.dbg
- strip -g myexe.dbg -o myexe
- distribute myexe to customers
- when a customer gets a core, use myexe.dbg to debug it
You'll have full symbolic info (file/line, local variables), without having to ship a special binary to the customer, and without revealing too many details about your sources.