Debugging core files generated on a Customer's box

Problem description

We get core files from running our software on a Customer's box. Unfortunately, because we've always compiled with -O2 and without debugging symbols, this has led to situations where we could not figure out why it was crashing. We've modified the builds so that they now use -g and -O2 together. We then advise the Customer to run a -g binary so it becomes easier to debug.

I have a few questions:

  1. What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?
  2. Are there any good books for debugging on Linux, or Solaris? Something example oriented would be great. I am looking for real-life examples of figuring out why a routine crashed and how the author arrived at a solution. Something more on the intermediate to advanced level would be good, as I have been doing this for a while now. Some assembly would be good as well.

Here's an example of a crash that requires us to tell the Customer to get a -g version of the binary:

Program terminated with signal 11, Segmentation fault.
#0  0xffffe410 in __kernel_vsyscall ()
(gdb) where
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x00454ff1 in select () from /lib/libc.so.6
...
<omitted frames>

Ideally I'd like to find out exactly why the app crashed - I suspect it's memory corruption but I am not 100% sure.

Remote debugging is strictly not allowed.

Thanks

Solution

What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?

If the executable is dynamically linked, as yours is, the stack GDB produces will (most likely) not be meaningful.

The reason: GDB knows that your executable crashed by calling something in libc.so.6 at address 0x00454ff1, but it doesn't know what code was at that address. So it looks into your copy of libc.so.6 and discovers that this is in select, so it prints that.

But the chances that 0x00454ff1 is also in select in your customer's copy of libc.so.6 are quite small. Most likely the customer had some other procedure at that address, perhaps abort.

You can use disas select, and observe that 0x00454ff1 is either in the middle of an instruction, or that the previous instruction is not a CALL. If either of these holds, your stack trace is meaningless.

You can, however, help yourself: you just need to get a copy of every library listed by (gdb) info shared from the customer's system. Have the customer tar them up with e.g.

cd /
tar cvzf to-you.tar.gz lib/libc.so.6 lib/ld-linux.so.2 ...
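
One caveat with the tar command above: on many systems lib/libc.so.6 is a symbolic link, and tar stores the link itself rather than the file it points to unless told otherwise. The following is a minimal sketch of a wrapper that follows symlinks. pack_libs is a hypothetical helper name, not part of the original answer, and the -h (dereference) flag is assumed to behave as it does in GNU tar.

```shell
# pack_libs ROOT ARCHIVE PATH...
# Hypothetical helper: archive each PATH (given relative to ROOT) into ARCHIVE.
pack_libs() {
    root=$1
    archive=$2
    shift 2
    # -h stores the file a symlink points to instead of the link itself.
    # -C switches into ROOT so member names stay relative (lib/libc.so.6),
    # which is what the extraction step on the developer side expects.
    tar -czhf "$archive" -C "$root" "$@"
}
```

On the customer's system this could be called as pack_libs / /tmp/to-you.tar.gz lib/libc.so.6 lib/ld-linux.so.2, followed by whatever other libraries info shared listed.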

Then, on your system:

mkdir /tmp/from-customer
tar xzf to-you.tar.gz -C /tmp/from-customer
gdb /path/to/binary
(gdb) set solib-absolute-prefix /tmp/from-customer
(gdb) core core  # Note: very important to set solib-... before loading core
(gdb) where      # Get meaningful stack trace!
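
Because the order of those commands matters, it can help to keep them in a GDB command file so that the prefix is always set before the core is loaded. A minimal sketch follows. write_gdb_script is a hypothetical helper name, and it assumes the paths contain no spaces.

```shell
# write_gdb_script SYSROOT CORE OUT
# Hypothetical helper: write a GDB command file to OUT that sets the library
# prefix first and only then loads the core file.
write_gdb_script() {
    sysroot=$1
    core=$2
    out=$3
    {
        # Must come first: GDB resolves shared libraries when the core loads.
        printf 'set solib-absolute-prefix %s\n' "$sysroot"
        printf 'core-file %s\n' "$core"
        printf 'where\n'
        # Lists which library files GDB actually picked up.
        printf 'info sharedlibrary\n'
    } > "$out"
}
```

A possible invocation: write_gdb_script /tmp/from-customer core /tmp/core.gdb, then gdb -batch -x /tmp/core.gdb /path/to/binary. The info sharedlibrary output should show paths under /tmp/from-customer; if it shows your own /lib instead, the prefix was not applied.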

We then advise the Customer to run a -g binary so it becomes easier to debug.

A much better approach is:

  • build with -g -O2 -o myexe.dbg
  • strip -g myexe.dbg -o myexe
  • distribute myexe to customers
  • when a customer gets a core, use myexe.dbg to debug it
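
The build steps above can be sketched as a small shell function. This is an illustration under assumptions, not the answerer's actual build: build_pair is a hypothetical name, cc is assumed to be a GCC-compatible compiler, and strip is assumed to be the GNU binutils version (which accepts -g and -o).

```shell
# build_pair SRC NAME
# Hypothetical helper: produce NAME.dbg (with debug info, kept in-house)
# and NAME (debug info removed, shipped to customers) from one compile.
build_pair() {
    src=$1
    name=$2
    # Same optimization level as the shipped binary, plus debug info.
    cc -g -O2 -o "$name.dbg" "$src" || return 1
    # -g removes only debugging sections; the code itself is unchanged,
    # so a core from NAME lines up with the symbols in NAME.dbg.
    strip -g "$name.dbg" -o "$name"
}
```

For example, build_pair main.c myexe would leave myexe.dbg for your own use and myexe for distribution. When a core arrives, it would be opened with gdb myexe.dbg core.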

You'll have full symbolic info (file/line, local variables), without having to ship a special binary to the customer, and without revealing too many details about your sources.
