Debugging core files generated on a Customer's box

Problem description

We get core files from running our software on a Customer's box. Unfortunately, because we've always compiled with -O2 and without debugging symbols, this has led to situations where we could not figure out why it was crashing. We've modified the builds so that they now compile with -g and -O2 together. We then advise the Customer to run a -g binary so it becomes easier to debug.

I have a few questions:

  1. What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?
  2. Are there any good books for debugging on Linux, or Solaris? Something example oriented would be great. I am looking for real-life examples of figuring out why a routine crashed and how the author arrived at a solution. Something more on the intermediate to advanced level would be good, as I have been doing this for a while now. Some assembly would be good as well.

Here's an example of a crash that requires us to tell the Customer to get a -g version of the binary:

Program terminated with signal 11, Segmentation fault.
#0  0xffffe410 in __kernel_vsyscall ()
(gdb) where
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x00454ff1 in select () from /lib/libc.so.6
...
<omitted frames>

Ideally I'd like to find out why exactly the app crashed. I suspect it's memory corruption, but I am not 100% sure.

Remote debugging is strictly not allowed.

Thanks

Solution

"What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?"

If the executable is dynamically linked, as yours is, the stack GDB produces will (most likely) not be meaningful.

The reason: GDB knows that your executable crashed by calling something in libc.so.6 at address 0x00454ff1, but it doesn't know what code was at that address. So it looks into your copy of libc.so.6 and discovers that this is in select, so it prints that.

But the chances that 0x00454ff1 is also in select in your customer's copy of libc.so.6 are quite small. Most likely the customer had some other procedure at that address, perhaps abort.

You can check this with disas select. A genuine return address always points at the instruction immediately after a CALL, so if 0x00454ff1 falls in the middle of an instruction, or the instruction before it is not a CALL, your stack trace is meaningless.

You can, however, help yourself: you just need a copy of every library listed by (gdb) info shared, taken from the customer's system. Have the customer tar them up with e.g.

cd /
tar cvzf to-you.tar.gz lib/libc.so.6 lib/ld-linux.so.2 ...
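
One thing to watch for when packing the libraries: on many systems lib/libc.so.6 is a symbolic link to a versioned file, and plain tar would then archive only the link. Below is a minimal sketch of a helper that follows symlinks with tar's -h option. The function name pack_libs and its arguments are made up for illustration; they are not part of the original answer.

```shell
# pack_libs ROOT ARCHIVE PATH...
# Archive the given library PATHs (relative to ROOT) into ARCHIVE.
# pack_libs is a hypothetical helper name, not a standard tool.
pack_libs() {
    root=$1
    archive=$2
    shift 2
    # -h (dereference) stores the file a symlink points to,
    # rather than the symlink itself.
    tar -czhf "$archive" -C "$root" "$@"
}

# Example on the customer's machine (library paths as in the command above):
#   pack_libs / /tmp/to-you.tar.gz lib/libc.so.6 lib/ld-linux.so.2
```

Giving an absolute path for the archive avoids any doubt about where it ends up once tar changes directory.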

Then, on your system:

mkdir /tmp/from-customer
tar xzf to-you.tar.gz -C /tmp/from-customer
gdb /path/to/binary
(gdb) set solib-absolute-prefix /tmp/from-customer
(gdb) core core  # Note: very important to set solib-... before loading core
(gdb) where      # Get meaningful stack trace!
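
If you do this often, the same session can be run non-interactively. The sketch below writes a GDB command file that sets the library prefix before loading the core, which is the ordering the note above calls important. The helper name write_gdb_script and the file name core-bt.gdb are assumptions for illustration only.

```shell
# write_gdb_script SYSROOT CORE OUT
# Write a GDB command file to OUT that points GDB at the customer's
# libraries first, then loads CORE and prints the stack.
# write_gdb_script is a hypothetical helper name.
write_gdb_script() {
    sysroot=$1
    core=$2
    out=$3
    cat > "$out" <<EOF
set solib-absolute-prefix $sysroot
core-file $core
info sharedlibrary
where
EOF
}

# Example (file names are placeholders):
#   write_gdb_script /tmp/from-customer core core-bt.gdb
#   gdb -batch -x core-bt.gdb /path/to/binary
```

The info sharedlibrary line is there so you can confirm that GDB picked up the libraries under /tmp/from-customer rather than your own.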

"We then advise the Customer to run a -g binary so it becomes easier to debug."

A much better approach is:

  • build with -g -O2 -o myexe.dbg
  • strip -g myexe.dbg -o myexe
  • distribute myexe to customers
  • when a customer gets a core, use myexe.dbg to debug it
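
The steps above could be wrapped in a small shell function. This is only a sketch: build_release is a made-up name, and it assumes a C compiler available as cc (or via $CC) and GNU strip. Your real build will have more flags and more source files.

```shell
# build_release SRC OUT
# Compile SRC with -g -O2 into OUT.dbg (kept in-house), then write a
# copy with debug info removed to OUT (shipped to customers).
# build_release is a hypothetical helper; adapt it to your build system.
build_release() {
    src=$1
    out=$2
    "${CC:-cc}" -g -O2 -o "$out.dbg" "$src" || return 1
    strip -g "$out.dbg" -o "$out" || return 1
}

# Example (file names are placeholders):
#   build_release main.c myexe
#   ... later, with a core from the customer:
#   gdb myexe.dbg core
```

Keep myexe.dbg archived alongside each release, since the core only matches the debug binary from the exact same build.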

You'll have full symbolic info (file/line, local variables), without having to ship a special binary to the customer, and without revealing too many details about your sources.
