Debugging core files generated on a Customer's box


Question

We get core files from running our software on a Customer's box. Unfortunately, because we've always compiled with -O2 without debugging symbols, this has led to situations where we could not figure out why it was crashing. We've modified the builds so they now generate -g and -O2 together. We then advise the Customer to run a -g binary so it becomes easier to debug.

I have a few questions:


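  1. What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?

  2. Are there any good books on debugging on Linux or Solaris? Something example-oriented would be great. I'm looking for real-life examples of figuring out why a routine crashed and how the author arrived at a solution. Something at the intermediate-to-advanced level would be better, as I have been doing this for a while. Some assembly would be good as well.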

Here's an example of a crash that requires us to tell the Customer to get a -g version of the binary:

Program terminated with signal 11, Segmentation fault.
#0  0xffffe410 in __kernel_vsyscall ()
(gdb) where
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x00454ff1 in select () from /lib/libc.so.6
...
<omitted frames>

Ideally I'd like to find out exactly why the app crashed. I suspect memory corruption, but I am not 100% sure.

Remote debugging is strictly not allowed.

Thanks

Recommended answer


What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?

If the executable is dynamically linked, as yours is, the stack trace GDB produces will (most likely) not be meaningful.

The reason: GDB knows that your executable crashed by calling something in libc.so.6 at address 0x00454ff1, but it doesn't know what code was at that address on the customer's machine. So it looks into your copy of libc.so.6, discovers that this address falls in select, and prints that.

But the chances that 0x00454ff1 is also in select in your customer's copy of libc.so.6 are quite small. Most likely the customer had some other procedure at that address, perhaps abort.

You can use disas select and check whether 0x00454ff1 falls in the middle of an instruction, or whether the instruction immediately before it is not a CALL (a genuine return address always follows a CALL). If either of these holds, your stack trace is meaningless.
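The two-part check can be scripted. Below is a minimal sketch of a hypothetical helper, check_return_addr (not part of gdb), that reads a gdb disassemble listing on stdin and reports whether a given return address is plausible. It assumes gdb's x86 listing format, where each instruction line has the form `0xADDRESS <+offset>:` (older versions print `<select+offset>:`) followed by a tab and the instruction. Passing the check does not prove a frame is right; failing it shows the frame is wrong.

```shell
#!/bin/sh
# Hypothetical helper (not part of gdb): check_return_addr ADDR < listing
# Reads a gdb "disassemble" listing on stdin and prints "plausible" if ADDR
# starts an instruction whose predecessor is a call, else "meaningless".
# Assumes x86 listings whose lines look like
#   0x00454ff1 <+49>:<TAB>pop    %ebx
# possibly indented or prefixed by gdb's "=>" current-instruction marker.
check_return_addr() {
  awk -v target="$1" '
    # Lowercase and drop "0x" plus leading zeros, so that
    # 0x00454ff1 and 0x454ff1 compare equal.
    function norm(a) { a = tolower(a); sub(/^0x0*/, "", a); return a }
    BEGIN { want = norm(target); found = 0; prev = "" }
    {
      line = $0
      sub(/^[ \t]*(=>[ \t]*)?/, "", line)    # strip indent and "=>" marker
      if (line !~ /^0x[0-9a-fA-F]+/) next    # skip header and footer lines
      split(line, f, /[ \t]+/)               # f[1] is the address
      insn = ""                              # text after the "<...>:" token
      if (match(line, />:[ \t]*/)) insn = substr(line, RSTART + RLENGTH)
      if (norm(f[1]) == want) { found = 1; exit }
      prev = insn
    }
    END {
      # A real return address sits right after a call (optionally with a
      # bnd/notrack prefix); anything else means the frame is bogus.
      if (found && prev ~ /^((bnd|notrack)[ \t]+)?call/) print "plausible"
      else print "meaningless"
    }'
}
```

For the trace in the question, you might run `gdb -batch -ex 'disassemble select' /path/to/binary core > select.txt` and then `check_return_addr 0x00454ff1 < select.txt`.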

You can, however, help yourself: you just need a copy of every library that (gdb) info shared lists on the customer's system. Have the customer tar them up, e.g. with:

cd /
tar cvzf to-you.tar.gz lib/libc.so.6 lib/ld-linux.so.2 ...

Then, on your system:

mkdir /tmp/from-customer
tar xzf to-you.tar.gz -C /tmp/from-customer
gdb /path/to/binary
(gdb) set solib-absolute-prefix /tmp/from-customer
(gdb) core core  # Note: very important to set solib-... before loading core
(gdb) where      # Get meaningful stack trace!
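Both halves of this procedure can be wrapped in small shell functions. This is a sketch assuming GNU tar, gdb on both machines, and library paths without spaces; the function names solib_paths, pack_customer_libs and customer_bt are hypothetical. The library list comes from gdb's info shared output for the actual core, matching the advice above, rather than from a guess about which libraries were loaded.

```shell
#!/bin/sh
# Sketch only. Assumes GNU tar, gdb on both machines, and library paths
# without spaces. All function names here are hypothetical.

# solib_paths: print the library paths from gdb "info shared" output on
# stdin. Library rows end with an absolute path; the header row and the
# "(*): ..." footer do not, so keeping last fields that start with "/"
# is enough.
solib_paths() {
  awk '$NF ~ /^\// { print $NF }'
}

# Customer side. Usage: pack_customer_libs BINARY CORE
# Writes to-you.tar.gz in the current directory.
pack_customer_libs() {
  out="$PWD/to-you.tar.gz"
  libs=$(gdb -batch -ex 'info shared' "$1" "$2" | solib_paths | sed 's|^/||')
  # Archive relative to / like the manual command; "h" stores the files
  # that symlinks such as lib/libc.so.6 point to instead of the links.
  (cd / && tar chvzf "$out" $libs)   # $libs left unquoted on purpose
}

# Developer side. Usage: customer_bt BINARY CORE TARBALL
# The prefix is set before "core-file" runs, because gdb resolves the
# shared libraries when the core is loaded.
customer_bt() {
  mkdir -p /tmp/from-customer
  tar xzf "$3" -C /tmp/from-customer
  gdb -batch \
    -ex 'set solib-absolute-prefix /tmp/from-customer' \
    -ex "core-file $2" \
    -ex 'bt' \
    "$1"
}
```

The core is deliberately passed through core-file rather than on the gdb command line, since a core given as a command-line operand would be loaded before the -ex commands run.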




We then advise the Customer to run a -g binary so it becomes easier to debug.

A better approach is:


  • build with -g -O2 -o myexe.dbg
  • cp myexe.dbg myexe
  • strip -g myexe
  • distribute myexe to customers
  • when a customer gets a core, use myexe.dbg to debug it

You'll have full symbolic info (file/line, local variables), without having to ship a special binary to the customer, and without revealing too many details about your sources.
