How do you debug a bug that only appears when the load is huge?


Problem description


We are currently developing cluster manager software in C. When only a few nodes connect to the manager, it works perfectly, but when we use tools to simulate 1000 nodes connecting to the manager, it sometimes behaves in unexpected ways.

How can one debug this kind of bug? It only appears when the load (connections/nodes) is large.

If I use gdb to step through the code, the app never malfunctions.

Solution

"How can one debug this kind of bug?"

In general, you want to use at least these techniques:

  1. Make sure the code compiles and links without warnings. -Wall is a good start, but -Wall -Wextra is better.
  2. Make sure the application has designed-in logging and tracing that can be turned on or off, has low overhead, and provides enough detail to debug these kinds of issues.
  3. Make sure the code has good unit-test coverage.
  4. Make sure the tests are sanitizer-clean.
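
As an illustration of item 2, here is a minimal sketch of logging that can be switched on and off at run time and costs only one relaxed atomic load when disabled. All names (`LOG`, `log_set_level`, `log_write`, the level constants) are hypothetical and not taken from the question; it assumes a C11 compiler with `<stdatomic.h>` and `timespec_get`.

```c
#include <stdarg.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical verbosity levels, for illustration only. */
enum log_level { LOG_ERROR = 0, LOG_INFO = 1, LOG_TRACE = 2 };

/* Current verbosity; atomic so any thread may change it while others log. */
static atomic_int g_log_level = LOG_ERROR;

/* Count of emitted lines, useful for checking the filter in tests. */
static atomic_int g_log_lines = 0;

static void log_set_level(int level)
{
    atomic_store(&g_log_level, level);
}

/* Slow path: only reached when the message passes the level filter. */
static void log_write(int level, const char *file, int line,
                      const char *fmt, ...)
{
    char msg[512];
    struct timespec ts;
    va_list ap;

    timespec_get(&ts, TIME_UTC);
    va_start(ap, fmt);
    vsnprintf(msg, sizeof msg, fmt, ap);
    va_end(ap);

    /* One fprintf call per message keeps lines from interleaving. */
    fprintf(stderr, "[%lld.%06ld] <%d> %s:%d: %s\n",
            (long long)ts.tv_sec, ts.tv_nsec / 1000, level, file, line, msg);
    atomic_fetch_add(&g_log_lines, 1);
}

/* Fast path: a disabled message costs one relaxed load and a compare;
 * its arguments are not evaluated. */
#define LOG(level, ...)                                                   \
    do {                                                                  \
        if ((level) <= atomic_load_explicit(&g_log_level,                 \
                                            memory_order_relaxed))        \
            log_write((level), __FILE__, __LINE__, __VA_ARGS__);          \
    } while (0)
```

A call site would look like `LOG(LOG_TRACE, "node %d connected", id);`. Because the timestamp and source location are recorded on every line, traces from a 1000-node run can be compared after the fact without stopping the process in a debugger, which matters when stepping in gdb changes the timing enough to hide the bug.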

"There are also no warnings in the Valgrind check."

It's not clear whether you've simply run the target application under Valgrind, or whether you also have unit tests and those tests are Valgrind-clean. It's also not clear whether you've observed the application misbehaving under Valgrind.

Valgrind used to be the best tool available for heap and uninitialized memory problems, but in 2017 this is no longer the case.

Compiler-based Address, Thread and Memory sanitizers catch a significantly wider class of errors (e.g. global and stack overflows, and data races), and you should run your unit tests under all of them.
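
To make this concrete, below is a small sketch of the kind of unit test that is worth running under the sanitizers. The `node_registry` structure and its functions are hypothetical stand-ins for shared manager state, not code from the question. Several threads register nodes concurrently; with the mutex in place the test should be clean, and if the lock were removed from `registry_add_node`, ThreadSanitizer would be expected to report the data race even on runs where the final count happens to be correct.

```c
/* Possible build lines (the file name is an assumption):
 *   cc    -g -O1 -Wall -Wextra -fsanitize=thread  -pthread registry_test.c
 *   cc    -g -O1 -Wall -Wextra -fsanitize=address -pthread registry_test.c
 *   clang -g -O1 -Wall -Wextra -fsanitize=memory  -pthread registry_test.c
 */
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS      8
#define JOINS_PER_WORKER 10000

/* Hypothetical shared state: how many nodes are currently connected. */
struct node_registry {
    pthread_mutex_t lock;
    long connected;
};

static void registry_add_node(struct node_registry *r)
{
    pthread_mutex_lock(&r->lock);
    r->connected++;              /* protected by r->lock */
    pthread_mutex_unlock(&r->lock);
}

/* Each worker simulates a batch of nodes connecting. */
static void *worker(void *arg)
{
    struct node_registry *r = arg;
    for (int i = 0; i < JOINS_PER_WORKER; i++)
        registry_add_node(r);
    return NULL;
}

/* Returns the final node count, or -1 if setup failed. */
static long run_registry_stress_test(void)
{
    struct node_registry r;
    pthread_t threads[NUM_WORKERS];
    int started = 0;
    long result;

    r.connected = 0;
    if (pthread_mutex_init(&r.lock, NULL) != 0)
        return -1;

    for (; started < NUM_WORKERS; started++)
        if (pthread_create(&threads[started], NULL, worker, &r) != 0)
            break;

    /* Join every thread that actually started before reading the count. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    result = (started == NUM_WORKERS) ? r.connected : -1;
    pthread_mutex_destroy(&r.lock);
    return result;
}
```

A test driver would assert that `run_registry_stress_test()` returns `NUM_WORKERS * JOINS_PER_WORKER`. Address and Thread sanitizers cannot be combined in one binary, so the same test is typically built and run once per sanitizer.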

When all of the above still fails to find the problem, you may be able to run the real application instrumented with sanitizers.

Lastly, there are tools like GDB tracing and SystemTap. They are harder to learn, but give you significant power. Overview here.
