How to detect and debug multi-threading problems?


Question

This is a follow-up to this question, where I didn't get any input on this point. Here is the brief question:

Is it possible to detect and debug problems that come from multi-threaded code?

Often we have to tell our customers: "We can't reproduce the problem here, so we can't fix it. Please tell us the steps to reproduce the problem, then we'll fix it." That is a rather unpleasant answer to give if I know that it is a multi-threading problem, but mostly I don't. How can I tell that a problem is a multi-threading issue, and how do I debug it?

I'd like to know if there are any special logging frameworks, debugging techniques, code inspectors, or anything else that helps solve such issues. General approaches are welcome. If an answer is language-specific, please keep it to .NET and Java.

Recommended answer

Threading/concurrency problems are notoriously difficult to replicate - which is one of the reasons why you should design to avoid them, or at least to minimize their likelihood. This is the reason immutable objects are so valuable. Try to isolate mutable objects to a single thread, and then carefully control the exchange of mutable objects between threads. Attempt to program with a design of object hand-over, rather than "shared" objects. Where objects must be shared, use fully synchronized control objects (which are easier to reason about), and avoid having a synchronized object use other objects which must also be synchronized - that is, try to keep them self-contained. Your best defense is a good design.
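As a rough illustration of the hand-over style described above, here is a minimal sketch in which a producer thread passes immutable messages to a consumer through a BlockingQueue, so no mutable object is ever touched by two threads. The class HandOverDemo, the Order record, and the POISON sentinel are hypothetical names invented for this example, and the record syntax assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical example; none of these names come from the original answer.
final class HandOverDemo {

    // Immutable message: a record has only final fields and no setters.
    record Order(int id, String item) {}

    // Sentinel instance that tells the consumer to stop (compared by reference).
    private static final Order POISON = new Order(-1, "");

    // Hands `count` orders from the calling thread to a consumer thread and
    // returns the ids in the order the consumer received them.
    static List<Integer> run(int count) {
        BlockingQueue<Order> queue = new ArrayBlockingQueue<>(16);
        List<Integer> received = new ArrayList<>(); // mutated only by the consumer thread

        Thread consumer = new Thread(() -> {
            try {
                for (Order o = queue.take(); o != POISON; o = queue.take()) {
                    received.add(o.id());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "consumer");
        consumer.start();

        try {
            for (int i = 0; i < count; i++) {
                queue.put(new Order(i, "item-" + i)); // hand over; keep no reference
            }
            queue.put(POISON);
            consumer.join(); // after join(), the consumer's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during hand-over", e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run(5));
    }
}
```

The only object both threads touch is the queue, which is a fully synchronized control object from the standard library; everything that travels through it is immutable.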

Deadlocks are the easiest to debug, if you can get a stack trace of the threads while the system is deadlocked. Most tools that produce such a trace also do deadlock detection, so it's easy to pinpoint the cause and then reason about the code to work out why it happens and how to fix it. With deadlocks, the problem is always going to be acquiring the same locks in different orders.
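To make the lock-ordering point concrete, the following sketch deliberately provokes a deadlock between two daemon threads that take the same two locks in opposite orders, then asks the JVM's ThreadMXBean to find it. DeadlockDemo and its method names are hypothetical; ThreadMXBean.findDeadlockedThreads and getThreadInfo are standard java.lang.management calls, though whether a given JVM supports them is an assumption you should check in your environment.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical example; the class and method names are illustrative only.
final class DeadlockDemo {

    // Starts a daemon thread that takes `first`, waits until both workers hold
    // their first lock, and then blocks forever trying to take `second`.
    private static void lockInOrder(String name, Object first, Object second,
                                    CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the other worker holds `second`
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not keep the JVM alive
        t.start();
    }

    // Provokes a two-thread deadlock and returns how many deadlocked threads
    // the JVM reports (0 if none were found within about five seconds).
    static int provokeAndDetect() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch latch = new CountDownLatch(2);
        lockInOrder("worker-AB", lockA, lockB, latch); // order A then B
        lockInOrder("worker-BA", lockB, lockA, latch); // order B then A

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
                    System.out.print(info); // includes held and awaited locks
                }
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + provokeAndDetect());
    }
}
```

The usual fix is to make every code path acquire the locks in one agreed order, for example always lockA before lockB.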

Livelocks are harder - being able to observe the system while it is in the error state is your best bet there.

Race conditions tend to be extremely difficult to replicate, and are even harder to identify from manual code review. With these, the path I usually take, besides extensive testing to replicate, is to reason about the possibilities, and try to log information to prove or disprove theories. If you have direct evidence of state corruption you may be able to reason about the possible causes based on the corruption.
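One way to approach the "extensive testing to replicate" mentioned above is a small stress harness that releases many threads at the same instant and then checks an invariant. The sketch below is an assumption-laden illustration, not a general tool: RaceStressDemo, hammer, and the two lostUpdates methods are hypothetical names, and the number of lost updates in the unsynchronized case varies from run to run and may even be zero.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example; the class and method names are illustrative only.
final class RaceStressDemo {

    static int plainCounter; // deliberately unsynchronized shared state

    // Runs `task` `perThread` times on each of `threads` threads, releasing
    // all of them at once to maximize contention.
    static void hammer(int threads, int perThread, Runnable task) {
        CountDownLatch ready = new CountDownLatch(threads);
        CountDownLatch go = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                ready.countDown();
                try {
                    go.await(); // start gate
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int n = 0; n < perThread; n++) {
                    task.run();
                }
            }, "stress-" + i);
            workers[i].start();
        }
        try {
            ready.await();
            go.countDown();
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during stress run", e);
        }
    }

    // Lost updates with a plain int: nondeterministic, often greater than zero.
    static int lostUpdatesPlain(int threads, int perThread) {
        plainCounter = 0;
        hammer(threads, perThread, () -> plainCounter++); // read-modify-write race
        return threads * perThread - plainCounter;
    }

    // Lost updates with AtomicInteger: the increment is atomic, so none are lost.
    static int lostUpdatesAtomic(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        hammer(threads, perThread, counter::incrementAndGet);
        return threads * perThread - counter.get();
    }

    public static void main(String[] args) {
        System.out.println("plain  lost: " + lostUpdatesPlain(8, 100_000));
        System.out.println("atomic lost: " + lostUpdatesAtomic(8, 100_000));
    }
}
```

A nonzero result from the plain counter is direct evidence of state corruption; a zero result proves nothing, which is why a theory about a race usually needs many runs and supporting log output.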

The more complex the system, the harder it is to find concurrency errors and to reason about its behavior. Make use of tools like JVisualVM and remote-connect profilers - they can be a lifesaver if you can connect to a system in an error state and inspect the threads and objects.
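When attaching a tool to the failing system is not possible, a process can also capture a snapshot of its own threads, for example to write it to a log when a watchdog fires. The following is a minimal sketch using the standard Thread.getAllStackTraces call; ThreadDumpDemo and dump are hypothetical names, and the output format is invented for this example rather than matching any tool's format.

```java
import java.util.Map;

// Hypothetical example; the class name and output format are illustrative only.
final class ThreadDumpDemo {

    // Returns a textual snapshot of every live thread: name, state, daemon flag,
    // and current stack frames.
    static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append('"')
               .append(" state=").append(t.getState())
               .append(" daemon=").append(t.isDaemon())
               .append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Many threads sitting in the BLOCKED state on the same frame is a typical hint that contention or a deadlock is worth investigating.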

Also, beware of differences in behavior that depend on the number of CPU cores, pipelines, bus bandwidth, etc. Changes in hardware can affect your ability to replicate the problem. Some problems will only show up on single-core CPUs, others only on multi-core machines.

One last thing: try to use the concurrency objects distributed with the system libraries - for example, in Java, java.util.concurrent is your friend. Writing your own concurrency control objects is hard and fraught with danger; leave it to the experts, if you have a choice.
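As a small illustration of leaning on java.util.concurrent rather than hand-written locking, the sketch below counts words from several threads using ConcurrentHashMap.merge and an ExecutorService. ConcurrentCountDemo and count are hypothetical names, and the 30-second timeout is an arbitrary value chosen for the example.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example; the class and method names are illustrative only.
final class ConcurrentCountDemo {

    // Counts occurrences of each word using a pool of `threads` workers.
    static ConcurrentHashMap<String, Integer> count(List<String> words, int threads) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String w : words) {
            // merge() performs the read-modify-write atomically per key,
            // so no explicit synchronized block is needed.
            pool.execute(() -> counts.merge(w, 1, Integer::sum));
        }
        pool.shutdown(); // accept no new tasks; queued tasks still run
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("a", "b", "a", "c", "a"), 4));
    }
}
```

The equivalent code with a plain HashMap and a get-then-put sequence would need its own locking and would be easy to get subtly wrong.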
