Unit testing real-time / concurrent software

Question

Possible duplicate:

How do I unit test threaded code?

Classical unit testing is basically just putting x in, expecting y out, and automating that process. So it's good for testing anything that doesn't involve time. But most of the nontrivial bugs I've come across have had something to do with timing. Threads corrupt each other's data, or cause deadlocks. Nondeterministic behavior shows up in one run out of a million. Hard stuff.

Is there anything useful out there for "unit testing" parts of multithreaded, concurrent systems? How do such tests work? Isn't it necessary to run the subject of such a test for a long time, and vary the environment in some clever manner, to become reasonably confident that it works correctly?

Recommended answer

Most of the work I do these days involves multi-threaded and/or distributed systems. The majority of bugs involve "happens-before" type errors, where the developer assumes (wrongly) that event A will always happen before event B. But every 1,000,000th time the program is run, event B happens first, and this causes unpredictable behavior.
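A minimal Python sketch of this kind of ordering assumption, and one way to pin it down in a test. The `Config` class, its `load`/`read` methods, and `read_before_load` are hypothetical names made up for illustration. The idea is to make the required order explicit with a `threading.Event`, and then have the test deliberately start event B before event A rather than waiting for the one-in-a-million schedule to occur on its own.

```python
import threading


class Config:
    """Hypothetical shared state: event A is load(), event B is read()."""

    def __init__(self):
        self._value = None
        self._loaded = threading.Event()

    def load(self, value):
        # Event A: publish the value, then signal that it is available.
        self._value = value
        self._loaded.set()

    def read(self, timeout=1.0):
        # Event B: wait for A explicitly instead of assuming it already ran.
        if not self._loaded.wait(timeout):
            raise TimeoutError("load() did not happen in time")
        return self._value


def read_before_load():
    """Force the 'unexpected' order: the reader starts before load() runs."""
    cfg = Config()
    results = []
    reader_started = threading.Event()

    def reader():
        reader_started.set()
        results.append(cfg.read())

    t = threading.Thread(target=reader)
    t.start()
    reader_started.wait()  # the reader thread is running before A happens
    cfg.load(42)
    t.join()
    return results
```

Here `read_before_load()` returns `[42]` even though the reader was started first, because `read()` blocks until `load()` has signalled. Without the event, the same test would be the kind that passes almost every time and fails only under an unlucky schedule.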

Additionally, there aren't really any good tools to detect timing issues, or even data corruption caused by race conditions. Tools like Helgrind and drd from the Valgrind toolkit work great for trivial programs, but they are not very useful for diagnosing large, complex systems. For one thing, they report false positives quite frequently (Helgrind especially). For another, it's difficult to actually detect certain errors while running under Helgrind/drd, simply because programs running under Helgrind run almost 1000x slower, and you often need to run a program for quite a long time to even reproduce the race condition. Additionally, since running under Helgrind totally changes the timing of the program, it may become impossible to reproduce a certain timing issue. That's the problem with subtle timing issues: they're almost Heisenbergian, in the sense that altering a program to detect timing issues may obscure the original issue.

The sad fact is, the human race still isn't adequately prepared to deal with complex, concurrent software. So unfortunately, there's no easy way to unit-test it. For distributed systems especially, you should plan your program carefully using Lamport's happens-before diagrams to help you identify the necessary order of events in your program. But ultimately, you can't really get away from brute-force unit testing with randomly varying inputs. It also helps to vary the frequency of thread context-switching during your unit tests, e.g. by running another background process that just takes up CPU cycles. Also, if you have access to a cluster, you can run multiple unit tests in parallel, which can detect bugs much more quickly and save you a lot of time.
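As a rough Python sketch of what such a brute-force test might look like: the names `SafeCounter`, `stress_once`, and `stress` are hypothetical, and the counter simply stands in for whatever unit is really under test. Instead of a background process burning CPU, this sketch changes CPython's thread switch interval with `sys.setswitchinterval`, which is a different (and CPython-specific) way to perturb how often threads are switched.

```python
import random
import sys
import threading


class SafeCounter:
    """Hypothetical unit under test: a counter guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def add(self, n):
        with self._lock:
            self.value += n


def stress_once(seed, switch_interval, workers=8):
    """One randomized run; returns (actual_total, expected_total)."""
    # Seeded, so the same inputs can be replayed after a failure.
    # The thread schedule itself is still not reproducible.
    rng = random.Random(seed)
    plans = [[rng.randint(1, 9) for _ in range(rng.randint(50, 200))]
             for _ in range(workers)]
    counter = SafeCounter()

    def worker(plan):
        for n in plan:
            counter.add(n)

    old_interval = sys.getswitchinterval()
    sys.setswitchinterval(switch_interval)  # vary how often threads switch
    try:
        threads = [threading.Thread(target=worker, args=(plan,))
                   for plan in plans]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        sys.setswitchinterval(old_interval)
    return counter.value, sum(sum(plan) for plan in plans)


def stress(runs=20):
    """Repeat with varying seeds and switch intervals; return failing seeds."""
    failures = []
    for seed in range(runs):
        interval = random.Random(seed).choice([1e-6, 1e-4, 5e-3])
        actual, expected = stress_once(seed, interval)
        if actual != expected:
            failures.append(seed)
    return failures
```

An empty list from `stress()` only means no failure was observed in those runs. It raises confidence but proves nothing, which is why the number of runs (and, on a cluster, the number of machines running them in parallel) matters.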
