Unit testing large data sets?

Question

What's the best way to unit test large data sets? Some legacy code that I'm maintaining has structures of a hundred members or more; other parts of the code that we're working on create or analyze data sets of hundreds of samples.

The best approach I've found so far is to deserialize the structures or data sets from disk, perform the operations under test, serialize the results to disk, then diff the files containing the serialized results against files containing the expected results. This isn't terribly fast, and it violates the "don't touch the disk" principle of unit testing. However, the only alternative I can think of (writing code to initialize and test hundreds of members and data points) seems unbearably tedious.
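For concreteness, here is a minimal sketch of that file-comparison approach in Python. It is an illustration only: `normalize`, the JSON format, and the file names are assumptions, not the asker's actual code, and it compares the parsed results in memory instead of writing them back to disk and diffing.

```python
import json
import tempfile
import unittest
from pathlib import Path


def normalize(samples):
    """Hypothetical operation under test: scale samples so the peak is 1.0."""
    peak = max(samples)
    return [s / peak for s in samples]


class GoldenFileTest(unittest.TestCase):
    def check_case(self, input_path, expected_path):
        # Deserialize the input and the expected result, then compare in memory.
        samples = json.loads(Path(input_path).read_text())
        expected = json.loads(Path(expected_path).read_text())
        self.assertEqual(normalize(samples), expected)

    def test_normalize_matches_expected_file(self):
        # Real fixtures would live beside the tests; temporary files
        # keep this sketch self-contained.
        with tempfile.TemporaryDirectory() as tmp:
            input_path = Path(tmp) / "input.json"
            expected_path = Path(tmp) / "expected.json"
            input_path.write_text(json.dumps([1.0, 2.0, 4.0]))
            expected_path.write_text(json.dumps([0.25, 0.5, 1.0]))
            self.check_case(input_path, expected_path)
```

Comparing parsed structures rather than file text avoids spurious failures from formatting differences, though the test still reads from disk.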

Are there any better solutions?

Recommended answer

If what you are trying to achieve is, in fact, a unit test, you should mock out the underlying data structures and simulate the data. This technique gives you complete control over the inputs. For example, each test you write may handle a single data point, and you'll have a very concise set of tests for each condition. There are several open source mocking frameworks out there; I personally recommend Rhino Mocks (http://ayende.com/projects/rhino-mocks/downloads.aspx) or NMock (http://www.nmock.org).
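As a rough illustration of the idea, the sketch below uses Python's standard `unittest.mock` rather than Rhino Mocks or NMock. The function `average_reading` and the `read_samples` method are hypothetical names invented for this example; each test feeds the code a single, fully controlled input.

```python
import unittest
from unittest.mock import Mock


def average_reading(source):
    """Hypothetical code under test: averages the samples a source returns."""
    samples = source.read_samples()
    if not samples:
        raise ValueError("no samples")
    return sum(samples) / len(samples)


class AverageReadingTest(unittest.TestCase):
    def test_single_sample_is_its_own_average(self):
        source = Mock()  # stands in for the real data structure
        source.read_samples.return_value = [4.0]
        self.assertEqual(average_reading(source), 4.0)
        source.read_samples.assert_called_once_with()

    def test_empty_source_is_rejected(self):
        source = Mock()
        source.read_samples.return_value = []
        with self.assertRaises(ValueError):
            average_reading(source)
```

Saved as a module, these tests can be run with `python -m unittest`. Neither one touches the disk, and each covers exactly one condition.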

If it is not possible for you to mock out the data structures, I recommend refactoring so that you are able to :-) It's worth it! Or you may also want to try TypeMock (http://www.typemock.com/), which allows mocking of concrete classes.
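One common form such a refactoring can take is passing the data loader in as a parameter, so a test can substitute its own. The sketch below assumes a hypothetical `count_outliers` function that originally read its samples straight from a file; all names here are illustrative.

```python
import json
from pathlib import Path


def load_samples_from_disk(path):
    """Default loader: reads a JSON list of numbers from disk."""
    return json.loads(Path(path).read_text())


def count_outliers(path, threshold, load=load_samples_from_disk):
    """Hypothetical code under test, refactored to accept a loader.

    Production callers use the default; tests pass a stand-in loader
    so no file is ever opened.
    """
    samples = load(path)
    return sum(1 for s in samples if abs(s) > threshold)


def fake_loader(_path):
    """Test stand-in that returns a handful of hand-picked samples."""
    return [1, -20, 30]
```

A test can then call `count_outliers("unused", 10, load=fake_loader)` and assert on the result without any file existing.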

If, however, you're running tests against large data sets, you're really running functional tests, not unit tests. In that case, loading data into a database or from disk is a typical operation. Rather than avoiding it, you should work on getting those tests running in parallel with the rest of your automated build process, so the performance impact isn't holding any of your developers up.
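One way to keep slow data-set tests out of the developers' default run, so that a separate build job can execute them alongside everything else, is to gate them behind a switch. The sketch below uses an environment variable; the name `RUN_FUNCTIONAL_TESTS` and the test classes are assumptions made for illustration.

```python
import os
import unittest

# Hypothetical switch: the build server sets this, developers usually do not.
RUN_FUNCTIONAL = os.environ.get("RUN_FUNCTIONAL_TESTS") == "1"


class FastUnitTest(unittest.TestCase):
    def test_single_point(self):
        self.assertEqual(abs(-3), 3)


@unittest.skipUnless(RUN_FUNCTIONAL, "set RUN_FUNCTIONAL_TESTS=1 to run")
class LargeDataSetTest(unittest.TestCase):
    def test_full_data_set(self):
        # Stand-in for loading and analyzing a large data set.
        samples = list(range(100_000))
        self.assertEqual(sum(samples), 4_999_950_000)
```

With this split, the fast tests always run, and the large-data tests are reported as skipped unless the switch is set.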
