What advice can you give me for writing a meaningful benchmark?

Question

I have developed a framework that is used by several teams in our organisation. The "modules" developed on top of this framework can behave quite differently, but they are all fairly resource-intensive, some more so than others. They all receive data as input, analyse and/or transform it, and send it on.

We are planning to buy new hardware, and my boss has asked me to define and implement a benchmark based on the modules so that we can compare the offers we have received.

My idea is simply to run each module in sequence, with a well-chosen set of data as input.

Do you have any advice? Any remarks on this simple procedure?

Recommended answer

Your question is pretty broad, so unfortunately my answer will not be very specific either.

First, benchmarking is hard. Do not underestimate the effort necessary to produce meaningful, repeatable, high-confidence results.
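
One small step towards repeatable results is to run the same workload several times and look at the spread between runs rather than trusting a single number. The following is only a minimal sketch: `run_repeated`, `summarize_runs` and the `warmup`/`repeats` defaults are hypothetical names and values chosen for illustration, not part of the framework described in the question.

```python
import statistics
import time


def run_repeated(workload, warmup=2, repeats=5):
    """Call `workload()` repeatedly and return the timed durations in seconds.

    `workload` is a hypothetical zero-argument callable that stands in for
    one run of a module on a fixed input set.
    """
    for _ in range(warmup):
        workload()  # warm-up runs are executed but not recorded
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize_runs(durations):
    """Return the median and the relative spread (stdev / mean) of the runs."""
    mean = statistics.mean(durations)
    if len(durations) > 1 and mean > 0:
        spread = statistics.stdev(durations) / mean
    else:
        spread = 0.0
    return {"median": statistics.median(durations), "relative_spread": spread}
```

A large relative spread suggests that something other than the hardware under test (caches, background activity, input variation) is influencing the result.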

Second, what is your performance goal? Is it throughput (transactions or operations per second)? Is it latency (the time it takes to execute a transaction)? Do you care about average performance? Do you care about worst-case performance? If so, do you care about the absolute worst case, or is it enough that 90%, 95% or some other percentile gets adequate performance?
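
To make the difference between these goals concrete, here is a minimal sketch that summarizes one set of latency samples as a mean, two percentiles and the worst case. The function names are hypothetical, and the nearest-rank percentile definition is an assumption; other definitions interpolate between samples.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latencies(samples):
    """Report average, percentile and worst-case views of the same data."""
    return {
        "mean": statistics.mean(samples),
        "p90": percentile(samples, 90),
        "p95": percentile(samples, 95),
        "max": max(samples),
    }
```

With 99 samples of 10 ms and a single sample of 1000 ms, the 95th percentile is 10 ms, the mean is about 19.9 ms and the worst case is 1000 ms, so the three goals can lead to very different conclusions about the same hardware.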

Once you know your goal, design the benchmark to measure against it. If you are interested in throughput, you probably want to send messages / transactions / input into your system at a prescribed rate and see whether the system keeps up.
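
A throughput test of this kind can be sketched as a paced sender that records how far it falls behind its schedule. This is an illustration under assumptions: `process` is a hypothetical callable standing in for a module's entry point, and the whole thing runs single-threaded, which a real module may not.

```python
import time


def run_at_rate(process, inputs, rate_per_s):
    """Feed `inputs` to `process` at a fixed target rate.

    Returns the number of items sent, the rate actually achieved, and the
    largest amount of time (seconds) by which an item was sent late.
    """
    interval = 1.0 / rate_per_s
    start = time.perf_counter()
    max_lag = 0.0
    count = 0
    for i, item in enumerate(inputs):
        due = start + i * interval  # when this item should be sent
        now = time.perf_counter()
        if now < due:
            time.sleep(due - now)  # ahead of schedule: wait
        else:
            max_lag = max(max_lag, now - due)  # behind schedule: record lag
        process(item)
        count += 1
    elapsed = time.perf_counter() - start
    achieved = count / elapsed if elapsed > 0 else 0.0
    return {"sent": count, "achieved_rate": achieved, "max_lag_s": max_lag}
```

If the achieved rate stays well below the prescribed rate, or the lag keeps growing, the system is not keeping up at that load.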

If you are interested in latency, you would send messages / transactions / input and measure how long it takes to process each one.
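
In its simplest form, measuring per-item latency means wrapping each call in a timer. The sketch below assumes a synchronous, hypothetical `process` callable; a module that hands work to other threads or processes would need timestamps taken at its real input and output points instead.

```python
import time


def measure_latencies(process, inputs):
    """Return one latency (in seconds) per input item, in input order."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        process(item)
        latencies.append(time.perf_counter() - start)
    return latencies
```

Keeping every individual sample, rather than only a running average, is what makes it possible to look at percentiles and worst cases afterwards.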

If you are interested in worst-case performance, you would add load to the system up to whatever level you consider "realistic" (or whatever the system design says it should support).
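
One way to sketch such a load ramp is to step through increasing input rates and record the worst latency seen at each step. The names below are hypothetical, and the sketch assumes that latency is measured from the moment an item was scheduled to be sent, so that time spent waiting behind a backlog counts against the system.

```python
import time


def worst_latency_at_rate(process, inputs, rate_per_s):
    """Send `inputs` at `rate_per_s` and return the worst latency in seconds,
    measured from each item's scheduled send time to its completion."""
    interval = 1.0 / rate_per_s
    start = time.perf_counter()
    worst = 0.0
    for i, item in enumerate(inputs):
        due = start + i * interval
        delay = due - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        process(item)
        worst = max(worst, time.perf_counter() - due)
    return worst


def ramp_load(process, inputs, rates_per_s):
    """Return (rate, worst latency) pairs, one per load level.

    `inputs` must be re-iterable (for example a list), because it is
    replayed once for every rate.
    """
    return [(rate, worst_latency_at_rate(process, inputs, rate))
            for rate in rates_per_s]
```

The last entry in `rates_per_s` would be the load you consider realistic or the load the system design says it should support.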

Also, you do not say whether these modules are CPU bound or I/O bound, whether they can take advantage of multiple CPUs/cores, and so on. Since you are evaluating different hardware solutions, you may find that your application benefits more from a great I/O subsystem than from a huge number of CPUs.
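
A rough first indication of whether a workload is CPU bound is to compare the CPU time it consumes with the wall-clock time it takes. The sketch below is a heuristic only, and `cpu_fraction` is a hypothetical helper: `time.process_time()` covers all threads of the current process but not child processes, so modules that spawn subprocesses would need a different measurement.

```python
import time


def cpu_fraction(workload):
    """Return CPU time divided by wall-clock time for one call of `workload()`.

    A value near 1.0 suggests a CPU-bound, single-threaded workload; a value
    well below 1.0 suggests time spent waiting (I/O, sleeping, locks); a
    value above 1.0 suggests several threads were using CPUs in parallel.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0
```

Running this for each module on the current hardware can help decide whether the offers should be compared mainly on CPU or on I/O capability.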

Third, the best benchmark (and the hardest) is one that puts realistic load into the system. That means recording data from a production environment and running the new hardware solution against it. Getting this done is harder than it sounds. It often means adding all kinds of measurement points to the system to see how it behaves (if you do not have them already), modifying the existing system to add record/playback capabilities, modifying the playback to run at different rates, and getting a realistic (i.e., similar to production) environment for testing.
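
The playback side of this can be sketched as follows, assuming the recording is a time-ordered list of `(timestamp, payload)` pairs with timestamps in seconds. The recording format, the `replay` function and its `speed` parameter are all assumptions made for illustration; the `sleep` and `clock` parameters exist only so the pacing can be tested without real waiting.

```python
import time


def replay(events, handle, speed=1.0, sleep=time.sleep, clock=time.perf_counter):
    """Replay recorded events through `handle`, preserving the recorded gaps.

    `speed` scales the playback rate: 2.0 replays twice as fast as recorded,
    0.5 replays at half speed. Returns the number of events replayed.
    """
    if not events:
        return 0
    first_ts = events[0][0]
    start = clock()
    for ts, payload in events:
        # Time at which this event is due, relative to the start of playback.
        due = start + (ts - first_ts) / speed
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        handle(payload)
    return len(events)
```

For example, `replay(recorded, module_input, speed=2.0)` would feed the same production data at twice the recorded rate, where `recorded` and `module_input` are placeholders for your own recording and module entry point.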
