What is FLOP/s and is it a good measure of performance?


Question


I've been asked to measure the performance of a fortran program that solves differential equations on a multi-CPU system. My employer insists that I measure FLOP/s (floating-point operations per second) and compare the results with benchmarks (LINPACK), but I am not convinced that it's the way to go, simply because no one can explain to me what a FLOP is.

I did some research on what exactly a FLOP is, and I got some pretty contradictory answers. One of the most popular answers I got was '1 FLOP = an addition and a multiplication operation'. Is that true? If so, again, physically, what exactly does that mean?

Whatever method I end up using, it has to be scalable. Some of versions of the code solve systems with multi-million unknowns and takes days to execute.

What would be some other effective ways of measuring performance in my case (a summary of my case being 'fortran code that does a whole lot of arithmetic calculations over and over again for days on several hundred CPUs')?

Solution

It's a pretty decent measure of performance, as long as you understand exactly what it measures.

FLOPS is, as the name implies, FLoating point OPerations per Second. Exactly what constitutes a FLOP can vary by CPU (some CPUs can perform an addition and a multiplication as one operation, others can't, for example). That means that as a performance measure it is fairly close to the hardware, which means that 1) you have to know your hardware to compute the ideal FLOPS on the given architecture, and 2) you have to know your algorithm and implementation to figure out how many floating point ops it actually consists of.
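To make the arithmetic concrete (the numbers below are hypothetical, not taken from the question): the theoretical peak is usually worked out as

    peak FLOPS = cores × clock rate × FLOPs per core per cycle

so a hypothetical 8-core CPU at 2.5 GHz that can retire 4 double-precision FLOPs per core per cycle peaks at 8 × 2.5e9 × 4 = 80 GFLOPS. The per-cycle figure is the hardware-specific part; it comes from the vendor's documentation for the particular architecture.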

In any case, it's a useful tool for examining how well you utilize the CPU. If you know the CPU's theoretical peak performance in FLOPS, you can work out how efficiently you use the CPU's floating point units, which are often among the hardest parts of the CPU to utilize efficiently. A program which runs at 30% of the FLOPS the CPU is capable of has room for optimization. One which runs at 70% is probably not going to get much more efficient unless you change the basic algorithm. For math-heavy algorithms like yours, that is pretty much the standard way to measure performance. You could simply measure how long the program takes to run, but that varies wildly depending on the CPU. But if your program has 50% CPU utilization (relative to the peak FLOPS count), that is a somewhat more constant value (it'll still vary between radically different CPU architectures, but it's a lot more consistent than execution time).
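For concreteness, here is a minimal Fortran sketch of that bookkeeping (everything in it is an illustrative assumption: the kernel, the sizes, and the peak_gflops constant, which you would replace with your own CPU's figure). It times a loop whose FLOP count is known by construction, two FLOPs per iteration, and reports achieved GFLOP/s against the assumed peak:

    program flops_estimate
      implicit none
      integer, parameter :: n = 10000000, nrep = 50
      ! Hypothetical peak for illustration; substitute your CPU's real figure.
      real(8), parameter :: peak_gflops = 80.0d0
      real(8), allocatable :: x(:), y(:)
      real(8) :: a, t0, t1, gflops
      integer :: i, r

      allocate(x(n), y(n))
      x = 1.0d0
      y = 2.0d0
      a = 3.0d0

      call cpu_time(t0)
      do r = 1, nrep
        do i = 1, n
          y(i) = y(i) + a * x(i)   ! one multiply + one add = 2 FLOPs
        end do
      end do
      call cpu_time(t1)

      ! Total FLOPs are known by construction: 2 per element per repetition.
      gflops = 2.0d0 * n * nrep / (t1 - t0) / 1.0d9
      print *, 'achieved GFLOP/s:        ', gflops
      print *, 'percent of assumed peak: ', 100.0d0 * gflops / peak_gflops
      print *, 'checksum (keeps the loop from being optimized away): ', y(1)
    end program flops_estimate

Two caveats on the sketch: cpu_time reports processor time, so a parallel run should use system_clock for wall-clock timing instead; and a streaming kernel like this is memory-bandwidth bound, so even perfect code will sit well below peak. The point is the bookkeeping, which in practice hardware performance counters (e.g. via PAPI) automate for real codes.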

But knowing that "My CPU is capable of X GFLOPS, and I'm only actually achieving a throughput of, say, 20% of that" is very valuable information in high-performance software. It means that something other than the floating point ops is holding you back, and preventing the FP units from working efficiently. And since the FP units constitute the bulk of the work, that means your software has a problem.

It's easy to measure "My program runs in X minutes", and if you feel that is unacceptable then sure, you can go "I wonder if I can chop 30% off that", but you don't know if that is possible unless you work out exactly how much work is being done, and exactly what the CPU is capable of at peak. How much time do you want to spend optimizing this, if you don't even know whether the CPU is fundamentally capable of running any more instructions per second?
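To make "how much work is being done" concrete, take a standard example (not from the question): a dense n × n matrix-vector multiply performs n multiplies and n − 1 adds per row, roughly 2n² FLOPs in total, and a dense LU solve costs about (2/3)n³ FLOPs, which is essentially the count the LINPACK benchmark uses. Divide such a count by the measured run time and you have your achieved FLOP/s.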

It's very easy to prevent the CPU's FP units from being utilized efficiently, whether by having too many dependencies between FP ops, or by having too many branches or similar obstacles that prevent efficient scheduling. And if that is what is holding your implementation back, you need to know that. You need to know that "I'm not getting the FP throughput that should be possible, so clearly other parts of my code are preventing FP instructions from being available when the CPU is ready to issue one".
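As an illustration of the dependency case (hypothetical code, not from the question): a single accumulator forces every add to wait for the previous result, so the loop runs at FP-add latency rather than issue width, while independent partial sums give the scheduler work to overlap.

    program dependency_demo
      implicit none
      integer, parameter :: n = 20000000   ! assumed even for the strided loop
      real(8), allocatable :: x(:)
      real(8) :: s, s1, s2
      integer :: i

      allocate(x(n))
      x = 1.0d0

      ! One accumulator: each add depends on the previous one, so the
      ! chain serializes and throughput is bounded by FP-add latency.
      s = 0.0d0
      do i = 1, n
        s = s + x(i)
      end do
      print *, 'single accumulator: ', s

      ! Two independent partial sums: the CPU can keep two adds in
      ! flight at once, a common way to recover FP throughput.
      s1 = 0.0d0
      s2 = 0.0d0
      do i = 1, n, 2
        s1 = s1 + x(i)
        s2 = s2 + x(i + 1)
      end do
      print *, 'two accumulators:   ', s1 + s2
    end program dependency_demo

(The same trick generalizes to four or eight partial sums; compilers will sometimes do this for you, but only when allowed to reassociate floating-point operations.)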

Why do you need other ways to measure performance? What's wrong with just working out the FLOPS count as your boss asked you to? ;)
