FLOP measurement
Problem description
I'm trying to estimate FLOPS for my application using Intel VTune Amplifier, and I'm using this post as a guideline: https://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs/
The problem is that I can't find the FP_COMP_OPS_EXE event in the VTune GUI. When I run amplxe-cl with this event config, I get the following error:
amplxe: Error: Invalid Event FP_COMP_OPS_EXE.X87 discarded.
I'm working on CentOS and my processor is an Intel Xeon.
Any help would be appreciated.
Solution
The set of available events changes between processor generations, so it is important to know your exact processor name. The events you mention exist on Intel Xeon v2 (Ivy Bridge based) processors, where you can measure the number of floating-point operations with the following formula:
FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 4 * FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + 8 * SIMD_FP_256.PACKED_SINGLE + FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE + 2 * FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * SIMD_FP_256.PACKED_DOUBLE + FP_COMP_OPS_EXE.X87
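Since the right formula depends on the exact processor, it may help to read the model name first. The following is only a rough sketch, assuming a Linux system (such as CentOS) where /proc/cpuinfo exposes a "model name" field. The helper names model_name and xeon_generation are hypothetical, and the mapping of the "v4" suffix to Broadwell is an added assumption for Xeon E5/E7 naming rather than something stated in this thread.

```python
import re

# Suffix-to-generation mapping. v2 and v3 are taken from the answer in this
# thread; v4 = Broadwell is an assumption based on Xeon E5/E7 naming.
XEON_GENERATIONS = {"v2": "Ivy Bridge", "v3": "Haswell", "v4": "Broadwell"}


def model_name(cpuinfo_text):
    """Return the first 'model name' value in /proc/cpuinfo-style text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    return None


def xeon_generation(name):
    """Guess the generation from a 'vN' suffix in the model name, or None."""
    match = re.search(r"\bv(\d)\b", name or "")
    if not match:
        return None
    return XEON_GENERATIONS.get("v" + match.group(1))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            name = model_name(f.read())
    except OSError:
        # Not on Linux, or the file is not readable.
        name = None
    print(name, "->", xeon_generation(name))
```

A model name without a "vN" suffix yields None here, in which case the generation has to be looked up from the full processor number instead.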
For Haswell-based processors (Xeon v3) there are no such events, so FLOP calculation is not possible there.
For Broadwell-based processors the formula is:
FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 4 * FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + INST_RETIRED.X87
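Both formulas are weighted sums of event counts, so they can be evaluated with a small script once the counts have been collected. The sketch below is illustrative only: the function names flop_count and flops_rate are hypothetical, the counts are assumed to be totals exported from the profiler by whatever means you prefer, and the elapsed time is assumed to be measured separately. Because the counts come from event-based sampling, the result is an estimate.

```python
# Weights per event, copied from the two formulas above.
IVY_BRIDGE_WEIGHTS = {
    "FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE": 1,
    "FP_COMP_OPS_EXE.SSE_PACKED_SINGLE": 4,
    "SIMD_FP_256.PACKED_SINGLE": 8,
    "FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE": 1,
    "FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE": 2,
    "SIMD_FP_256.PACKED_DOUBLE": 4,
    "FP_COMP_OPS_EXE.X87": 1,
}

BROADWELL_WEIGHTS = {
    "FP_ARITH_INST_RETIRED.SCALAR_SINGLE": 1,
    "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE": 8,
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 1,
    "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE": 2,
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": 4,
    "INST_RETIRED.X87": 1,
}


def flop_count(event_counts, weights):
    """Weighted sum of event counts; every event in `weights` must be present."""
    missing = [event for event in weights if event not in event_counts]
    if missing:
        raise ValueError("missing event counts: " + ", ".join(missing))
    return sum(weight * event_counts[event] for event, weight in weights.items())


def flops_rate(event_counts, weights, elapsed_seconds):
    """Floating-point operations per second over the measured interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return flop_count(event_counts, weights) / elapsed_seconds
```

Raising an error on a missing event is a deliberate choice in this sketch: silently treating an uncollected event as zero would undercount without any warning.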