FLOPS: what really is a FLOP

Problem description

I came from this thread: FLOPS Intel core and testing it with C (innerproduct), http://stackoverflow.com/questions/1536867/flops-intel-core-and-testing-it-with-c-innerproduct

As I began writing simple test scripts, a few questions came to mind.

  1. Why floating point? What is so significant about floating point that we have to consider it specifically? Why not a simple int?

  2. If I want to measure FLOPS, let's say by computing the inner product of two vectors, must the two vectors be float[]? How will the measurement be different if I use int[]?

  3. I am not familiar with Intel architectures. Let's say I have the following operations:

    float a = 3.14159f; float b = 3.14158f;
    for(int i = 0; i < 100; ++i) {
        a + b;  /* result is never used, so a compiler may drop the loop */
    }
    

    How many "floating point operations" is this?

  4. I am a bit confused because I studied a simplified 32-bit MIPS architecture. There, every instruction is 32 bits wide, with fields such as 5 bits for operand 1 and 5 bits for operand 2. For Intel architectures (specifically the one from the previous thread), I was told that a register can hold 128 bits. For single-precision floating point, at 32 bits per floating-point number, does that mean each instruction fed to the processor can take 4 floating-point numbers? Don't we also have to account for the bits used by the operands and other parts of the instruction? How can we just feed 4 floating-point numbers to a CPU without any specific meaning attached to them?

I don't know whether my approach of thinking about everything in bits and pieces makes sense. If not, at what level of abstraction should I be looking at this?

Solution

  1. Floating-point and integer operations are handled by different execution units (pipelines) on the chip, so they run at different speeds. On simple or old enough architectures there may be no native floating-point support at all. Floating-point operations then have to be emulated in software, which makes them very slow. So if you are trying to estimate real-world performance for problems that use floating-point math, you need to know how fast these operations are.

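  2. Yes, you must use floating-point data. See #1: with int[] you would be timing the integer units instead, which tells you nothing about floating-point throughput.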

  3. In benchmarks, a FLOP is typically defined as an average over a particular mix of operations that is intended to be representative of the real-world problem you want to model. For your loop, you would simply count each addition as 1 operation, for a total of 100 operations. BUT: this is not representative of most real-world jobs. You may also have to take steps to prevent the compiler from optimizing all the work away. Here the result of a + b is never used, so the whole loop can legally be removed.

  4. Vectorization, or SIMD (Single Instruction, Multiple Data), can do exactly that. The instruction does not carry the four numbers in its own bits. Instead, it names a 128-bit register (or memory location) holding them, and the opcode (for example SSE's ADDPS) tells the CPU to treat those 128 bits as four packed 32-bit floats. Examples of SIMD systems in use right now include AltiVec (on PowerPC series chips) and MMX/SSE/... on Intel x86 and compatibles. Such improvements in chips should get credit for doing more work, so your trivial loop above would still be counted as 100 operations even if there are only 25 fetch-and-work cycles. To make use of SIMD units, compilers either need to be very smart or must receive hints from the programmer (but most front-line compilers are very smart these days).
