FLOPS on an Intel Core CPU and testing it with C (inner product)


Question


  1. I am confused about how FLOPS are measured. On Intel architecture, does one FLOP mean one addition and one multiplication together? I read this somewhere online and have not found anything that contradicts it. I know that FLOP has a different meaning on different types of CPU.

How do I calculate my theoretical peak FLOPS? I am using an Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz. What exactly is the relationship between GHz and FLOPS? (Even Wikipedia's entry on FLOPS does not specify how to do this.)

I will be using the following method to measure the actual performance of my computer (in terms of FLOPS): the inner product of two vectors. For two vectors of size N, is the number of flops 2N(N - 1), if one addition or one multiplication is counted as 1 flop? If not, how should I go about calculating this?

I know there are better ways to do this, but I would like to know whether my proposed calculations are right. I read somewhere about LINPACK as a benchmark, but I would still like to know how it's done.

Recommended answer

As for your 2nd question, the theoretical FLOPS calculation isn't too hard. It can be broken down roughly as:

(Number of cores) * (Number of execution units / core) * (cycles / second) * (Execution unit operations / cycle) * (floats-per-register / Execution unit operation)

A Core 2 Duo has 2 cores and 1 execution unit per core. An SSE register is 128 bits wide. A float is 32 bits wide, so you can store 4 floats per register. I assume the execution unit does 1 SSE operation per cycle. So, with the clock rate in GHz, it should be:

2 * 1 * 2.8 * 1 * 4 = 22.4 GFLOPS

The 22.4 GFLOPS figure matches: http://www.intel.com/support/processors/sb/cs-023143.htm

This number is obviously a purely theoretical best-case figure. Real-world performance will most likely not come close to it, for a variety of reasons. It's probably not worth trying to directly correlate FLOPS with actual application runtime; you'd be better off trying out the computations used by your application.
