Do I get a performance penalty when mixing SSE integer/float SIMD instructions?

Question

I've used x86 SIMD instructions (SSE1 through SSE4) in the form of intrinsics quite a lot lately. What I find frustrating is that the SSE ISA has several simple instructions that are available only for floats or only for integers, but in theory should perform equally for both. For example, both float and double vectors have instructions to load the upper 64 bits of a 128-bit vector from an address (movhps, movhpd), but there is no such instruction for integer vectors.

My question:

Are there any reasons to expect a performance hit when using floating-point instructions on integer vectors, e.g. using movhps to load data into an integer vector?
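To make the question concrete, here is a minimal sketch of the pattern being asked about, assuming SSE2 is available. The helper names `load_high64_epi` and `load_high64_epi_int` are hypothetical and only for illustration. The first goes through the float domain (`_mm_loadh_pi`, which typically compiles to movhps); the second stays in the integer domain with a 64-bit load plus an unpack. Compilers are free to pick a different instruction than the intrinsic suggests, so the generated assembly should be checked before drawing conclusions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE1 header */
#include <stdint.h>

/* Hypothetical helper: replace the upper 64 bits of an integer vector with
   the 8 bytes at src, leaving the lower 64 bits untouched.
   Uses the float-typed load (typically movhps). */
static inline __m128i load_high64_epi(__m128i v, const void *src)
{
    __m128 f = _mm_castsi128_ps(v);           /* reinterpret only, no instruction */
    f = _mm_loadh_pi(f, (const __m64 *)src);  /* load 64 bits into the upper half */
    return _mm_castps_si128(f);               /* reinterpret back to integer type */
}

/* Hypothetical integer-domain alternative: a 64-bit load into the low half
   of a temporary, then an unpack that combines the two low halves. */
static inline __m128i load_high64_epi_int(__m128i v, const void *src)
{
    __m128i hi = _mm_loadl_epi64((const __m128i *)src); /* low = *src, high = 0 */
    return _mm_unpacklo_epi64(v, hi);                    /* [v.low, hi.low] */
}
```

Both helpers produce the same bits; any difference would show up only in instruction count and in which execution domain the data passes through.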

I wrote several tests to check this, but I doubt their results are reliable. It's really hard to write a correct test that explores all the corner cases for such things, especially since instruction scheduling is most probably involved here.

Related question:

Other trivially similar operations also have several instructions that do basically the same thing. For example, I can do a bitwise OR with por, orps or orpd. Can anyone explain the purpose of these additional instructions? I guess this might be related to different scheduling algorithms applied to each instruction.
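As an illustration of the three OR variants, the sketch below applies each one to the same integer inputs and compares the results bit for bit. It assumes SSE2; the wrapper names `or_int`, `or_ps`, `or_pd` and `same_bits` are hypothetical. The instruction named in each comment is what the intrinsic typically maps to, not a guarantee about what a given compiler emits.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Integer-domain OR (typically por). */
static inline __m128i or_int(__m128i a, __m128i b)
{
    return _mm_or_si128(a, b);
}

/* Single-precision-domain OR (typically orps); the casts are reinterpretations. */
static inline __m128i or_ps(__m128i a, __m128i b)
{
    return _mm_castps_si128(_mm_or_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b)));
}

/* Double-precision-domain OR (typically orpd). */
static inline __m128i or_pd(__m128i a, __m128i b)
{
    return _mm_castpd_si128(_mm_or_pd(_mm_castsi128_pd(a), _mm_castsi128_pd(b)));
}

/* Returns 1 if all 128 bits of x and y are equal, 0 otherwise. */
static inline int same_bits(__m128i x, __m128i y)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, y)) == 0xFFFF;
}
```

For any inputs `a` and `b`, `same_bits(or_int(a, b), or_ps(a, b))` and `same_bits(or_int(a, b), or_pd(a, b))` both return 1, since all three operations are purely bitwise.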

Recommended answer

From an expert (obviously not me :P): http://www.agner.org/optimize/optimizing_assembly.pdf [13.2 Using vector instructions with other types of data than they are intended for (pages 118-119)]:

There is a penalty for using the wrong type of instructions on some processors. This is because the processor may have different data buses or different execution units for integer and floating point data. Moving data between the integer and floating point units can take one or more clock cycles depending on the processor, as listed in table 13.2.

Processor                       Bypass delay, clock cycles 
  Intel Core 2 and earlier        1 
  Intel Nehalem                   2 
  Intel Sandy Bridge and later    0-1 
  Intel Atom                      0 
  AMD                             2 
  VIA Nano                        2-3 
Table 13.2. Data bypass delays between integer and floating point execution units 
