New AVX-instructions syntax


Problem description


I had some C code written with Intel intrinsics. After compiling it first with AVX and then with SSSE3 flags, I got two quite different assembly outputs. For example:

AVX:

vpunpckhbw  %xmm0, %xmm1, %xmm2 

SSSE3:

movdqa %xmm0, %xmm2
punpckhbw %xmm1, %xmm2

It's clear that vpunpckhbw is just punpckhbw using the AVX three-operand syntax. But are the latency and throughput of the first instruction equivalent to the combined latency and throughput of the last two? Or does the answer depend on the architecture I'm using? It's an Intel Core i5-6500, by the way.

I tried to find an answer in Agner Fog's instruction tables but couldn't. Intel's specifications didn't help either (though it's likely that I just missed the one I needed).

Is it always better to use new AVX syntax if possible?

Solution

Is it always better to use new AVX syntax if possible?

I think the first question to ask is whether folded instructions are better than an unfolded instruction pair. Folding takes a pair of read and modify instructions like this

vmovdqa %xmm0, %xmm2
vpunpckhbw %xmm2, %xmm1, %xmm1

and "folds" them into one combined instruction

vpunpckhbw  %xmm0, %xmm1, %xmm2

Since Ivy Bridge, a register-to-register move can have zero latency and use no execution port. However, the unfolded instruction pair still counts as two instructions in the front-end and can therefore affect overall throughput. The folded instruction counts as only one instruction in the front-end, which lowers front-end pressure without any side effects. This could increase overall throughput.
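One way to see the register-to-register case from C is sketched below, assuming GCC or Clang on x86-64. The function name unpack_hi_xor is made up for illustration. Because a is used again after the unpack, a two-operand SSE encoding has to copy one input before overwriting it, whereas the VEX encoding can write the result to a third register.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_unpackhi_epi8, _mm_xor_si128 */

/* Hypothetical helper, not from the original question.
   Interleaves the high 8 bytes of a and b, then XORs the result with a.
   Keeping a live after the unpack is what forces the extra register copy
   in the non-VEX build. */
__m128i unpack_hi_xor(__m128i a, __m128i b)
{
    __m128i hi = _mm_unpackhi_epi8(a, b);  /* punpckhbw or vpunpckhbw */
    return _mm_xor_si128(hi, a);           /* second use of a */
}
```

Compiling this once with -O2 -mssse3 -S and once with -O2 -mavx -S should show a movdqa copy in the first listing and none in the second, though the exact registers and instruction order depend on the compiler and its version.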

However, for memory-to-register moves, folding may have a side effect even if it lowers pressure on the front-end (there is currently some debate about this: https://stackoverflow.com/questions/21134279/difference-in-performance-between-msvc-and-gcc-for-highly-optimized-matrix-multp#comment63917947_21151780). The reason is that, from the front-end's point of view, the out-of-order engine sees only a single folded instruction (assuming this answer is correct). If for some reason it would be better to reorder the memory read independently of the other operation in the folded instruction (the read does require an execution port and has latency), the out-of-order engine won't be able to take advantage of that. I observed this for the first time here.

For your particular operation the AVX syntax is always better, since it folds the register-to-register move. However, if you had a memory-to-register move, the folded AVX instruction could perform worse than the unfolded SSE instruction pair in some cases.
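The memory-to-register case can be reproduced with a sketch like the following, again assuming GCC or Clang on x86-64. The name unpack_hi_from_mem is hypothetical. Legacy SSE instructions generally require 16-byte-aligned memory operands, so with an unaligned load the non-AVX build typically keeps a separate movdqu, while the AVX build typically folds the load into the memory operand of vpunpckhbw.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_loadu_si128, _mm_unpackhi_epi8 */

/* Hypothetical helper, not from the original answer.
   Loads 16 bytes from p (no alignment assumed) and interleaves the high
   8 bytes of a with the high 8 bytes of the loaded value. */
__m128i unpack_hi_from_mem(__m128i a, const void *p)
{
    __m128i b = _mm_loadu_si128((const __m128i *)p);  /* candidate for folding */
    return _mm_unpackhi_epi8(a, b);
}
```

This only shows where the folding happens. Whether the folded form is slower in a given loop is exactly the open question described above and would have to be measured on the target CPU.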


Note that, in general, it should still be better to use VEX-encoded instructions. But I think most compilers, if not all, now assume folding is always better, so you have no way to control the folding except with assembly (not even with intrinsics) or, in some cases, by telling the compiler not to compile with AVX.
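As a rough illustration of the assembly route, here is a minimal sketch using GCC-style extended inline assembly (an assumption: GCC or Clang on x86-64, AT&T operand order). The name unpack_hi_unfolded is made up. It pins the legacy two-operand punpckhbw by hand, so the unpack itself is not re-encoded as vpunpckhbw even when the file is built with -mavx.

```c
#include <emmintrin.h>  /* SSE2: __m128i */

/* Hypothetical helper, not from the original answer.
   Computes the same value as _mm_unpackhi_epi8(a, b), but the unpack is
   written as a legacy-encoded instruction that the compiler cannot fold. */
static inline __m128i unpack_hi_unfolded(__m128i a, __m128i b)
{
    __m128i r = a;                  /* copy; the compiler chooses how to do it */
    __asm__("punpckhbw %1, %0"      /* AT&T order: source, destination */
            : "+x"(r)               /* r is read and written, in an SSE register */
            : "x"(b));              /* b is read only, in an SSE register */
    return r;
}
```

The compiler still decides how (and whether) to materialize the copy into r, so this controls only the unpack. Mixing legacy-encoded SSE instructions with 256-bit AVX code can carry its own penalties, so treat this as something for experiments rather than a recommendation.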
