Why does the FMA _mm256_fmadd_pd() intrinsic have 3 asm mnemonics, "vfmadd132pd", "231" and "213"?


Question

Could someone explain to me why there are 3 variants of the fused multiply-add instruction (vfmadd132pd, vfmadd231pd and vfmadd213pd), while there is only one C intrinsic, _mm256_fmadd_pd?

To keep things simple, what is the difference between the following three instructions (in AT&T syntax)?

vfmadd132pd  %ymm0, %ymm1, %ymm2
vfmadd231pd  %ymm0, %ymm1, %ymm2
vfmadd213pd  %ymm0, %ymm1, %ymm2

Intel's intrinsics guide did not answer this for me. I ask because I see all three in the assembler output of a chunk of C code I wrote. Thanks.

Clean answer (a reformatted summary of the answer below)

For variant ijk, vfmaddijkpd means:

  • Intel syntax: op(i) * op(j) + op(k) -> op(1)
  • AT&T syntax: op(4-i) * op(4-j) + op(4-k) -> op(3)

where op(n) denotes the n-th operand after the mnemonic. Because AT&T syntax lists the operands in the reverse order of Intel syntax, the operand numbers in the two syntaxes are related by:

n <- 4 - n
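
The digit rule above can be sketched in C. This is only an illustration: fma_formula is a hypothetical helper name, not part of any library, and it merely prints which operand plays which role for a given variant and syntax.

```c
#include <stdio.h>

/* Hypothetical helper: writes the operand formula for vfmadd<ijk>pd into buf.
   ijk is the three-digit variant, e.g. "132", "231" or "213".
   att != 0 selects AT&T operand numbering; att == 0 selects Intel numbering. */
void fma_formula(const char *ijk, int att, char *buf, size_t len)
{
    int d[3];
    for (int n = 0; n < 3; n++) {
        d[n] = ijk[n] - '0';        /* digit as an Intel operand number */
        if (att)
            d[n] = 4 - d[n];        /* AT&T reverses the order: n <- 4 - n */
    }
    /* The destination is op1 in Intel syntax and op3 in AT&T syntax. */
    snprintf(buf, len, "op%d * op%d + op%d -> op%d",
             d[0], d[1], d[2], att ? 3 : 1);
}
```

For example, calling fma_formula("231", 1, buf, sizeof buf) fills buf with "op2 * op1 + op3 -> op3", which for vfmadd231pd %ymm0, %ymm1, %ymm2 reads as ymm2 = ymm1 * ymm0 + ymm2.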

Recommended answer

The fused multiply-add instructions multiply two (packed) values, add a third value, and then overwrite one of the three values with the result. Only one of the three values can be a memory operand rather than a register.

With the operand naming used below, all three instructions overwrite ymm0 and allow only ymm2 to be a memory operand. The choice of instruction determines which two operands are multiplied and which one is added.

Assuming that ymm0 is the first operand in Intel syntax (or the last in AT&T syntax):

vfmadd132pd:  ymm0 = ymm0 * ymm2/mem + ymm1
vfmadd231pd:  ymm0 = ymm1 * ymm2/mem + ymm0
vfmadd213pd:  ymm0 = ymm1 * ymm0 + ymm2/mem 
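
The three rows above can be modelled for a single lane in plain C. This is a sketch under stated assumptions: the function names are made up for illustration, the parameters dst, src1 and src2 stand for ymm0, ymm1 and ymm2/mem, and ordinary a * b + c is used, so the single rounding of a real fused operation is not modelled.

```c
/* Scalar model of one lane of each variant (illustrative names only).
   dst  = first operand in Intel syntax (ymm0, overwritten by the result)
   src1 = second operand (ymm1)
   src2 = third operand (ymm2 or memory)
   Each function returns the value that would be written back to dst. */
double fmadd132(double dst, double src1, double src2)
{
    return dst * src2 + src1;   /* op1 * op3 + op2 */
}

double fmadd231(double dst, double src1, double src2)
{
    return src1 * src2 + dst;   /* op2 * op3 + op1 */
}

double fmadd213(double dst, double src1, double src2)
{
    return src1 * dst + src2;   /* op2 * op1 + op3 */
}
```

With dst = 2, src1 = 3 and src2 = 5, the three functions return 13, 17 and 11 respectively, which shows that the same registers give different results depending on the variant.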

When using the C intrinsic, this choice isn't necessary: the intrinsic does not overwrite a value but returns its result instead, and any of its three arguments may come from memory. The compiler adds memory reads and writes where needed, and allocates a temporary register for the result if none of the three values may be overwritten. It then chooses whichever of the three instructions fits best.
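
As a minimal sketch of the intrinsic side, the function below accumulates four products with _mm256_fmadd_pd. The names fma4_accumulate and cpu_has_fma are hypothetical, and the code assumes GCC or Clang on x86-64: the target attribute enables AVX2 and FMA for that one function so the file builds without -mfma, and __builtin_cpu_supports is a compiler builtin used to check the CPU at run time.

```c
#include <immintrin.h>

/* Hypothetical helper: acc[i] = a[i] * b[i] + acc[i] for four doubles.
   The target attribute (GCC/Clang) enables AVX2 and FMA for this function
   only; the caller is still responsible for checking CPU support. */
__attribute__((target("avx2,fma")))
void fma4_accumulate(const double *a, const double *b, double *acc)
{
    __m256d va = _mm256_loadu_pd(a);      /* unaligned loads of 4 doubles */
    __m256d vb = _mm256_loadu_pd(b);
    __m256d vc = _mm256_loadu_pd(acc);
    vc = _mm256_fmadd_pd(va, vb, vc);     /* va * vb + vc, returned as a value */
    _mm256_storeu_pd(acc, vc);
}

/* Hypothetical helper: nonzero if the running CPU reports AVX2 and FMA. */
int cpu_has_fma(void)
{
    return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
}
```

Nothing in the source names a mnemonic. Which of vfmadd132pd, vfmadd231pd or vfmadd213pd appears in the assembly depends on the compiler's register allocation, so the output may differ between compilers and optimization levels.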
