How do AVX512 rounding modes work (or is NDISASM simply confused)?

Problem description

I’m trying to understand the specific AVX512F instruction vcvtps2udq.

The signature of the instruction is VCVTPS2UDQ zmm1 {k1}{z}, zmm2/m512/m32bcst{er}. The relevant manual text is quoted below.

In an attempt to understand the new rounding modes, I assembled the following code snippet with NASM (2.12.02):

vcvtps2udq zmm0,zmm1
vcvtps2udq zmm0,zmm1,{rz-sae}
vcvtps2udq xmm0,xmm1

Disassembling the result with NDISASM (2.12.02) gives the following, rather confusing, output:

62F17C4879C1      vcvtps2udq zmm0,zmm1
62F17C7879C1      vcvtps2udq xmm0,xmm1
62F17C0879C1      vcvtps2udq xmm0,xmm1

Question: why is the second line disassembled with xmm registers instead of zmm registers (which is what I would have expected)? Does the round-toward-zero mode (rz-sae) have something to do with it, or is NDISASM simply wrong and unable to distinguish between the opcodes 62F17C7879C1 and 62F17C0879C1?

The Intel instruction set reference manual has the following description:

Converts sixteen packed single-precision floating-point values in the source operand to sixteen unsigned doubleword integers in the destination operand.

When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w – 1 is returned, where w represents the number of bits in the destination format.

The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
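To make the manual's rounding and out-of-range rules concrete, here is a simplified Python model of the conversion. It is a sketch under stated assumptions, not Intel's pseudocode: the helper names and the mode strings (borrowed from NASM's {rn,rd,ru,rz}-sae operands) are hypothetical, inputs are assumed to be exactly representable as float32, and MXCSR flag updates and unmasked exceptions are not modelled.

```python
import math

# Hypothetical model of VCVTPS2UDQ lanes (not Intel's pseudocode).
# Mode names mirror NASM's {rn,rd,ru,rz}-sae operands.
W = 32
UINT_MAX = 2**W - 1  # also the masked "invalid" result 2^w - 1


def round_by_mode(x: float, mode: str) -> int:
    """Round x to an integer using one of the four IEEE rounding directions."""
    if mode == "rn":
        return round(x)       # Python rounds float ties to even, like RN
    if mode == "rd":
        return math.floor(x)  # toward -infinity
    if mode == "ru":
        return math.ceil(x)   # toward +infinity
    if mode == "rz":
        return math.trunc(x)  # toward zero
    raise ValueError(f"unknown rounding mode: {mode}")


def cvtps2udq_lane(x: float, mode: str) -> int:
    """Convert one float to an unsigned doubleword, returning 2^w - 1 when invalid."""
    if math.isnan(x) or math.isinf(x):
        return UINT_MAX       # NaN and infinity cannot be represented
    r = round_by_mode(x, mode)
    if r < 0 or r > UINT_MAX:
        return UINT_MAX       # out of range after rounding: masked invalid result
    return r


def cvtps2udq(values, mode):
    """Apply the lane conversion to up to sixteen packed values (one zmm register)."""
    if len(values) > 16:
        raise ValueError("a zmm register holds at most sixteen floats")
    return [cvtps2udq_lane(v, mode) for v in values]
```

For example, 2.5 converts to 2 under rn-sae and rz-sae but to 3 under ru-sae, while -0.5 converts to 0 under rz-sae but to 0xFFFFFFFF under rd-sae, because rounding down gives -1, which is not representable as an unsigned doubleword.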

Recommended answer

The instructions are EVEX-encoded as 0x62 P0 P1 P2 ... (see here, section 4.2). In this case, the P2 bytes are:

P2
48  <- vcvtps2udq zmm0,zmm1
78  <- vcvtps2udq zmm0,zmm1,{rz-sae}
08  <- vcvtps2udq xmm0,xmm1
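The byte layout can be checked mechanically. This is a minimal sketch: split_evex is a hypothetical helper that handles only the 6-byte register-register form used in the question (0x62, P0, P1, P2, opcode, ModRM), with no displacement or memory operand.

```python
# The three encodings from the question, keyed by the NASM source line.
ENCODINGS = {
    "vcvtps2udq zmm0,zmm1": "62F17C4879C1",
    "vcvtps2udq zmm0,zmm1,{rz-sae}": "62F17C7879C1",
    "vcvtps2udq xmm0,xmm1": "62F17C0879C1",
}


def split_evex(hex_str: str) -> dict:
    """Split a 6-byte EVEX register-register instruction into its bytes (hypothetical helper)."""
    raw = bytes.fromhex(hex_str)
    if len(raw) != 6 or raw[0] != 0x62:
        raise ValueError("expected 62 P0 P1 P2 opcode ModRM")
    p0, p1, p2, opcode, modrm = raw[1:]
    return {"P0": p0, "P1": p1, "P2": p2, "opcode": opcode, "ModRM": modrm}


if __name__ == "__main__":
    # Print P2 for each encoding, in the same layout as the listing above.
    for source, hex_str in ENCODINGS.items():
        print(f"{split_evex(hex_str)['P2']:02X}  <- {source}")
```

Running it prints 48, 78 and 08 for the three lines, while P0 (F1), P1 (7C), the opcode (79) and ModRM (C1) are identical in all three encodings, so the whole difference lives in P2.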

Breaking P2 down further into its fields:

                       zmm  zmm+sae  xmm
EVEX.aaa  = P2[2:0]     0     0       0
EVEX.V'   = P2[3]       1     1       1
EVEX.b    = P2[4]       0     1       0  "Broadcast/RC/SAE Context"
EVEX.L'L  = P2[6:5]     2     3       0  "Vector length/RC"
EVEX.z    = P2[7]       0     0       0
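The field breakdown can be reproduced with a few shifts and masks. decode_p2 is a hypothetical helper; its bit positions follow the table, and note that V' is stored inverted in the encoding (1 means the high register bank is not used).

```python
def decode_p2(p2: int) -> dict:
    """Split EVEX payload byte P2 into its fields (hypothetical helper)."""
    return {
        "aaa": p2 & 0b111,        # P2[2:0]: opmask register k0..k7 (0 = no masking)
        "V'": (p2 >> 3) & 1,      # P2[3]: high bit of vvvv, stored inverted
        "b": (p2 >> 4) & 1,       # P2[4]: broadcast / rounding control / SAE context
        "L'L": (p2 >> 5) & 0b11,  # P2[6:5]: vector length, or rounding mode when b = 1
        "z": (p2 >> 7) & 1,       # P2[7]: zeroing (1) vs merging (0) masking
    }


if __name__ == "__main__":
    for label, p2 in (("zmm", 0x48), ("zmm+sae", 0x78), ("xmm", 0x08)):
        print(label, decode_p2(p2))
```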

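So the fields that differ are EVEX.b and EVEX.L'L. According to the docs, if b is not set, then L'L is the SIMD vector length, so 0 = xmm and 2 = zmm. If b is set on a register-to-register instruction, L'L is instead reinterpreted as the static rounding mode and the length is assumed to be zmm (512 bits); with a memory operand, EVEX.b selects embedded broadcast instead. Here L'L = 3 (binary 11) is the round-toward-zero mode, which is exactly the {rz-sae} from the source.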
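The rule above can be written as a small function. This sketch covers register-register forms only; interpret_reg_reg and its tables are illustrative names, with the rounding values taken from the standard EVEX rounding-control encoding (00 = rn, 01 = rd, 10 = ru, 11 = rz).

```python
# EVEX rounding-control encoding used when EVEX.b = 1 on a register-register form.
ROUNDING = {0b00: "rn-sae", 0b01: "rd-sae", 0b10: "ru-sae", 0b11: "rz-sae"}
# Vector-length encoding used when EVEX.b = 0; L'L = 0b11 is reserved.
VECTOR_LENGTH = {0b00: "xmm", 0b01: "ymm", 0b10: "zmm"}


def interpret_reg_reg(b: int, ll: int):
    """Return (register class, rounding override or None) for a register-register EVEX form."""
    if b:
        # L'L is repurposed as the static rounding mode; the operation is 512 bits wide.
        return "zmm", ROUNDING[ll]
    if ll not in VECTOR_LENGTH:
        raise ValueError("L'L = 3 is reserved when EVEX.b = 0")
    return VECTOR_LENGTH[ll], None
```

Feeding it the (b, L'L) pairs from the table gives ("zmm", None), ("zmm", "rz-sae") and ("xmm", None), which is what the three NASM source lines asked for.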

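NDISASM is not interpreting the EVEX.b bit correctly, and therefore misreads the EVEX.L'L field as well: 62F17C7879C1 should disassemble as vcvtps2udq zmm0,zmm1,{rz-sae}, not as an xmm instruction.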
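Putting the pieces together, a decoder that honours EVEX.b prints the second encoding as a zmm instruction with {rz-sae}. The sketch below is not NDISASM's code: disasm_vcvtps2udq is a hypothetical helper that accepts only the register-register, no-displacement form of VCVTPS2UDQ (EVEX.0F.W0 79) and emits NASM-style text.

```python
ROUNDING = {0: "rn-sae", 1: "rd-sae", 2: "ru-sae", 3: "rz-sae"}
VECTOR_LENGTH = {0: "xmm", 1: "ymm", 2: "zmm"}


def disasm_vcvtps2udq(hex_str: str) -> str:
    """Decode a register-register EVEX VCVTPS2UDQ into NASM-style text (hypothetical helper)."""
    raw = bytes.fromhex(hex_str)
    if len(raw) != 6 or raw[0] != 0x62:
        raise ValueError("expected 62 P0 P1 P2 opcode ModRM")
    p0, p1, p2, opcode, modrm = raw[1:]
    # VCVTPS2UDQ: opcode map 0F (mm = 01), no SIMD prefix (pp = 00), W = 0, opcode 79.
    if (p0 & 0b11) != 0b01 or (p1 & 0b11) != 0 or (p1 >> 7) != 0 or opcode != 0x79:
        raise ValueError("not VCVTPS2UDQ")
    if (modrm >> 6) != 0b11:
        raise ValueError("memory operands are not handled in this sketch")

    # P0 holds R, X, B, R' in bits 7..4, all stored inverted.
    r, x, b_ext, r_hi = (1 - ((p0 >> s) & 1) for s in (7, 6, 5, 4))
    dst = ((modrm >> 3) & 0b111) | (r << 3) | (r_hi << 4)
    src = (modrm & 0b111) | (b_ext << 3) | (x << 4)

    aaa, b, ll, z = p2 & 0b111, (p2 >> 4) & 1, (p2 >> 5) & 0b11, (p2 >> 7) & 1
    if b:
        reg, rounding = "zmm", ROUNDING[ll]  # embedded rounding implies 512 bits
    elif ll in VECTOR_LENGTH:
        reg, rounding = VECTOR_LENGTH[ll], None
    else:
        raise ValueError("reserved vector length")

    text = f"vcvtps2udq {reg}{dst}"
    if aaa:
        text += f"{{k{aaa}}}"  # opmask decorator, e.g. {k1}
    if z:
        text += "{z}"          # zeroing-masking decorator
    text += f",{reg}{src}"
    if rounding:
        text += f",{{{rounding}}}"  # e.g. ,{rz-sae}
    return text
```

With the three encodings from the question it returns vcvtps2udq zmm0,zmm1, vcvtps2udq zmm0,zmm1,{rz-sae} and vcvtps2udq xmm0,xmm1, so the middle one round-trips to the original NASM source.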