How can I write a QuadWord from AVX512 register zmm26 to the rax register?


Question

I wish to perform integer arithmetic on the quadword elements of the zmm0-31 registers and preserve the carry bit those operations produce. It appears this is only possible if the data is worked on in the general-purpose registers.

Thus I would like to copy a quadword from one of the zmm0-31 registers to one of the general-purpose registers. After working on the 64-bit data in the general-purpose register, I would like to return it to the original zmm register, in the same quadword position it came from. I know that I can move the data from the general-purpose register rax to quadword position 5 of the AVX512 register zmm26 using the command

    vpbroadcastq zmm26{k5},rax

where the 8-bit mask k5 = decimal 32 (only bit 5 set) restricts the broadcast to qword 5 of zmm26 (counting from 0), merge-masking (no {z}) leaves every other qword in zmm26 unaffected, and rax is where the data originates. Note that adding {z} would select zero-masking instead, which clears the unselected qwords rather than preserving them.
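For reference, here is a minimal sketch of that write-back with the mask set up explicitly. It assumes NASM/Intel syntax and that ecx is free as a scratch register (an illustrative choice, not something from the question). It uses merge-masking, since zero-masking with {z} would clear the other seven qwords.

```nasm
; Insert rax into qword 5 of zmm26, leaving qwords 0-4 and 6-7 unchanged.
; Assumption: ecx is a free scratch register.
        mov          ecx, 1 << 5          ; 32: only bit 5 set, selecting qword 5
        kmovw        k5, ecx              ; load the mask register (AVX512F)
        vpbroadcastq zmm26{k5}, rax       ; merge-masking: unselected qwords are kept
        ; vpbroadcastq zmm26{k5}{z}, rax  ; zero-masking: unselected qwords become 0
```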

But I cannot find an inverse command that will write the data from qword 5 of zmm26 to the rax register. It appears that I can only copy the least significant quadword from an AVX register to a general-purpose register, using the vmovq rax, xmm1 command. And there is no broadcast command that takes a masked zmm0-31 source.

I would appreciate knowing what my options are for getting a particular quadword from a zmm0-31 register into the rax register. Also, are there any descriptive sources of information on the AVX512 instruction set other than the Intel manual at this point?

Recommended answer

Unlike some of the earlier SIMD extensions, which had "extract" instructions such as pextrq that would do this directly, I'm not aware of any way to do it in AVX-512 (nor in AVX with ymm registers) other than:

  1. Permuting/shuffling the element you want into the low-order quadword and then using vmovq, as you noted, to get it into a general-purpose register.

  2. Storing the entire vector to a temporary memory location loc, such as the stack, then using mov register,[loc + offset] instructions to read whichever qword(s) you are interested in.
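The two options above might look roughly like the following for qword 5 of zmm26. This is a sketch in NASM/Intel syntax, not the only way to do either one: valignq is just one of several shuffles that could bring the element to the bottom, and zmm0 and the 64 bytes of stack scratch are assumptions made for illustration.

```nasm
; Option 1: shuffle qword 5 down to position 0, then vmovq.
; Assumption: zmm0 is a free scratch register.
        valignq   zmm0, zmm26, zmm26, 5   ; rotate right by 5 qwords: zmm0[0] = zmm26[5]
        vmovq     rax, xmm0               ; low qword of zmm0 -> rax

; Option 2: spill the whole vector and reload the qword of interest.
; Assumption: 64 bytes of stack scratch; unaligned store, so no alignment needed.
        sub       rsp, 64
        vmovdqu64 [rsp], zmm26            ; store all eight qwords
        mov       rax, [rsp + 5*8]        ; read qword 5
        add       rsp, 64
```

With option 2, further mov loads from other offsets can pull out additional qwords from the same stored copy.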

Both approaches seem pretty ugly, and which is better depends on your exact scenario. Despite using memory as an intermediary, the second approach may be faster if you plan to extract several values from each vector: you can make use of both load ports on recent CPUs, each with a throughput of one load per cycle, while the permute/shuffle approach is likely to bottleneck on the port required for the permute/shuffle.
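As an illustration of the several-values case, here is a hedged sketch of a full round trip: two adjacent qwords are read back from a stored copy, combined with add/adc so the carry propagates between them, and written back with merge-masked broadcasts. The scenario is assumed for illustration only: treating qwords 4 and 5 of zmm26 as one 128-bit integer, with a hypothetical addend in rdi:rsi, and ecx, k4 and k5 free as scratch.

```nasm
; Assumed scenario: zmm26[5]:zmm26[4] += rdi:rsi (128-bit add with carry).
        sub          rsp, 64
        vmovdqu64    [rsp], zmm26         ; spill the vector once
        mov          rax, [rsp + 4*8]     ; low half  (qword 4)
        mov          rdx, [rsp + 5*8]     ; high half (qword 5)
        add          rsp, 64              ; release scratch before the flag-sensitive part

        add          rax, rsi             ; low halves, sets CF
        adc          rdx, rdi             ; high halves plus carry from the low add

        mov          ecx, 1 << 4          ; mov and kmovw do not modify flags
        kmovw        k4, ecx
        vpbroadcastq zmm26{k4}, rax       ; write back qword 4, others kept
        mov          ecx, 1 << 5
        kmovw        k5, ecx
        vpbroadcastq zmm26{k5}, rdx       ; write back qword 5, others kept
```

Writing the results back through masked broadcasts, rather than storing them to the stack copy and reloading all 64 bytes, avoids a wide load that depends on recent narrow stores, which tends to forward poorly on many CPUs.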

See Peter's answer below for a more comprehensive treatment, including using the vcompress instructions with a mask as a kind of poor-man's extract.
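That answer is not reproduced here, so the following is only a sketch of what a compress-based extract could look like, not a quotation of it. It assumes vpcompressq as the member of the compress family for qword elements, with zmm0 and ecx free as scratch.

```nasm
; Poor-man's extract: compress the single selected qword down to position 0.
; Assumptions: zmm0 and ecx are free scratch registers.
        mov         ecx, 1 << 5           ; select qword 5 only
        kmovw       k5, ecx
        vpcompressq zmm0{k5}{z}, zmm26    ; selected qword packed into zmm0[0], rest zeroed
        vmovq       rax, xmm0             ; low qword -> rax
```

Because the mask is a run-time value, the same two instructions can extract any element, at the cost of setting up the mask register.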
