Inline assembly code to read/write XMM & YMM registers?


Question


I have two variables that emulate the x86 XMM and YMM registers:

uint64_t xmm_value[2];
uint64_t ymm_value[4];


Now I want to use inline assembly to read from and write to the XMM/YMM registers.

  • How do I write GCC inline assembly to copy xmm_value into register XMM0?
  • How do I write GCC inline assembly to copy register YMM0 into ymm_value?


I have already searched for sample inline assembly that does this, but could not find a good answer. Thanks!


With some help, I wrote the code below, and it compiles. I use movups for XMM and vmovups for YMM. Is this correct, and can the code be optimized further?

__m128 xmm0;
__m256 ymm0;

// write to XMM0, and read from YMM0
__asm__("movups %1, %%xmm0\n\t"
        "vmovups %%ymm0, %0"
        : "=m"(ymm0)
        : "m"(xmm0)
        : "xmm0", "ymm0");




Update 2: here is my full code (with vpbroadcastb added)

__m128 xmm0;
__m256 ymm0;

// write to XMM0, and read from YMM0
__asm__("movups %1, %%xmm0\n\t"
        "vpbroadcastb %%xmm0, %%ymm0\n\t"
        "vmovups %%ymm0, %0"
        : "=m"(ymm0)
        : "m"(xmm0)
        : "xmm0", "ymm0");


The idea is to copy the variable xmm0 into register XMM0, run vpbroadcastb, and then copy the result from YMM0 into the variable ymm0. I now realize that XMM0 is the lower half of YMM0, so can this code still be improved?

Answer


The first step is to #include <immintrin.h>, which provides the definitions of the needed types as well as all the Intel intrinsics for the MMX/SSE/AVX instructions. For most purposes you want to use those intrinsics rather than inline assembly, as they are clearer and more portable. If you really want to use inline asm, you can use the intrinsic types (__m64, __m128, __m128d, __m256, etc.) together with an x constraint, which binds each operand to the correct kind of xmm/ymm register.
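A minimal sketch of the question's "load, vpbroadcastb, store" done both ways, assuming GCC or Clang on x86-64 with the default AT&T asm syntax and a CPU that supports AVX2 (vpbroadcastb with a register source is an AVX2 instruction). The function names broadcast_byte_intrin and broadcast_byte_asm are hypothetical. The target("avx2") attribute is used here only so the file builds without -mavx2; calling either function on a CPU without AVX2 will still fault.

```c
#include <assert.h>
#include <stdint.h>
#include <immintrin.h>

/* Version 1: intrinsics only. The compiler chooses the registers and
   emits the unaligned loads/stores and vpbroadcastb itself. */
__attribute__((target("avx2")))
void broadcast_byte_intrin(const uint64_t xmm_value[2], uint64_t ymm_value[4])
{
    __m128i x = _mm_loadu_si128((const __m128i *)xmm_value);
    __m256i y = _mm256_broadcastb_epi8(x); /* byte 0 of x copied to all 32 bytes */
    _mm256_storeu_si256((__m256i *)ymm_value, y);
}

/* Version 2: inline asm using the "x" constraint. GCC picks a free
   vector register for each operand and substitutes its name (%xmmN for
   the 128-bit input, %ymmN for the 256-bit output), so there is no
   hard-coded XMM0/YMM0 and no clobber list. */
__attribute__((target("avx2")))
void broadcast_byte_asm(const uint64_t xmm_value[2], uint64_t ymm_value[4])
{
    __m128i x = _mm_loadu_si128((const __m128i *)xmm_value);
    __m256i y;
    __asm__("vpbroadcastb %1, %0" : "=x"(y) : "x"(x));
    _mm256_storeu_si256((__m256i *)ymm_value, y);
}
```

With both versions the loads and stores are emitted by the compiler as VEX-encoded instructions, which also avoids mixing a legacy-SSE movups with AVX instructions as the question's code does; on some Intel CPUs that mix can cause SSE/AVX transition penalties. Because XMM0 is just the low half of YMM0, there is no need to name either register explicitly: let the constraints carry the values in and out.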

