在GNU C内联汇编,该干嘛的修饰XMM / YMM / ZMM为一个操作? [英] In GNU C inline asm, what're the modifiers for xmm/ymm/zmm for a single operand?

查看:1197
本文介绍了在GNU C内联汇编,该干嘛的修饰XMM / YMM / ZMM为一个操作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在试图回答<一个href=\"http://stackoverflow.com/questions/34415238/embedded-broadcasts-with-intrinsics-and-assembly\">Embedded与内在和装配广播,我试图做这样的事情:

  __ M512 mul_broad(__ M512一,浮动B){
    INT划伤= 0;
    ASM(
        vbroadcastss%K [标]%Q [标] \\ n \\ t//想VBR ..%XMM0,%zmm0
        vmulps%Q [标]%[VEC]%[VEC]的\\ n \\ t的        //它是如何做了整数寄存器
        MOVW符号(%Q [inttmp]),%W [inttmp]的\\ n \\ t// MOVW符号(%RAX),斧头%
        movsbl%H [inttmp],%K [inttmp]的\\ n \\ t// MOVSX%啊,%EAX
        :[VEC]+ X(一),〔标量]+ X(b)中,[inttmp]= R(擦伤)
        :
        :
    );
    返回;
}

借助 GNU C 86操作数修饰符文档仅指定修饰符高达问:(DI(DoubleInt)大小,64位)。使用上一个向量寄存器将永远把它降低到 XMM (从青运 ZMM )。

的问题:

注册

什么是矢量的尺寸之间改变改性剂?

此外,是否有与输入或输出操作数使用任何特定的大小限制?东西比普通的其他X 可最终被XMM,青运,或取决于你把括号中的前pression类型ZMM。

题外话:结果
铛显得有一些 / YT 限制(未修饰),但我也不能找到在该文档。铛甚至不会编译这个,即使注释掉的向量指令,因为它不喜欢 + X 作为一个 __ M512约束载体。


背景/动机

我能得到我想要通过传递标量作为输入操作数,限制为同一个寄存器作为更广泛的输出操作数的结果,但它的笨拙。 (此用例的最大缺点是,AFAIK它具有使用中的操作数数,而不是 [symbolic_name] ,所以这是容易加入时/移除破损输出的限制。)

  //我想要做什么,通过采用配对输出和输入约束
__m512 mul_broad(__ M512一,浮动B){
    __m512 tmpvec;
    ASM(
        vbroadcastss%[标]%[tmpvec]的\\ n \\ t的
        vmulps%[tmpvec]%[VEC]%[VEC]的\\ n \\ t的
        :[VEC]+ X(一),[tmpvec]= x的(tmpvec)
        :[标量]1(二)
        :
    );  返回;
}

godbolt链接


另外,我觉得这整个的方法,我试图解决将是一个死胡同问题,因为的 不让你多替代约束给出了不同的约束模式不同的汇编。我希望有 X 研究限制最终发射 vbroadcastss 从寄存器,而 M 限制最终发射 vmulps(mem_src){1to16}%zmm_src2,%zmm_dst (折叠广播负载)。与内联汇编这样做的目的是,GCC还不知道如何折叠 SET1()内存操作数为广播负载(但不哗)。

不管怎样,这个具体问题是关于操作数修改器和约束矢量寄存器。请注重这一点,但评论和旁白的答案都是对等的问题欢迎。 (或更好,只是发表评论/在z玻色子的有关嵌入式广播问题的答案。)


解决方案

从文件<一个href=\"https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.c?view=log\">gcc/config/i386/i386.c海合会来源:



       b - 打印寄存器的QImode名称所指示的操作数。
        %B0将打印%人如果操作数[0]是章第0。
       W - 同样,打印寄存器的HImode名称。
       期k - 同样,打印寄存器的SImode的名称。
       q - 同样,打印寄存器的DImode名称。
       点¯x - 同样,打印寄存器的V4SFmode名称。
       笔 - 同样,打印寄存器的V8SFmode名称。
       摹 - 同样,打印寄存器的V16SFmode名称。
       ^ h - 打印QImode名高寄存器,可以啊,BH,CH或DH。


同样从<一个href=\"https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/constraints.md?view=log\">gcc/config/i386/contraints.md:



    ;;我们使用Y preFIX表示任何数目的条件寄存器组:
    ;; z先将SSE寄存器。
    ;;我SSE2单位间移动到SSE寄存器启用
    ;; ĴSSE2单元间从SSE寄存器移动启用
    ;;米MMX单元间移动到MMX寄存器启用
    ;; ñMMX单元间从MMX寄存器移动启用
    ;;一个整数寄存器时零扩展与和禁用
    ;; p为整数寄存器当TARGET_PARTIAL_REG_STALL被禁用
    ;;当启用80387浮点运算˚F的x87寄存器
    ;;启用prefixes避免当r上证所的REG不需要REX preFIX
    ;;和所有SSE,否则的REG


此文件还定义了一个YK的约束,但我不知道如何,将在asm语句工作:



    (define_register_constraintYKTARGET_AVX512F MASK_EVEX_REGS:NO_REGS
    @internal可以用作predicate,即K1-K7不限屏蔽寄存器)。


请注意这是所有从最新的SVN版本复制。我不知道是什么GCC的版本中,如果有的话,特别是修改器和约束你有兴趣增加了。

While trying to answer Embedded broadcasts with intrinsics and assembly, I was trying to do something like this:

__m512 mul_broad(__m512 a, float b) {
    int scratch = 0;
    asm(
        "vbroadcastss  %k[scalar], %q[scalar]\n\t"  // want  vbr..  %xmm0, %zmm0
        "vmulps        %q[scalar], %[vec], %[vec]\n\t"

        // how it's done for integer registers
        "movw         symbol(%q[inttmp]), %w[inttmp]\n\t"  // movw symbol(%rax), %ax
        "movsbl        %h[inttmp], %k[inttmp]\n\t"  // movsx %ah, %eax
        : [vec] "+x" (a), [scalar] "+x" (b),  [inttmp] "=r" (scratch)
        :
        :
    );
    return a;
}

The GNU C x86 Operand Modifiers doc only specifies modifiers up to q (DI (DoubleInt) size, 64bits). Using q on a vector register will always bring it down to xmm (from ymm or zmm).

The question:

What are the modifiers to change between sizes of vector register?

Also, are there any specific-size constraints for use with input or output operands? Something other than the generic x which can end up being xmm, ymm, or zmm depending on the type of the expression you put in the parentheses.

Off-topic:
clang appears to have some Yi / Yt constraints (not modifiers), but I can't find docs on that either. clang won't even compile this, even with the vector instructions commented out, because it doesn't like +x as a constraint for an __m512 vector.


Background / motivation

I can get the result I want by passing in the scalar as an input operand, constrained to be in the same register as a wider output operand, but it's clumsier. (The biggest downside for this use-case is that AFAIK it has to use an operand-number, rather than the [symbolic_name], so it's susceptible to breakage when adding/removing output constraints.)

// does what I want, by using a paired output and input constraint
__m512 mul_broad(__m512 a, float b) {
    __m512 tmpvec;
    asm(
        "vbroadcastss  %[scalar], %[tmpvec]\n\t"
        "vmulps        %[tmpvec], %[vec], %[vec]\n\t"
        : [vec] "+x" (a), [tmpvec] "=x" (tmpvec)
        : [scalar] "1" (b)
        :
    );

  return a;
}

godbolt link


Also, I think this whole approach to the problem I was trying to solve is going to be a dead end because Multi-Alternative constraints don't let you give different asm for the different constraint patterns. I was hoping to have x and r constraints end up emitting a vbroadcastss from a register, while m constraints end up emitting vmulps (mem_src){1to16}, %zmm_src2, %zmm_dst (a folded broadcast-load). The purpose of doing this with inline asm is that gcc doesn't yet know how to fold set1() memory operands into broadcast-loads (but clang does).

Anyway, this specific question is about operand modifiers and constraints for vector registers. Please focus on that, but comments and asides in answers are welcome on the other issue. (Or better, just comment / answer on Z Boson's question about embedded broadcasts.)

解决方案

From the file gcc/config/i386/i386.c of the GCC sources:

       b -- print the QImode name of the register for the indicated operand.
        %b0 would print %al if operands[0] is reg 0.
       w --  likewise, print the HImode name of the register.
       k --  likewise, print the SImode name of the register.
       q --  likewise, print the DImode name of the register.
       x --  likewise, print the V4SFmode name of the register.
       t --  likewise, print the V8SFmode name of the register.
       g --  likewise, print the V16SFmode name of the register.
       h -- print the QImode name for a "high" register, either ah, bh, ch or dh.

Similarly from gcc/config/i386/contraints.md:

    ;; We use the Y prefix to denote any number of conditional register sets:
    ;;  z   First SSE register.
    ;;  i   SSE2 inter-unit moves to SSE register enabled
    ;;  j   SSE2 inter-unit moves from SSE register enabled
    ;;  m   MMX inter-unit moves to MMX register enabled
    ;;  n   MMX inter-unit moves from MMX register enabled
    ;;  a   Integer register when zero extensions with AND are disabled
    ;;  p   Integer register when TARGET_PARTIAL_REG_STALL is disabled
    ;;  f   x87 register when 80387 floating point arithmetic is enabled
    ;;  r   SSE regs not requiring REX prefix when prefixes avoidance is enabled
    ;;  and all SSE regs otherwise

This file also defines a "Yk" constraint but I don't know if how well it would work in an asm statement:

    (define_register_constraint "Yk" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
    "@internal Any mask register that can be used as predicate, i.e. k1-k7.")

Note this is all copied from the latest SVN revision. I don't know what release of GCC, if any, the particular modifiers and constraints you're interested in were added.

这篇关于在GNU C内联汇编,该干嘛的修饰XMM / YMM / ZMM为一个操作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆