Assembly code fsqrt and fmul instructions


Problem description


I'm trying to compute 1.34 * sqrt(lght) in this function using assembly code, but I'm getting errors like:

   '_asm' undeclared (first use in this function)
   each undeclared identifier is reported only once for each function it appears in
   expected ';' before '{' token


I have been researching how to solve this problem but can't find much information. Can someone suggest a way to get this to work?

My code is:

   double hullSpeed(double lgth) {
       _asm {
           global _start
           fld lght; //load lght
           fld st(0); //duplicate lght on Top of stack
           fsqrt; //square root of lght
           fld st(0); //load square result on top of stack
           fld 1.34; //load 1.34 on top of stack
           fld st(i); //duplicate 1.34 on TOS
           fmulp st(0), st(i); //multiply them
           fst z; //save result in z
       }
       return z; // return result of [ 1.34 *sqrt(lght) ]
   }

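Solution

The _asm { ... } block is Microsoft compiler syntax. Your error messages come from GCC, which instead uses __asm__( ... ) with an AT&T-syntax template and operand constraints. It looks like you are trying to do something similar to this: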

#include <stdio.h>

double hullSpeed(double lgth)
{
    double result;

    __asm__(
            "fldl %1\n\t" //st(0)=>st(1), st(0)=lgth . FLDL means load double float
            "fsqrt\n\t"   //st(0) = square root st(0)
            "fmulp\n\t"   //Multiplies st(0) and st(1) (1.34). Result in st(0)
            : "=&t" (result) : "m" (lgth), "0" (1.34) : "st(1)");

    return result;
}

int main()
{
    printf ("%f\n", hullSpeed(64.0));
}

The template I used can be simplified, but it suffices for demonstration purposes. We use the "=&t" constraint because the result is returned at the top of the floating point stack in st(0); the ampersand marks the operand as early clobber (we use the top of the floating point stack to pass in 1.34). We pass lgth as a memory reference via the constraint "m" (lgth), and the "0" (1.34) constraint says that 1.34 is passed in the same register as operand 0, which in this case is the top of the floating point stack. We also specify the clobber list: registers (or memory) that our assembler code overwrites but that don't appear as an input or output constraint. In our case st(1) is destroyed during the calculation, so we add "st(1)" to the clobber list.

Learning assembly language through inline assembler is a very difficult way to learn. The machine constraints specific to x86 are documented in the GCC manual under Machine Constraints, in the x86 family section. The constraint modifiers and the extended assembler templates are documented in the same manual, under Modifiers and Extended Asm respectively.

I'm only giving you a starting point, as GCC's inline assembler can be rather complex and a complete treatment would be too broad for a Stack Overflow answer. The fact that you are using inline assembler with x87 floating point makes it that much more complex.


Once you have a handle on constraints and modifiers, another mechanism that lets the compiler generate better assembler code is:

__asm__(
        "fsqrt\n\t"   // st(0) = square root st(0)
        "fmulp\n\t"   // Multiplies st(0) and st(1) (1.34). Result in st(0)
        : "=t"(result) : "0"(lgth), "u" (1.34) : "st(1)" );

Hint: Constraint "u" places a value in x87 floating point register st(1). The assembler template constraints effectively place lgth in st(0) and 1.34 in st(1). We use the constraints to place our values on the floating point stack for us. This has the effect of reducing the work we have to do inside the assembler code itself.
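To make the fragment above easier to try out, here is a minimal sketch that wraps it in a complete function. The name hullSpeedX87 is hypothetical, and the sketch assumes GCC (or a compatible compiler) targeting x86; the operand layout is the same as in the statement above.

```c
/* Sketch only: assumes GCC-style extended asm on an x86 target. */
double hullSpeedX87(double lgth)   /* hypothetical name */
{
    double result;

    __asm__(
            "fsqrt\n\t"   /* st(0) = sqrt(st(0)) */
            "fmulp\n\t"   /* st(1) = st(1) * st(0), then pop: product is now in st(0) */
            : "=t"(result)           /* output is read from st(0) */
            : "0"(lgth), "u"(1.34)   /* lgth enters in st(0), 1.34 in st(1) */
            : "st(1)");              /* the "u" input is popped by fmulp */

    return result;
}
```

Calling hullSpeedX87(64.0) should yield 1.34 * 8, that is 10.72. Because the compiler loads both operands, it is free to pick the cheapest way to get them onto the x87 stack.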


If you are developing 64-bit applications I highly recommend using SSE/SSE2 at a minimum for basic floating point calculations. The code above should work on 32-bit and 64-bit. In 64-bit code the x87 floating point instructions are generally not as efficient as SSE/SSE2, but they will work.
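As an illustration of the SSE2 route, here is a small sketch that uses compiler intrinsics from emmintrin.h rather than inline assembler. This is a swapped-in technique, not the code from the answer: the function name hullSpeedSSE2 is hypothetical, and the sketch assumes an x86 target with SSE2 enabled (the default for 64-bit code; 32-bit GCC builds typically need -msse2).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Sketch only: computes 1.34 * sqrt(lgth) with scalar SSE2 instructions. */
double hullSpeedSSE2(double lgth)   /* hypothetical name */
{
    __m128d v = _mm_set_sd(lgth);          /* low lane = lgth, high lane = 0 */
    v = _mm_sqrt_sd(v, v);                 /* low lane = sqrt(lgth) */
    v = _mm_mul_sd(v, _mm_set_sd(1.34));   /* low lane = 1.34 * sqrt(lgth) */
    return _mm_cvtsd_f64(v);               /* extract the low lane */
}
```

With intrinsics the compiler handles register allocation itself, so no constraints or clobber lists are needed.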


Rounding with Inline Assembly and x87

If you are attempting to round based on one of the 4 rounding modes on the x87 you can utilize code like this:

#include <stdio.h>
#include <stdint.h>
#define RND_CTL_BIT_SHIFT   10

typedef enum {
    ROUND_NEAREST_EVEN =    0 << RND_CTL_BIT_SHIFT,
    ROUND_MINUS_INF =       1 << RND_CTL_BIT_SHIFT,
    ROUND_PLUS_INF =        2 << RND_CTL_BIT_SHIFT,
    ROUND_TOWARD_ZERO =     3 << RND_CTL_BIT_SHIFT
} RoundingMode;

double roundd (double n, RoundingMode mode)
{
    uint16_t cw;        /* Storage for the current x87 control register */
    uint16_t newcw;     /* Storage for the new value of the control register */
    uint16_t dummyreg;  /* Temporary dummy register used in the template */

    __asm__(
            "fstcw %w[cw]          \n\t" /* Read current x87 control register into cw*/
            "fwait                 \n\t" /* Do an fwait after an fstcw instruction */
            "mov %w[cw],%w[treg]   \n\t" /* treg = value in cw variable*/
            "and $0xf3ff,%w[treg]  \n\t" /* Set rounding mode bits 10 and 11 of control
                                            register to zero*/
            "or %w[rmode],%w[treg] \n\t" /* Set the rounding mode bits */
            "mov %w[treg],%w[newcw]\n\t" /* newcw = value for new control reg value*/
            "fldcw %w[newcw]       \n\t" /* Set control register to newcw */
            "frndint               \n\t" /* st(0) = round(st(0)) */
            "fldcw %w[cw]          \n\t" /* restore control reg to orig value in cw*/
            : [cw]"=m"(cw),
              [newcw]"=m"(newcw),
              [treg]"=&r"(dummyreg),  /* Register constraint with dummy variable
                                         allows compiler to choose available register */
              [n]"+t"(n)
            : [rmode]"rmi"((uint16_t)mode)); /* "g" constraint same as "rmi" */

    return n;
}

int main()
{
    double dbHullSpeed = hullSpeed(64.0);
    printf ("%f, %f\n", dbHullSpeed, roundd(dbHullSpeed, ROUND_NEAREST_EVEN));
    printf ("%f, %f\n", dbHullSpeed, roundd(dbHullSpeed, ROUND_MINUS_INF));
    printf ("%f, %f\n", dbHullSpeed, roundd(dbHullSpeed, ROUND_PLUS_INF));
    printf ("%f, %f\n", dbHullSpeed, roundd(dbHullSpeed, ROUND_TOWARD_ZERO));
    return 0;
}

As you pointed out in the comments, there was equivalent code in this Stack Overflow answer, but it used multiple __asm__ statements and you were curious how a single __asm__ statement could be coded.
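For comparison, here is a sketch of a hybrid variant that leaves the bit manipulation to C and uses inline assembler only for the x87 instructions themselves. It is not the code from the linked answer; the names roundWithMode, RC_SHIFT and RC_MASK are hypothetical, and GCC-style extended asm on an x86 target is assumed. The control word is changed and restored inside one __asm__ statement, so the compiler never runs its own floating point code under the temporary rounding mode.

```c
#include <stdint.h>

#define RC_SHIFT 10               /* RC field occupies bits 10 and 11 */
#define RC_MASK  (3u << RC_SHIFT)

/* Sketch only: rc is the 2-bit RC field value (0..3). */
double roundWithMode(double n, unsigned rc)   /* hypothetical name */
{
    uint16_t cw;      /* current x87 control word */
    uint16_t newcw;   /* control word with the requested rounding mode */

    __asm__ __volatile__("fnstcw %0" : "=m"(cw));   /* read the control word */

    /* Clear bits 10-11, then insert the requested mode. */
    newcw = (uint16_t)((cw & ~RC_MASK) | ((rc & 3u) << RC_SHIFT));

    __asm__ __volatile__(
            "fldcw %[newcw]\n\t"   /* switch to the requested rounding mode */
            "frndint\n\t"          /* st(0) = round(st(0)) */
            "fldcw %[cw]\n\t"      /* restore the original control word */
            : [n]"+t"(n)
            : [newcw]"m"(newcw), [cw]"m"(cw));

    return n;
}
```

For example, roundWithMode(10.72, 1) rounds down to 10.0 and roundWithMode(10.72, 2) rounds up to 11.0.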

The rounding modes (0,1,2,3) can be found in the Intel Architecture Document:

Rounding mode and RC field setting:

00B, Round to nearest (even): Rounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (that is, the one with the least-significant bit of zero). This is the default.

01B, Round down (toward −∞): Rounded result is closest to but no greater than the infinitely precise result.

10B, Round up (toward +∞): Rounded result is closest to but no less than the infinitely precise result.

11B, Round toward zero (truncate): Rounded result is closest to but no greater in absolute value than the infinitely precise result.

Section 8.1.5 describes the fields of the x87 control word (the rounding control field specifically in section 8.1.5.3). The four rounding modes are defined in Table 4-8 under section 4.8.4.
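To make the field layout concrete, here is a small sketch in plain C that extracts the RC field (bits 10 and 11) from a control word value and maps it to a mode name. The helper names rcField and rcName are hypothetical, and the sample value 0x037f is the control word the x87 holds after FINIT.

```c
#include <stdint.h>

/* Sketch only: returns the 2-bit RC field (bits 10-11) of a control word. */
unsigned rcField(uint16_t cw)   /* hypothetical name */
{
    return (cw >> 10) & 3u;
}

/* Sketch only: maps a control word to the name of its rounding mode. */
const char *rcName(uint16_t cw)   /* hypothetical name */
{
    static const char *const names[4] = {
        "round to nearest (even)",       /* 00B */
        "round down (toward -inf)",      /* 01B */
        "round up (toward +inf)",        /* 10B */
        "round toward zero (truncate)"   /* 11B */
    };
    return names[rcField(cw)];
}
```

For the FINIT default 0x037f, rcField returns 0, so the mode is round to nearest (even); setting both RC bits gives 0x0f7f, which selects truncation.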
