FPTAN Example x86


Question

According to the Intel documentation, this is what FPTAN does:

Replace ST(0) with its approximate tangent and push 1 onto the FPU stack.

This is the code I wrote in NASM:

section .data
    fVal: dd 4
    fSt0: dq 0.0
    fSt1: dq 0.0

section .text
    fldpi
    fdiv  dword[fVal]  ; divide pi by 4 and store result in ST(0).
    fptan
    fstp  qword[fSt0]  ; store ST(0)
    fstp  qword[fSt1]  ; store ST(1)

At this point, I find that the values of fSt0 and fSt1 are:

fSt0 = 5.60479e+044
fSt1 = -1.#IND

But shouldn't fSt0 and fSt1 both be 1?

Recommended answer

As Michael Petch has already pointed out in a comment, you have a simple typo. Instead of declaring fVal as a floating-point value (as intended), you declared it as a 32-bit integer. FDIV with a dword memory operand reads those 32 bits as a single-precision float, so the integer 4 is interpreted as a tiny denormal rather than as 4.0. Change:

fVal: dd 4

To:

fVal: dd 4.0

Then your code will work as intended. It is otherwise written correctly.
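To see where the strange numbers come from, here is a minimal C sketch of what the FPU does with the original `dd 4`. The function names are made up for illustration, and the arithmetic is done in `double` rather than in the x87's 80-bit format, so it only approximates the real register contents. It assumes the usual IEEE 754 single-precision layout for `float`.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a single-precision float, which is
   how FDIV treats a dword memory operand. (Hypothetical helper.) */
float bits_as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Approximates what "fldpi / fdiv dword [fVal]" leaves in ST(0) when
   fVal holds the given bit pattern. Computed in double, not in the
   x87 extended format. (Hypothetical helper.) */
double pi_divided_by_dword(uint32_t bits)
{
    const double pi = 3.14159265358979323846;
    return pi / (double)bits_as_float(bits);
}

/* FPTAN only accepts operands whose magnitude is below 2^63.
   (Hypothetical helper.) */
int fptan_in_range(double x)
{
    const double limit = 9223372036854775808.0; /* 2^63 */
    double mag = x < 0 ? -x : x;
    return mag < limit;
}
```

Calling `pi_divided_by_dword(4u)` gives roughly 5.6e44, which matches the fSt0 value in the question, and `fptan_in_range` rejects it. Per the Intel documentation, FPTAN with an out-of-range operand sets the C2 flag and leaves ST(0) unchanged, so no 1.0 is pushed. The second FSTP then pops an empty register, which would explain the indefinite NaN (-1.#IND) stored in fSt1. With `0x40800000u` (the bit pattern of 4.0f) the quotient is about 0.785 and well within range.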

If you wanted to take an integer input, you could do it by changing your code to use the FIDIV instruction. This instruction first converts the integer operand to the FPU's double extended-precision floating-point format, and then does the divide:

fldpi
fidiv  dword [fVal]    ; st(0) = pi / fVal
fptan                  ; st(0) = tan(st(0))
                       ; st(1) = 1.0
fstp   qword [fSt0]
fstp   qword [fSt1]

But because the conversion is required, this is slightly less efficient than if you had just given the input as a floating-point value.
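As a rough C analogue of the FIDIV version, the sketch below converts the integer by value before dividing. The function name is hypothetical. `long double` is assumed to map to the 80-bit x87 format, which holds for typical GCC/Clang x86 targets but not everywhere.

```c
#include <stdint.h>

/* Convert the integer by value, then divide, as FIDIV does.
   long double is assumed to be the x87 extended format here; on other
   targets it may have a different width. (Hypothetical helper.) */
double pi_divided_by_int(int32_t divisor)
{
    const long double pi = 3.14159265358979323846L;
    return (double)(pi / (long double)divisor);
}
```

Here `pi_divided_by_int(4)` yields about 0.785398, i.e. pi/4, which is the operand FPTAN is meant to receive.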

Note that, if you were going to do this, it would be more efficient on certain older CPUs to break up the load so that it is done separately from the division, e.g.:

fldpi
fild   dword [fVal]
fdivp  st(1), st(0)    ; st(0) = pi / fVal
fptan                  ; st(0) = tan(st(0))
                       ; st(1) = 1.0
fstp   qword [fSt0]
fstp   qword [fSt1]

In other words, we break the FIDIV instruction apart into separate FILD (integer load) and FDIVP (divide-and-pop) instructions. This improves overlapping and thus shaves a couple of clock cycles off the execution time of the code. (On newer CPUs, meaning AMD Family 15h [Bulldozer] and later and Intel Pentium II and later, there is no real advantage to breaking up FIDIV into FILD + FDIVP; either way you write it should perform the same.)

Of course, since everything you have here is a constant, and tan(pi/4) == 1, your code is equivalent to:

fld1
fld1

…which is what an optimizing compiler would generate. :-)
