How is fma() implemented


Question

According to the documentation, there is an fma() function in math.h. That is very nice, and I know how FMA works and what to use it for. However, I am not sure how it is implemented in practice. I'm mostly interested in the x86 and x86_64 architectures.

Is there a floating-point (non-vector) instruction for FMA, perhaps as defined by IEEE 754-2008?

Are FMA3 or FMA4 instructions used?

Is there an intrinsic to make sure that a real FMA is used, when the precision is relied upon?

Answer

The actual implementation varies from platform to platform, but speaking very broadly:

• If you tell your compiler to target a machine with hardware FMA instructions (PowerPC, ARM with VFPv4 or AArch64, Intel Haswell or AMD Bulldozer and onwards), the compiler may replace calls to fma() by just dropping the appropriate instruction into your code. This is not guaranteed, but is generally good practice. Otherwise you will get a call to the math library, and:

• When running on a processor that has hardware FMA, those instructions should be used to implement the function. However, if you have an older version of your operating system, or an older version of the math library, it may not take advantage of those instructions.

• If you are running on a processor that does not have hardware FMA, or you are using an older (or just not very good) math library, then a software implementation of FMA will be used instead. This might be implemented using clever extended-precision floating-point tricks, or with integer arithmetic.

• The result of the fma() function should always be correctly rounded (i.e. a "real fma"). If it is not, that's a bug in your system's math library. Unfortunately, fma() is one of the more difficult math library functions to implement correctly, so many implementations have bugs. Please report them to your library vendor so they get fixed!

"Is there an intrinsic to make sure that a real FMA is used, when the precision is relied upon?"

Given a good compiler, this shouldn't be necessary; it should suffice to use the fma() function and tell the compiler what architecture you are targeting. However, compilers are not perfect, so you may need to use the _mm_fmadd_sd() and related intrinsics on x86 (but report the bug to your compiler vendor!).
