Is there any scenario where function fma in libc can be used?

Question

I came across this page and found an odd floating-point multiply-add function, fma (and fmaf). It says that the result is something like:

 (x * y) + z             #fma(x,y,z)

and that the value is computed to infinite precision and rounded once to the result format.

However, AFAICT I've never seen such a ternary operation before, so I'm wondering what the customary use for this function is.

Recommended answer

The important aspect of the fused-multiply-add instruction is the (virtually) infinite precision of the intermediate result. This helps with performance, but not so much because two operations are encoded in a single instruction. It helps with performance because the virtually infinite precision of the intermediate result is sometimes important, and very expensive to recover with ordinary multiplication and addition when this level of precision is really what the programmer is after.

Suppose that it is crucial to an algorithm to determine where the product of two double-precision numbers a and b lies with respect to a nonzero constant (we'll use 1.0). The numbers a and b both have full significands of binary digits. If you compute a*b as a double, the result may be 1.0, but that does not tell you whether the actual mathematical product was slightly below 1.0 and rounded up to exactly 1.0, or slightly above 1.0 and rounded down. Without FMA, your options are:


  1. Compute a*b as a quad-precision number. Quad-precision is not implemented in hardware on most processors, but there are software emulation libraries. In quad-precision, the mathematical result of the product is exactly representable and you can then compare it to 1.0.

  2. Compute a*b in double precision in round-upward mode and in round-downward mode. If both results are 1.0, it means a*b is exactly 1.0. If RU(a * b) is greater than 1.0, it means the mathematical product is higher than 1.0, and if RD(a * b) is below 1.0, that means the mathematical product is lower than 1.0. On most processors, this approach means changing the rounding mode three times, and each change is expensive (it involves flushing the CPU pipeline).

With an FMA instruction, one can compute fma(a, b, -1.0) and compare the result to 0.0. Since floating-point numbers are denser around zero, and since the intermediate product is not rounded in the computation, we can be certain that fma(a, b, -1.0) > 0 means the mathematical product of a and b is greater than 1, and so on.

The double-double format (http://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic) is an efficient representation of numbers as the sum of two double-precision floating-point numbers. It is nearly as precise as quad-precision but takes advantage of existing double-precision hardware.

Consider the following function, Mul12(a, b), that takes two double-precision numbers a and b and computes their product as a double-double number. An algorithm, due to Veltkamp and Dekker, computes this function with only double-precision addition and multiplication (reference: https://carolomeetsbarolo.wordpress.com/2012/02/13/the-veltkamp-dekker-route-to-extended-precision/). It takes 6 multiplications (one in each Split() call plus four in the main body of the algorithm), and plenty of additions.
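For reference, the FMA-free algorithm can be sketched as follows. This is an illustrative rendering of the Veltkamp split and Dekker product, not code taken from the reference. It assumes IEEE 754 binary64 arithmetic in round-to-nearest with no overflow or underflow, and the names split, mul12 and SPLITTER are chosen here for illustration.

```c
/* Veltkamp splitting constant for a 53-bit significand: 2^27 + 1. */
#define SPLITTER 134217729.0

/* Splits x into hi + lo, where each part fits in about half the
   significand. The volatile store keeps the compiler from fusing the
   multiplication with the subtraction that follows it. */
static void split(double x, double *hi, double *lo)
{
    volatile double p = SPLITTER * x;   /* the one multiplication per split */
    *hi = (x - p) + p;
    *lo = x - *hi;
}

/* Computes z and zz such that z + zz is intended to equal a*b exactly. */
void mul12(double a, double b, double *z, double *zz)
{
    double a_hi, a_lo, b_hi, b_lo;
    split(a, &a_hi, &a_lo);
    split(b, &b_hi, &b_lo);

    /* Four multiplications; each partial product is exact because the
       halves are short enough. */
    double p = a_hi * b_hi;
    double q = a_hi * b_lo + a_lo * b_hi;

    *z = p + q;
    *zz = ((p - *z) + q) + a_lo * b_lo;
}
```

Counting the operations gives two multiplications in the splits plus four in the body, matching the six mentioned above, along with about a dozen additions and subtractions.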

If an FMA instruction is available, Mul12 can be implemented with just two operations, one multiplication and one FMA.

high = a * b; /* double-precision approximation of the real product */
low = fma(a, b, -high); /* remainder of the real product */
/* now the real product of a and b is available as the sum of high and low */

More examples

Other examples where FMA is used for its precision, and not only as an instruction that does a multiplication and an addition, are the computation of square root and division. These operations have to be correctly rounded (to the floating-point number nearest the mathematical result) according to the IEEE 754 standard. Both operations can be implemented efficiently when a hardware FMA instruction is available. This aspect is typically hidden by the compilation chain, but the IA-64 instruction set (Itanium) did not have an instruction for division. Instead, the correctly rounded division could be obtained by a sequence of instructions (typically generated by the compiler) involving FMA.
