Symmetrical Lerp & compiler optimizations
Question

I have a function:
float lerp(float alpha, float x0, float x1) {
    return (1.0f - alpha) * x0 + alpha * x1;
}
For those who haven't seen it, this is preferable to x0 + (x1 - x0) * alpha because the latter doesn't guarantee that lerp(1.0f, x0, x1) == x1.
Now, I want my lerp function to have an additional property: I'd like lerp(alpha, x0, x1) == lerp(1 - alpha, x1, x0). (As for why: this is a toy example of a more complicated function.) The solution I came up with that seems to work is
float lerp_symmetric(float alpha, float x0, float x1) {
    float w0 = 1.0f - alpha;
    float w1 = 1.0f - w0;
    return w0 * x0 + w1 * x1;
}
This double subtraction has the effect of rounding near zero and near one, so if alpha = std::nextafter(0) (1.4012985e-45), then 1 - alpha == 1 and so 1 - (1 - alpha) == 0. As far as I can tell, it is always true that 1.0f - x == 1.0f - (1.0f - (1.0f - x)). It also seems to have the effect that w0 + w1 == 1.0f.
Questions:
- Is this a reasonable approach?
- Can I trust my compiler to do what I want? In particular, I know that on Windows a higher precision is sometimes used for intermediate results, and I know compilers are allowed to do some algebra; 1 - (1 - x) == x is obviously true algebraically.

This is in C++11 using Clang, Visual Studio, and gcc.
Answer
If one format of IEEE-754 binary floating-point is used throughout (e.g., the basic 32-bit binary format, commonly used for C++ float), with all C++ operators mapped to IEEE-754 operations in the direct and simple way, then lerp_symmetric(alpha, x0, x1) (hereafter referred to as A) equals lerp_symmetric(1 - alpha, x1, x0) (B).
Proof:
- If alpha, which we assume is in [0, 1], is greater than or equal to ½, then 1 - alpha is exact by Sterbenz' lemma. (By "exact," we mean the computed floating-point result equals the mathematical result; there is no rounding error.) Then, in computing A, w0 is exact since it is 1 - alpha, and w1 is exact since its mathematical result is alpha, which is exactly representable. And, in computing B, w0 is exact since its mathematical result is alpha, and w1 is exact since it is again 1 - alpha.
- If alpha is less than ½, then 1 - alpha may have some rounding error. Let the result be beta. Then, in A, w0 is beta. Now ½ ≤ beta, so Sterbenz' lemma applies to the evaluation of w1 = 1.0f - w0, so w1 is exact (and equals the mathematical result of 1 - beta). And, in B, w0 is exact, again by Sterbenz' lemma, and equals the w1 of A, and w1 (of B) is exact since its mathematical result is beta, which is exactly representable.
Now we can see that w0 in A equals w1 in B, and w1 in A equals w0 in B. Letting beta be 1 - alpha in either of the above cases, A and B therefore return (1-beta)*x0 + beta*x1 and beta*x1 + (1-beta)*x0, respectively. IEEE-754 addition is commutative (except for NaN payloads), so A and B return identical results.
Answering the questions:

- I would say it is a reasonable approach. I would not assert, without further thought, that no improvements could be made.
- No, you cannot trust your compiler:
  - C++ allows implementations to use excess precision when evaluating floating-point arithmetic. Thus w0*x0 + w1*x1 may be evaluated using double, long double, or another precision even though all operands are float.
  - C++ allows contractions unless they are disabled, so w0*x0 + w1*x1 may be evaluated as fmaf(w0, x0, w1*x1), thus using exact arithmetic for one of the multiplications but not the other.
You can partially work around this by using:
float lerp_symmetric(float alpha, float x0, float x1) {
    float w0 = 1.0f - alpha;
    float w1 = 1.0f - w0;
    float t0 = w0 * x0;
    float t1 = w1 * x1;
    return t0 + t1;
}
The C++ standard requires that excess precision be discarded in assignments and casts, and this extends to function returns. (I report this and other C++ specifications from memory; the standard should be checked.) So each of the assignments above will round its result to float even if extra precision was initially used, and this will prevent contraction.
(One should also be able to disable contraction by including <cmath> and inserting the preprocessor directive #pragma STDC FP_CONTRACT off. Some compilers might not support that.)
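As a sketch, the pragma version would look like the following (the uppercase spelling below follows the C standard's FP_CONTRACT pragma; whether a given C++ compiler honors it at all is implementation-specific, and some, such as current GCC, simply ignore it):

```cpp
#include <cmath>

// Request that the implementation not contract w0*x0 + w1*x1 into an FMA.
// Support varies; some compilers ignore this pragma entirely.
#pragma STDC FP_CONTRACT OFF

float lerp_symmetric(float alpha, float x0, float x1) {
    float w0 = 1.0f - alpha;
    float w1 = 1.0f - w0;
    return w0 * x0 + w1 * x1;
}
```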
One problem with the workaround above is that values are first rounded to the evaluation precision and then rounded to float. There are mathematical values x for which rounding x first to double (or another precision) and then to float produces a different result than rounding x directly to float. The dissertation A Rigorous Framework for Fully Supporting the IEEE Standard for Floating-Point Arithmetic in High-Level Programming Languages by Samuel A. Figueroa del Cid establishes that evaluating a single multiplication or addition in IEEE-754 basic 64-bit floating-point (commonly used for double) and then rounding to the 32-bit format never has a double-rounding error (because these operations, given inputs that are elements of the 32-bit format, can never produce one of the troublesome x values described above).1
If I am correct about the C++ specifications reported from memory, then the workaround described above should be complete as long as the C++ implementation evaluates floating-point expressions either in the nominal format or in a format sufficiently wide to satisfy the requirements Figueroa del Cid gives.
1 Per Figueroa del Cid, if x and y have p-bit significands, and x+y or x*y is computed exactly and then rounded to q places, a second rounding to p places will have the same answer as if the result were rounded directly to p places, provided p ≤ (q − 1)/2. This is satisfied for IEEE-754 basic 32-bit binary floating-point (p = 24) and 64-bit (q = 53). These formats are commonly used for float and double, and the workaround above should suffice in a C++ implementation that uses them. If a C++ implementation evaluated float arithmetic using a precision that did not satisfy the condition Figueroa del Cid gives, then double-rounding errors could occur.