Approximation error when using sqrt and floor
Question
I have to enumerate the solutions of an equation, and I know that y < x * (sqrt(n) - 1), where x, y and n are integers.
My naive approach would be to look for y less than or equal to floor(x * (sqrt((float)n) - 1)).
Should I be worried about approximation error?
For example, if my expression is a little greater than an integer m, should I be worried about getting m-1 at the end?
How can I detect such errors?
Recommended answer
You should definitely be worried about approximation error, but how worried depends on the ranges of values of x and n you are concerned about.
Calculations in IEEE 4-byte floating-point representations have relative errors roughly on the order of one part in 2^23 to 2^24; for an 8-byte representation (i.e., a double), it is roughly one part in 2^52 to 2^53. You could therefore expect to need doubles rather than floats to get an accurate result for 32-bit integers x and n, and even a double would be insufficient for 64-bit integers.
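The precision limits above can be checked directly. The following is a minimal sketch, not part of the original answer: the helper name round_trips is hypothetical, and it assumes inputs well below 2^63 so that the cast back to an integer is defined. It tests whether an integer survives conversion to a floating-point type and back.

```cpp
#include <cstdint>

// Returns true if v is unchanged after converting to floating-point type F
// and back. Assumption: v is well below 2^63, so the conversion back to an
// integer cannot overflow.
template <typename F>
bool round_trips(std::uint64_t v) {
    return static_cast<std::uint64_t>(static_cast<F>(v)) == v;
}
```

With this helper, 2^24 = 16777216 round-trips through float but 2^24 + 1 does not, and 2^53 round-trips through double but 2^53 + 1 does not, while every 32-bit integer round-trips through double.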
As an example in code:
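#include <cmath>
#include <cstdint>

// Floating-point estimate of floor(x*(sqrt(n) - 1)), computed in type F.
template <typename F, typename V>
F approxub(V x, V n) {
    return std::floor(x * std::sqrt(F(n)) - x);
}

uint64_t n = 1000000002000000000ull; // (10^9 + 1)^2 - 1
uint64_t x = 3;
uint64_t y = approxub<double>(x, n);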
This gives a value of y=3000000000, but the correct value is 2999999999.
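One way to detect an off-by-one like this is to test a candidate y against the inequality in exact integer arithmetic: since all quantities are non-negative, y < x*(sqrt(n) - 1) is equivalent to (y + x)^2 < x^2 * n. The sketch below is an illustration only. The name satisfies_bound is hypothetical, and it assumes that both (y + x)^2 and x*x*n fit in 64 bits, which holds for the example above but is not checked.

```cpp
#include <cstdint>

// Exact test of y < x*(sqrt(n) - 1), rewritten as (y + x)^2 < x^2 * n.
// Assumption: (y + x)^2 and x*x*n both fit in 64 bits; overflow is not detected.
bool satisfies_bound(std::uint64_t x, std::uint64_t y, std::uint64_t n) {
    const std::uint64_t lhs = (y + x) * (y + x);
    const std::uint64_t rhs = x * x * n;
    return lhs < rhs;
}
```

For x = 3 and n = 1000000002000000000, the check rejects the floating-point result 3000000000 and accepts 2999999999.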
It's even worse when x is large and n is small, because large 64-bit integers are not exactly representable in IEEE doubles:
uint64_t n = 9;
uint64_t x = 5000000000000001111ull; // 5e18 + 1111
uint64_t y = approxub<double>(x, n); // same approxub helper as in the previous example
The correct value for y (putting to one side the case where n is a perfect square, in which the true upper bound is one less) is 2x = 10000000000000002222, i.e. 1e19 + 2222. The computed y, however, is 10000000000000004096.
Suppose you had a function isqrt which exactly computed the integer part of the square root of an integer. Then you could say
y = isqrt(x*x*n) - x
and, provided that the product x*x*n fits inside your integer type, you would have an exact upper bound (or one more than the upper bound if n is a perfect square). There's more than one way to write an isqrt function; this is an example implementation based on the material at code codex:
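#include <limits>
#include <type_traits>

// Integer square root: returns floor(sqrt(v)) for v >= 0, one bit pair at a time.
template <typename V>
V isqrt(V v) {
    if (v < 0) return 0;
    typedef typename std::make_unsigned<V>::type U;
    U u = v, r = 0;
    constexpr int ubits = std::numeric_limits<U>::digits;
    // Start at the largest power of four representable in U, then shrink it to <= u.
    U place = U(1) << (2 * ((ubits - 1) / 2));
    while (place > u) place /= 4;
    while (place) {
        if (u >= r + place) {
            u -= r + place;
            r += 2 * place;
        }
        r /= 2;
        place /= 4;
    }
    return (V)r;
}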
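To illustrate how the formula y = isqrt(x*x*n) - x behaves, here is a small self-contained sketch. It uses a simple binary-search square root, isqrt_u64, as a stand-in rather than the implementation above. Both function names are hypothetical, and the code assumes n >= 1 and that x*x*n fits in 64 bits.

```cpp
#include <cstdint>

// Stand-in integer square root (binary search), returning floor(sqrt(v)).
std::uint64_t isqrt_u64(std::uint64_t v) {
    std::uint64_t lo = 0, hi = 4294967295ull;  // floor(sqrt(2^64 - 1)) = 2^32 - 1
    while (lo < hi) {
        const std::uint64_t mid = hi - (hi - lo) / 2;  // upper midpoint, always >= 1
        if (mid * mid <= v) lo = mid; else hi = mid - 1;
    }
    return lo;
}

// Computes isqrt(x*x*n) - x.
// Assumptions: n >= 1 and x*x*n fits in 64 bits (neither is checked here).
std::uint64_t exact_upper(std::uint64_t x, std::uint64_t n) {
    return isqrt_u64(x * x * n) - x;
}
```

For x = 3 and n = 1000000002000000000 this yields 2999999999, the correct value from the first example. For a perfect square such as n = 9 with x = 5 it yields 10, which is one more than the largest y satisfying the strict inequality y < 10.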
What if x is too large for this, for example when our largest integral type has 64 bits and x is larger than 2^32? The most straightforward solution would be to do a binary search, taking as bounds x r - x and x r, where r = ⌊√n⌋ is the integer square root.
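The binary search can be sketched as follows. This is one possible reading of the suggestion, not the answerer's code. It searches for t = y + x between x r and x r + x and compares t^2 with x^2 n using unsigned __int128, which is a GCC/Clang extension and not standard C++. The function names are hypothetical, and the sketch assumes x*(r + 1) fits in 64 bits, so that every square involved fits in 128 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using u128 = unsigned __int128;  // compiler extension (GCC/Clang), an assumption

// Stand-in integer square root (binary search), returning floor(sqrt(v)).
std::uint64_t isqrt_u64(std::uint64_t v) {
    std::uint64_t lo = 0, hi = 4294967295ull;  // floor(sqrt(2^64 - 1)) = 2^32 - 1
    while (lo < hi) {
        const std::uint64_t mid = hi - (hi - lo) / 2;  // upper midpoint
        if (mid * mid <= v) lo = mid; else hi = mid - 1;
    }
    return lo;
}

// Returns floor(x*sqrt(n)) - x via binary search on t = y + x.
// Precondition: x*(r + 1) < 2^64, where r = floor(sqrt(n)).
std::uint64_t floor_x_sqrt_n_minus_x(std::uint64_t x, std::uint64_t n) {
    const std::uint64_t r = isqrt_u64(n);
    assert(x == 0 || r + 1 <= std::numeric_limits<std::uint64_t>::max() / x);
    const u128 target = u128(x) * x * n;  // x^2 * n, below 2^128 by the precondition
    std::uint64_t lo = x * r;             // lo^2 <= x^2 * n because r^2 <= n
    std::uint64_t hi = x * r + x;         // hi^2 >  x^2 * n because (r + 1)^2 > n
    while (lo < hi) {
        const std::uint64_t mid = hi - (hi - lo) / 2;  // upper midpoint
        if (u128(mid) * mid <= target) lo = mid; else hi = mid - 1;
    }
    return lo - x;  // largest t with t^2 <= x^2 * n, shifted back to y
}
```

As with the isqrt formula, the result is one more than the largest valid y when n is a perfect square. Inputs that violate the precondition, such as x = 5000000000000001111 with n = 9, would need a wider integer type still.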