Set float a to the closest multiple of float b

Question

C++ Scenario: I have two variables of type double a and b.

Goal: a should be set to the closest multiple of b that is smaller than a.

First approach: Use fmod() or remainder() to get r, then do a = a - r. I know that, due to the representation of decimal numbers in memory, fmod() or remainder() can never guarantee 100% accuracy. In my tests I found that I cannot use fmod() at all, as the variance of its results is too unpredictable (at least as far as I understand). There are many questions and discussions out there about this phenomenon. So is there something I could do to still use fmod()? By "something" I mean some trick similar to checking whether a equals b by employing a tolerance value:

double EPSILON = 0.005;
if (std::abs(a-b) < EPSILON)
   std::cout << "equal" << '\n';

My second approach works but seems not to be very elegant. I am just subtracting b from a until there is nothing left to subtract:

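#include <iostream>
#include <limits>
#include <utility>

// Remainder by repeated subtraction; assumes x and y are both positive.
double findRemainder(double x, double y) {
    // Treat the larger value as the dividend.
    if (y > x)
        std::swap(x, y);

    // Subtract y until less than y remains.
    while (x >= y)
        x = x - y;
    return x;
}

int main()
{
    typedef std::numeric_limits<double> dbl;
    std::cout.precision(dbl::max_digits10);
    double a = 13.78, b = 2.2, r = 0;
    r = findRemainder(a, b);
    std::cout << r << '\n';
    return 0;
}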

Any suggestions for me?

Recommended answer

Preamble

The problem is impossible, both as stated and as intended.

This statement is incorrect: "fmod() or remainder() can never guarantee 100% accuracy." If the floating-point format supports subnormal numbers (as IEEE-754 does), then fmod(x, y) and remainder are both exact; they produce a result with no rounding error (barring bugs in their implementation). The remainder, as defined for either of them, is always less than y and not more than x in magnitude. Therefore, it is always in a portion of the floating-point format that is at least as fine as y and as x, so all the bits needed for the real-arithmetic remainder can be represented in the floating-point remainder. So a correct implementation will return the exact remainder.
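As a small illustration of this exactness claim, the sketch below runs a consistency check with std::fma, which rounds only once. It assumes double is IEEE-754 binary64. The helper name remainderIsExact is made up for this example, and the caller has to supply the integer quotient q. A passing check is consistent with an exact remainder; it is not a proof.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: returns true when q * y + fmod(x, y) reproduces x.
// std::fma rounds only once, so if the true sum is exactly x the result is x.
bool remainderIsExact(double x, double y, double q) {
    assert(y != 0.0);
    double r = std::fmod(x, y);
    return std::fma(q, y, r) == x;
}
```

For example, std::fmod(1.0, 0.1) returns roughly 0.09999999999999995 rather than 0, because the double nearest 0.1 is slightly larger than 0.1, so only nine whole copies fit into 1.0. The result looks surprising, yet remainderIsExact(1.0, 0.1, 9.0) holds: it is the remainder of the values actually stored.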

For simplicity of illustration, I will use IEEE-754 binary32, the format commonly used for float. The issues are the same for other formats. In this format, all integers with magnitude up to 2^24, 16,777,216, are representable. After that, due to the scaling by the floating-point exponent, the representable values increase by two: 16,777,218, 16,777,220, and so on. At 2^25, 33,554,432, they increase by four: 33,554,436, 33,554,440. At 2^26, 67,108,864, they increase by eight.
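The spacing described above can be observed with std::nextafter. This is a minimal sketch that assumes float is IEEE-754 binary32; the function name spacingAbove is invented for the example.

```cpp
#include <cmath>
#include <limits>

// Gap between x and the next representable float above it.
// Assumes float is IEEE-754 binary32 and x is finite.
float spacingAbove(float x) {
    float next = std::nextafter(x, std::numeric_limits<float>::infinity());
    return next - x;
}
```

Under that assumption, spacingAbove(16777216.f) is 2, spacingAbove(33554432.f) is 4, and spacingAbove(67108864.f) is 8. The gap is still 8 at 100,000,000.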

100,000,000 is representable, and so are 99,999,992 and 100,000,008. Now consider asking what multiple of 3 is the closest to 100,000,000. It is 99,999,999. But 99,999,999 is not representable in the binary32 format.

Thus, it is not always possible for a function to take two representable values, a and b, and return the greatest multiple of b that is less than a, using the same floating-point format. This is not because of any difficulty computing the multiple but simply because it is impossible to represent the true multiple in the floating-point format.

In fact, given the standard library, it is easy to compute the remainder; std::fmod(100000000.f, 3.f) is 1. But it is impossible to compute 100000000.f - 1 in the binary32 format.
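The sketch below applies the "subtract the remainder" idea from the question to this example. It assumes float is IEEE-754 binary32 with the default round-to-nearest mode; subtractRemainder is a name chosen for illustration only.

```cpp
#include <cmath>

// Assumes float is IEEE-754 binary32 with round-to-nearest.
// Returns a - fmod(a, b), rounded to float as any float result must be.
float subtractRemainder(float a, float b) {
    float r = std::fmod(a, b);   // the remainder itself is exact
    float result = a - r;        // the difference is rounded to the nearest float
    return result;
}
```

With a = 100000000.f and b = 3.f, the remainder is 1, but the function returns 100,000,000 again: the true answer 99,999,999 lies between the representable neighbours 99,999,992 and 100,000,000 and rounds to the latter.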

The examples shown, 13.78 for a and 2.2 for b, suggest the desire is to produce a multiple for some floating-point numbers a and b that are the results of converting decimal numerals to the floating-point format. However, once such conversions are performed, the original decimal numerals cannot be known from the results a and b.

To see this, consider values for a of either 99,999,997 or 100,000,002 while b is 10. The greatest multiple of 10 less than 99,999,997 is 99,999,990, and the greatest multiple of 10 less than 100,000,002 is 100,000,000.

When either 99,999,997 or 100,000,002 is converted to the binary32 format (using the common method, round-to-nearest-ties-to-even), the result for a is 100,000,000. Converting the decimal value 10 of course yields exactly 10 for b.

Then a function that computes the greatest multiple of b that is less than a can return only one result. Even if this function uses extended precision (say binary64) so that it can return either 99,999,990 or 100,000,000 even though those are not representable in binary32, it has no way to distinguish them. Whether the original decimal value is 99,999,997 or 100,000,002, the a given to the function is 100,000,000, so there is no way for it to know the original value and no way for it to decide which result to return.
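The loss of information can be seen in a short sketch. It assumes float is IEEE-754 binary32 and double is binary64. Both helper names are hypothetical, and for simplicity floorToMultiple returns the greatest multiple that does not exceed a, rather than one strictly less than a.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: greatest multiple of b that does not exceed a,
// computed in double. Assumes a >= 0 and b > 0.
double floorToMultiple(double a, double b) {
    assert(a >= 0.0 && b > 0.0);
    return a - std::fmod(a, b);
}

// Round a value to float the way a float variable would store it.
float storeAsFloat(double value) {
    return static_cast<float>(value);
}
```

In double, floorToMultiple(99999997.0, 10.0) and floorToMultiple(100000002.0, 10.0) give the two different answers, 99,999,990 and 100,000,000. Once either input has passed through storeAsFloat, both become 100,000,000, so any function applied afterwards receives identical arguments and must return identical results.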
