Some questions about floating points

Question

I'm wondering: if a number is represented one way in a floating-point representation, is it going to be represented the same way in a representation that has a larger size? That is, if a number has a particular representation as a float, will it have the same representation if that float is cast to a double, and still the same when cast to a long double?

I'm asking because I'm writing a BigInteger implementation, and any floating-point number that is passed in gets sent to a function that accepts a long double to convert it. Which leads me to my next question. Obviously floating-point numbers do not always have exact representations, so what should my BigInteger class attempt to represent when given a float? Is it reasonable to try to represent the same number as given by std::cout << std::fixed << someFloat; even if that is not the same as the number passed in? Is that the most accurate representation I will be able to get? If so, ...

What's the best way to extract that value (in a base that is some power of 10)? At the moment I'm just grabbing it as a string and passing it to my string constructor. This works, but I can't help feeling there's a better way; certainly, taking the remainder when dividing by my base is not accurate with floats.

Finally, I wonder whether there is a floating-point equivalent of uintmax_t, that is, a type name that will always be the largest floating-point type on a system. Or is there no point, because long double will always be the largest (even if it's the same as a double)?

Recommended answer

If by "same representation" you mean "exactly the same binary representation in memory except for padding", then no. Double precision has more bits of both exponent and mantissa, and also has a different exponent bias. The value itself is preserved, though: any single-precision value is exactly representable in double precision. With IEEE 754 formats that includes denormalised floats, which become ordinary normalised doubles.
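To make the distinction between "same bits" and "same value" concrete, here is a minimal sketch. The function name widening_is_lossless is made up for illustration. It only checks that the value survives the trip float -> double -> long double and back, and says nothing about the bit patterns, which do differ.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: returns true if widening f to double and then to
// long double keeps exactly the same numeric value.
bool widening_is_lossless(float f) {
    assert(!std::isnan(f));   // NaN never compares equal to itself, so exclude it
    double d = f;             // float -> double (implicit widening conversion)
    long double ld = d;       // double -> long double
    // Narrowing back must give the original float, and widening the float
    // directly must give the same long double.
    return static_cast<float>(ld) == f && static_cast<long double>(f) == ld;
}
```

Calling it with 0.1f, std::numeric_limits<float>::max() or std::numeric_limits<float>::denorm_min() should return true in each case, since every float value is also a double value and every double value is also a long double value.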

I'm not sure what you mean when you say "floating points do not always have exact representations". Certainly, not all decimal floating-point values have exact binary floating-point values (and vice versa), but I'm not sure that's a problem here. So long as your floating-point input has no fractional part, a suitably large "BigInteger" format should be able to represent it exactly.
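As a rough illustration of the "no fractional part" condition, a constructor could test its input before converting. This is only a sketch, and is_integral_value is a made-up name, not part of any library.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: true if x is finite and has no fractional part,
// i.e. an integer type of sufficient width could hold it exactly.
bool is_integral_value(long double x) {
    return std::isfinite(x) && std::trunc(x) == x;
}
```

Large values tend to pass this test automatically: once the magnitude exceeds what the mantissa can resolve, every representable value is already a whole number. What to do with inputs that fail the test (truncate, round, or reject) is a design decision for the BigInteger class.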

Conversion via a base-10 representation is not the way to go. In theory, all you need is a bit array of length ~1024 (enough for an IEEE double, whose largest finite value is just under 2^1024): initialise it all to zero, and then shift the mantissa bits in by the exponent value. But without knowing more about your implementation, there's not a lot more I can suggest!
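The mantissa-and-exponent idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the Limbs type (little-endian 32-bit limbs, magnitude only), the function name magnitude_from_double, and the choice to truncate any fractional part. It takes a double rather than a long double so that the mantissa fits in a 64-bit integer. A long double version would follow the same pattern with a wider mantissa and exponent range.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical storage: little-endian 32-bit limbs, magnitude only.
using Limbs = std::vector<std::uint32_t>;

// Sketch: converts the integer part of |x| to limbs without going through
// a decimal string. Sign handling is left to the caller.
Limbs magnitude_from_double(double x) {
    static_assert(std::numeric_limits<double>::radix == 2, "binary floating point assumed");
    constexpr int kDigits = std::numeric_limits<double>::digits;  // 53 for IEEE double
    static_assert(kDigits <= 64, "mantissa must fit in 64 bits");
    assert(std::isfinite(x));

    x = std::trunc(std::fabs(x));           // drop the sign and any fractional part
    Limbs out;
    if (x == 0.0) return out;               // zero is the empty limb vector

    int exp = 0;
    double frac = std::frexp(x, &exp);      // x == frac * 2^exp, 0.5 <= frac < 1
    // Scale the fraction up to an integer mantissa; this is exact.
    std::uint64_t mant = static_cast<std::uint64_t>(std::ldexp(frac, kDigits));
    int shift = exp - kDigits;              // now x == mant * 2^shift
    if (shift < 0) {                        // small value: the low bits are all zero
        mant >>= -shift;
        shift = 0;
    }

    // Place the mantissa at bit offset `shift`; it can span up to three limbs.
    std::size_t limb = static_cast<std::size_t>(shift) / 32;
    int bit = shift % 32;
    out.assign(limb + 3, 0);
    out[limb]     = static_cast<std::uint32_t>(mant << bit);
    out[limb + 1] = static_cast<std::uint32_t>(mant >> (32 - bit));
    out[limb + 2] = bit ? static_cast<std::uint32_t>(mant >> (64 - bit)) : 0;

    while (!out.empty() && out.back() == 0) out.pop_back();  // trim leading zero limbs
    return out;
}
```

For example, magnitude_from_double(4294967296.0), which is 2^32, yields the limbs {0, 1}. Because the conversion only moves bits around, it is exact for any finite input and avoids the string round trip entirely.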
