When is integer to floating point conversion lossless?


Question

In particular, I'm interested in whether an int32_t is always converted to double losslessly.

Does the following code always return true?

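#include <stdint.h>

/* Returns 1 if i survives a round trip through double unchanged. */
int is_lossless(int32_t i)
{
    double  d  = i;            /* int32_t -> double */
    int32_t i2 = (int32_t)d;   /* double -> int32_t */
    return (i2 == i);
}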

What about int64_t?

Solution

Question: Does the following code always return true?

"Always" is a strong claim, so strictly speaking the answer is no.

The C++ Standard does not say whether the floating-point types known to C++ (float, double and long double) are IEEE-754 types. It explicitly states:

There are three floating-point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. [Note: This document imposes no requirements on the accuracy of floating-point operations; see also [support.limits]. — end note] Integral and floating-point types are collectively called arithmetic types. Specialisations of the standard library template std::numeric_limits shall specify the maximum and minimum values of each arithmetic type for an implementation.

source: C++ standard: basic fundamentals
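Whether a given implementation's double actually follows IEEE 754 can be queried through the std::numeric_limits template mentioned in the quote. The following is only a minimal sketch; the helper names double_is_ieee754 and double_looks_like_binary64 are made up for illustration and do not come from the standard or the original answer.

```cpp
#include <limits>

// Hypothetical helper: true when the implementation reports that double
// conforms to IEC 559 (the ISO name for IEEE 754).
constexpr bool double_is_ieee754()
{
    return std::numeric_limits<double>::is_iec559;
}

// Hypothetical helper: true when double has the shape of binary64, that is
// radix 2, a 53-bit significand (hidden bit included) and a largest finite
// value just below 2^1024.
constexpr bool double_looks_like_binary64()
{
    return std::numeric_limits<double>::radix == 2
        && std::numeric_limits<double>::digits == 53
        && std::numeric_limits<double>::max_exponent == 1024;
}
```

Both functions are constexpr, so they can be used in a static_assert to make a build fail early on a platform where the assumption does not hold.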

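Most commonly, the type double represents the IEEE 754 double-precision binary floating-point format binary64, which consists of 1 sign bit, 11 exponent bits and 52 explicitly stored significand (fraction) bits. A normal number is decoded as:

(-1)^sign × 1.fraction × 2^(exponent - 1023)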

However, there is a plethora of other floating-point formats that are decoded differently and do not necessarily have the same properties as the well-known IEEE-754. Nonetheless, all in all they are similar:

  • They are n bits long
  • One bit represents the sign
  • m bits represent the significand, with or without a hidden first bit
  • e bits represent some form of an exponent of a given base (2 or 10)

To know whether a double can represent all 32-bit signed integers, you must answer the following questions (assuming the floating-point number is in base 2):

  1. Does my floating-point representation have a hidden first bit in the significand? If so, assume m=m+1
  2. A 32-bit signed integer is represented by 1 sign bit and 31 bits representing the number. Is the significand large enough to hold those 31 bits?
  3. Is the exponent large enough to represent a number of the form 1.xxxxx × 2^31?

If you can answer yes to the last two questions, then an int32_t can always be represented exactly by the double implemented on this particular system.
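The three questions above can be sketched as a compile-time check built on std::numeric_limits. This is an illustration under stated assumptions, not a definitive implementation: the function name represents_all_values is hypothetical, only radix-2 formats are handled (as in the text), and the integer type is assumed to be a signed two's-complement type such as int32_t.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when every value of the signed integer type Int
// is exactly representable in the floating-point type Float.
template <typename Int, typename Float>
constexpr bool represents_all_values()
{
    using IL = std::numeric_limits<Int>;
    using FL = std::numeric_limits<Float>;
    static_assert(IL::is_integer, "Int must be an integer type");
    static_assert(!FL::is_integer, "Float must be a floating-point type");

    // Questions 1 and 2: FL::digits already counts the hidden bit, and
    // IL::digits is the number of value bits without the sign (31 for int32_t).
    // Question 3: radix^(max_exponent - 1) is the largest representable power
    // of the radix, and the most negative Int value is -2^IL::digits.
    return FL::radix == 2
        && FL::digits >= IL::digits
        && FL::max_exponent > IL::digits;
}
```

On a platform where double is binary64 (53 significand bits), represents_all_values<std::int32_t, double>() would be true, while the same check for std::int64_t (63 value bits) or for float (24 significand bits) would be false.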

Note: I ignored decimal32 and decimal64 numbers, as I have no direct knowledge about them.
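As for the int64_t part of the question, the same reasoning suggests that the round trip cannot always succeed when double is binary64, because 53 significand bits are fewer than the 63 value bits of an int64_t. The sketch below assumes a binary64 double; the function name round_trips is hypothetical. It deliberately stays away from values near the top of the int64_t range, since converting a double that lies outside the range of the target integer type back to that type is undefined behaviour.

```cpp
#include <cstdint>

// Hypothetical helper: converts i to double and back, then compares.
// Only meaningful when the double result is still within int64_t range.
constexpr bool round_trips(std::int64_t i)
{
    return static_cast<std::int64_t>(static_cast<double>(i)) == i;
}

// 2^53 is a power of two and therefore exactly representable in binary64.
constexpr std::int64_t two_pow_53 = std::int64_t{1} << 53;

// 2^53 + 1 needs 54 significand bits, one more than binary64 provides.
constexpr std::int64_t two_pow_53_plus_1 = two_pow_53 + 1;
```

Under the binary64 assumption, round_trips(two_pow_53) holds while round_trips(two_pow_53_plus_1) does not, which makes 2^53 + 1 the smallest positive counterexample.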
