Casting between int, float and double in C

Views: 213 Published: 2020/11/8 22:10:07 Tags: c floating-point ieee

Problem description

I don't really understand casting in C. Can anyone help me with a question in the book Computer Systems: A Programmer's Perspective:

We generate arbitrary integer values x, y, and z, and convert them to values of type double as follows:

int x = random();
int y = random();
int z = random();

double dx = (double) x;
double dy = (double) y;
double dz = (double) z;

For each of the following C expressions, you are to indicate whether or not the expression always yields 1. If it always yields 1, describe the underlying mathematical principles. Otherwise, give an example of arguments that make it yield 0.

A. (float) x == (float) dx
B. dx - dy == (double) (x-y)
C. (dx + dy) + dz == dx + (dy + dz)
D. (dx * dy) * dz == dx * (dy * dz)
E. dx / dx == dz / dz

Solution

Casting converts a value from one type to another. It might be from an 8-bit integer to a 16-bit or 32-bit one:

unsigned int x;
unsigned char y;

y = 7;
x = (unsigned int) y;

This happens implicitly, of course, if you just write x = y; assuming 8-bit and 32-bit types here, the bit pattern goes from 0x07 to 0x00000007.

This is actually a really fun exercise. Let's make up a format and then ponder the questions.

Floating point formats generally work like this, and we can think purely in positive numbers and still work most of this exercise. Take the number 1234 in base 10. In the scientific notation we learned in grade school, that is 1.234 times ten to the power 3. Base two computer formats work the same way; there is a reason it is called FLOATING point: the point moves around in a virtual sense. So for the number nine, 0b1001, you find the most significant one and put the binary point after it: 1.001 times 2 to the power 3. The number seventeen is 1.0001 times 2 to the power 4.

A floating point format has to cover the sign, and it needs some number of bits for the "fraction" or mantissa, since we are forcing a fraction even with whole numbers. It also needs some number of bits for the exponent, the power of two that is applied. We don't necessarily need to store the one before the point; some formats assume it. Zero is of course a special problem, and the format needs a special exception or pattern for it.

For starters, let's assume only positive integers are being generated and our integers are 8 bits, so 0 to 0xFF is our entire world of integers. For the sake of argument, our made-up double format has 12 fraction bits and our made-up single has 5.

So what is our worst case number? 0xFF, which we can easily represent with 12 fraction bits: 1.111111100000 times 2 to the power 7. Let's assume our format has more than enough exponent bits to cover this entire exercise.

So our double can hold every one of our integers. In C,

dx = (double) x;

just means convert formats. If we start with the bit pattern 00001001, which is the number 9, then in our made-up double that is 1.001000000000 times 2 to the power 3. The power is some more bits we store in our format, but they are not relevant to this question.

And in our made-up single, that number 9 is 1.00100 times 2 to the power 3.

But the number 0xFF in our single precision is 1.11111 times 2 to the power 7, and when we convert that back to an integer it is 0xFC, not 0xFF. We lost some bits in the conversion, assuming no rounding. Compare base 10: 1/27 is 0.037037037... If we cut that off at four decimal places we have 0.0370, which is a little low, but if we round it to two places we get 0.04, which is a little high. If we look at 0xFF in our single, it is 1.11111 with the next two bits being tossed, and those bits are 11, which is more than a half. So what if we rounded up? 10.00000 times 2 to the 7th normalizes to 1.00000 times 2 to the 8th power; 0xFF has basically rounded to 0x100. We can go with 0x100 representing 0xFF or 0xFC representing 0xFF, a little high or a little low but not exact, so when we convert back we are either a little high or a little low. That is exactly what is going on when you do these integer to float and back conversions.

So look at the first case, A: (float)x vs (float)((double)x). For 9 that is 1.00100 times 2 to the power 3 vs 1.001000000000 times 2 to the power 3 converted down to single. On one side the integer goes straight to float; on the other it goes to double first, and then the double has to be converted. Is the same clipping and rounding used between floating point formats as from integer to float? It depends on the hardware, but one would hope that 11111111 converts the same as 1.111111100000 does. Maybe it doesn't, but in an ideal world it would.

C is an interesting case, and I will put it this way: how many bits does it take to represent a two-bit number plus a two-bit number? For positive numbers the worst case is 3 + 3 = 6; how many bits is that? Scale that up to 0xFF plus 0xFF: how many bits is that, and is it more than the 12 in our double format? Take 0xFF plus 0xFF plus 0xFF; how many bits does that take? Is it more than 12? Does rearranging the grouping change that?

For D, take two-bit numbers again: 3 * 3 = 9, two bits in for each operand, how many bits out? What about 0xFF times 0xFF? Then does 0xFF times 0xFF times 0xFF take more than 12 bits? That is the first question. The second question is, if so, how do the clipping and rounding work, and are they affected by the grouping? That is the brain bender so far; I may have to write a simple program to understand it.

And E is not as nasty as I first thought, re-reading it. I am dividing a bit pattern by the exact same bit pattern. What is our biggest problem with division, not just on computers but in general? Which integer gives us a problem, and do any others? What if we allow signed numbers in here, positive x divided by positive x and negative z divided by negative z?

So search for double precision floating point at Wikipedia, then search for single precision. How many "fraction" bits are there for double? Assuming a 32-bit int, or 31 bits and a sign, do all the significant digits fit? Thinking in positive numbers, we would need a power of 2 up to the 31st; how many exponent bits does it take to represent the number 31? Are there enough? What about single: can you hold 31 or 30 significant bits in the fraction? Can you represent a power of plus or minus 31 in the exponent?

EDIT

So thinking about case D, I wrote a program. I used 8 bits of fraction.

//5 9 f 020140 030120 0301E0 090151 090152
(5 * 9) * 15 vs 5 * (9 * 15)

So 5 is 1.01000000 or 0x1.40, 9 is 1.00100000 or 0x1.20, and 0xF is 1.11100000 or 0x1.E0.

Multiplication in float is not quite what you would think, and it makes your brain hurt a little, because we are not multiplying 5 times 9 directly. We have shifted everything so that it is 1.something, so it is instead

0x120 * 0x140 = 0x16800

and because of that normalization you chop off 8 bits. This is where I rounded, as we will see. In this case the result is 0x168 and no further normalization is needed.

5*9 = 0x2D = 101101
1.01101000 0x1.68

I don't need to care much about the exponents, but they just add: 5 has an exponent of 2 and 9 an exponent of 3, so the result is 0x168 with an exponent of 5. Next, 0x168 times 0x1E0 = 0x2A300, and we instantly chop off 8 bits, leaving 0x2A3, due to the nature of multiplication. Now we normalize: we need 1.something, not 2.something, so we shift right and increase the exponent. The exponent was 5 + 3 = 8, and we give it one more, 9. But notice something: 0x2A3 is 0b1010100011, so we are going to throw away a bit of precision. Instead of 0x1.51 and a half of a bit we have just 0x1.51; that is the nature of base 2 floating point. Now is that supposed to round the answer up? Perhaps. If so, then the answer is 0x1.52.

Take it the other way 5*(9*15)
0x120*0x1E0 = 0x21C00
or 0x21C  1000011100
0x10E
0x140*0x10E = 0x15180
we are going to lose a bit here
is it 0x151 or 0x152?

And are these equivalent rounding questions? Do both paths result in 0x152, making them equal, or is one view of chopping off bits different from the other? If we don't round at all and just clip, both answers are 0x151.

3 11 1f 010180 040110 0401F0 0A018B 0A018A
(3*17)*31 vs 3*(17*31)  no rounding just clipping
(3*17)*31
0x180*0x110 = 0x19800
0x198
0x198*0x1F0 = 0x31680  0x316 0b1100010110
0x18B
3*(17*31)
0x110*0x1F0 = 0x20f00
0x107
0x180*0x107 = 0x18A80
0x18A
0x18B != 0x18A

On one path we clipped off two bits, on the other path just one. Was that fair? Take the 0x31680 as a whole:

110001011010000000
110001011 010000000 with discarded bits on the right

Looking at it that way, 01 or 010 or 0100 is less than a half, so it does not round up, any more than a 3 or a 2 rounds up: 1.33 does not round to 1.4 in base 10.

But take 0x20F00:

100000111100000000
100000111 100000000

That is right at the halfway point: in binary, 1/10, 10/100, and 100/1000 are each a half.

Should that have been 0x108?

0x180*0x108 = 0x18C00
0x18C
0x18C != 0x18B

So even viewing rounding that way, the results still do not match; ordering made a difference.

Maybe you think I am doing the rounding wrong, and that is fair. If so, does doing it right make all the possible integer patterns work? Assuming an int is 32 bits and double is IEEE 754 with 52 bits of mantissa, the product of three ints is going to overflow that and we will have to chop off bits, so chopping and rounding will happen. Does ordering matter?
