Implementing floating point subtraction

Problem description


Hi all, I am trying to implement a floating point arithmetic library, and I have trouble understanding the algorithm for subtracting floats. I have implemented addition successfully and I thought that subtraction was just a special case of it, but it seems I am making a mistake somewhere.

I am adding the code here just for reference; it has many self-explanatory functions, but I don't expect anyone to understand it 100%. What I would like help with is the algorithm. Do we follow the same method as with adding floating point numbers, except that when we add the mantissas, we convert the negative one (the one we subtract) into two's complement and then add them?

That's what I am doing, but the result is not correct. It is very close, but not the same. Does anyone have any ideas? Thanks in advance!

I am quite sure that the way I do things works since I implemented an almost identical algorithm for adding floats and it works like a charm.

_float subFloat(_float f1,_float f2)
{
    unsigned char diff;
    _float result;

    //first see whose exponent is greater
    if(f1.float_parts.exponent > f2.float_parts.exponent)
    {
        diff = f1.float_parts.exponent - f2.float_parts.exponent;

        //now shift f2's mantissa by the difference of their exponent to the right
        //adding the hidden bit
        f2.float_parts.mantissa = ((f2.float_parts.mantissa)>>1) | (0x01<<22);
        f2.float_parts.mantissa >>= (int)(diff);//was (diff-1)

        //also increase its exponent by the difference shifted
        f2.float_parts.exponent = f2.float_parts.exponent + diff;
    }
    else if(f1.float_parts.exponent < f2.float_parts.exponent)
    {
        diff = f2.float_parts.exponent - f1.float_parts.exponent;
        result = f1;
        f1 = f2;        //swap them
        f2 = result;

        //now shift f2's mantissa by the difference of their exponent to the right
        //adding the hidden bit
        f2.float_parts.mantissa = ((f2.float_parts.mantissa)>>1) | (0x01<<22);
        f2.float_parts.mantissa >>= (int)(diff);

        //also increase its exponent by the difference shifted
        f2.float_parts.exponent = f2.float_parts.exponent + diff;
    }
    else//if the exponents were equal
        f2.float_parts.mantissa = ((f2.float_parts.mantissa)>>1) | (0x01<<22); //bring out the hidden bit

    //getting two's complement of f2 mantissa
    f2.float_parts.mantissa ^= 0x7FFFFF;
    f2.float_parts.mantissa += 0x01;

    result.float_parts.exponent = f1.float_parts.exponent;
    result.float_parts.mantissa = (f1.float_parts.mantissa +f2.float_parts.mantissa)>>1;
                                                    //gotta shift right by overflow bits

    //normalization
    if(manBitSet(result,1))
        result.float_parts.mantissa <<= 1;  //hide the hidden bit
    else
        result.float_parts.exponent +=1;

    return result;
}

Solution

If your addition code is correct, and your subtraction isn't, the problem is presumably in the two's complement and addition.

Is it necessary to do the two's complement and addition, rather than subtraction?
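To make that question concrete, here is a minimal sketch comparing the two approaches. It assumes both significands have already been aligned and widened to 24 bits (hidden bit at bit 23) and that the first operand is at least as large as the second. The function names are hypothetical and are not taken from the question's library.

```c
#include <stdint.h>

/* Assumption: a and b are aligned 24-bit significands with a >= b. */

/* Plain unsigned subtraction. */
static uint32_t sub_direct(uint32_t a, uint32_t b) {
    return a - b;
}

/* The same difference via two's complement, done over the full 24-bit width. */
static uint32_t sub_twos_complement(uint32_t a, uint32_t b) {
    uint32_t neg_b = ((b ^ 0xFFFFFFu) + 1u) & 0xFFFFFFu; /* negate b within 24 bits */
    return (a + neg_b) & 0xFFFFFFu;                      /* drop the carry out of bit 23 */
}
```

Under those assumptions both functions return the same value, so the two's complement step adds nothing here. Note that the complement has to cover every bit that takes part in the addition, including the hidden bit, and the carry out must be discarded rather than shifted back in.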

If that's not the problem, I'm having trouble with your algorithm. It's been a while since I did anything like this. Could you provide some details? More specifically, what is the hidden bit?

It seems possible to me that the handling of the hidden bit is proper for addition but not subtraction. Could it be that you should set it in the f1 mantissa rather than the f2? Or negate the f1 mantissa instead of the f2?
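For reference, in IEEE 754 single precision a normal number stores only 23 fraction bits; the leading 1 of the significand is implied (the "hidden bit"). One way to avoid treating the operands differently is to restore that bit in both of them before doing any arithmetic. The sketch below assumes `float` is an IEEE 754 binary32 value; the `unpacked_float` type and `unpack` function are hypothetical helpers, not part of the question's `_float` type.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical unpacked view of a single-precision value. */
typedef struct {
    uint32_t sign;        /* 0 or 1 */
    int32_t  exponent;    /* biased exponent field, 0..255 */
    uint32_t significand; /* 24 bits: hidden bit at bit 23 plus the 23 stored bits */
} unpacked_float;

static unpacked_float unpack(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the bytes without aliasing problems */

    unpacked_float u;
    u.sign        = bits >> 31;
    u.exponent    = (int32_t)((bits >> 23) & 0xFF);
    u.significand = bits & 0x7FFFFFu;
    if (u.exponent != 0)
        u.significand |= 0x800000u;   /* normal number: restore the implied leading 1 */
    return u;
}
```

With both operands unpacked this way, the significands occupy 24 bits each, so they need a field wider than 23 bits while the arithmetic is carried out.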

Without knowing what you're getting versus what you're expecting, and more details of the algorithm you're using, that's the best I can do.

Edit: OK, I looked at the references in your comment. One thing you are failing to do in the supplied code is normalization. When adding, either the hidden bits overflow (shift the mantissa to the right, increment the exponent), or they don't. When subtracting, an arbitrary number of leading mantissa digits can be zero. In decimal, consider adding 0.5E1 and 0.50001E1; you'd get 1.00001E1, and if you were to normalize you'd get 0.100001E2. When subtracting 0.5E1 from 0.50001E1, you get 0.00001E1. Then you need to shift the mantissa to the left and decrement the exponent by as much as it takes, to get 0.1E-3.
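The normalization step described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the asker's library: it assumes IEEE 754 binary32 `float`, handles only positive normal inputs with `a >= b`, truncates the bits shifted out during alignment instead of rounding, and ignores subnormals, infinities and NaN. The name `sub_positive` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: a - b for positive normal floats with a >= b (see assumptions above). */
static float sub_positive(float a, float b) {
    uint32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);

    int32_t  ea = (int32_t)((ia >> 23) & 0xFF);   /* biased exponents */
    int32_t  eb = (int32_t)((ib >> 23) & 0xFF);
    uint32_t ma = (ia & 0x7FFFFFu) | 0x800000u;   /* restore the hidden bit in BOTH operands */
    uint32_t mb = (ib & 0x7FFFFFu) | 0x800000u;

    /* Align: shift the smaller operand right by the exponent difference. */
    int32_t shift = ea - eb;                      /* non-negative because a >= b */
    mb = (shift < 24) ? (mb >> shift) : 0u;

    uint32_t m = ma - mb;                         /* plain subtraction, no two's complement */
    if (m == 0u)
        return 0.0f;                              /* exact cancellation */

    /* Normalize: shift left until the hidden bit is back at bit 23,
       decrementing the exponent once per shift. Underflow into the
       subnormal range is not handled in this sketch. */
    while ((m & 0x800000u) == 0u && ea > 1) {
        m <<= 1;
        ea -= 1;
    }

    uint32_t out = ((uint32_t)ea << 23) | (m & 0x7FFFFFu);  /* hide the hidden bit again */
    float r;
    memcpy(&r, &out, sizeof r);
    return r;
}
```

For example, `sub_positive(5.5f, 2.25f)` aligns the operands to `0xB00000` and `0x480000`, subtracts to get `0x680000`, and then needs one left shift (with the exponent decremented once) to put the leading 1 back at bit 23. Unlike addition, where at most one right shift is needed, the number of left shifts after subtraction depends on how many leading bits cancelled.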
