Truncating the n least significant bits of a double-precision number

Question

I need a C++ function that takes a double-precision number and strips off its n least significant bits. Assembly code will do.
I want to integrate it into my Embarcadero C++ polygon clipping algorithm.
I am only an occasional C++ programmer.

Solution

I suspect the reason you think you want this is that you're getting comparison errors caused by rounding artifacts in double-precision arithmetic.

Numbers that should be mathematically equal according to the formulas that produced them are not equal in practice.

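Stripping bits is a shortcut that won't solve the problem; it introduces errors of its own. Two values that differ only in their last bit can still come out different after truncation when they fall on opposite sides of a truncation boundary, so truncation moves the problem rather than removing it.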
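To see why, take two adjacent doubles (1 ulp apart) where the lower one's last four bits are 1111. The sketch below assumes IEEE 754 64-bit doubles; fromBits, toBits, clearLowBits and ulpDistance are hypothetical helper names for illustration, not part of any library.

```cpp
#include <cassert>
#include <cmath>    // std::nextafter, used in the usage example
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as a double (assumes IEEE 754 binary64).
static double fromBits(std::uint64_t bits) {
    double x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Copy a double's bit pattern into a 64-bit integer.
static std::uint64_t toBits(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Hypothetical helper: clear the n least significant bits of x's bit pattern.
static double clearLowBits(double x, int n) {
    assert(n >= 0 && n < 64);  // shifting a 64-bit value by 64 or more is undefined
    return fromBits(toBits(x) & (~std::uint64_t{0} << n));
}

// Distance in ulps between two finite doubles of the same sign.
static std::uint64_t ulpDistance(double a, double b) {
    std::uint64_t ia = toBits(a), ib = toBits(b);
    return ia > ib ? ia - ib : ib - ia;
}
```

For example, with a = fromBits(0x3FF000000000000FULL) (1.0 with its four low bits set) and b = std::nextafter(a, 2.0), ulpDistance(a, b) is 1, yet ulpDistance(clearLowBits(a, 4), clearLowBits(b, 4)) is 16: the two values end up farther apart after truncation than before.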

Applying an error range (a tolerance) to your comparisons is the only reliable way to make this work.

Take a look at this function to see how it's done.

bool AlmostEqualDoubles2(double nVal1, double nVal2, double nEpsilon)
{
	bool bRet = (((nVal2 - nEpsilon) < nVal1) && (nVal1 < (nVal2 + nEpsilon)));
	return bRet;
}
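In a polygon clipper, a test like this typically replaces == wherever two computed coordinates are compared. A minimal sketch, assuming a hypothetical Point struct and hypothetical helpers nearlyEqual and samePoint (written with std::fabs rather than the two-sided comparison above); the tolerance has to be chosen to suit the scale of your coordinates.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 2-D vertex type for a polygon clipper.
struct Point {
    double x;
    double y;
};

// True when a lies strictly within eps of b; essentially the same test as
// AlmostEqualDoubles2, written with std::fabs.
inline bool nearlyEqual(double a, double b, double eps) {
    assert(eps >= 0.0);  // a negative tolerance would make every comparison fail
    return std::fabs(a - b) < eps;
}

// Two vertices count as the same point when both coordinates agree within eps.
inline bool samePoint(const Point& p, const Point& q, double eps) {
    return nearlyEqual(p.x, q.x, eps) && nearlyEqual(p.y, q.y, eps);
}
```

For instance, 0.1 + 0.2 == 0.3 is false in double arithmetic, while nearlyEqual(0.1 + 0.2, 0.3, 1e-9) is true.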


Reliable Floating Point Equality Comparison

Or, as a macro:

#define AlmostEqualDoubles2(nVal1, nVal2, nEpsilon) \
    ((((nVal2) - (nEpsilon)) < (nVal1)) && ((nVal1) < ((nVal2) + (nEpsilon))))
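A fixed nEpsilon works when you know the magnitude of your values in advance. If they range widely, one common alternative (not necessarily the one the linked article recommends) scales the tolerance by the size of the operands, with an absolute floor for values near zero. A sketch; almostEqualRelative is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true when a and b differ by at most absEps, or by at most
// relEps times the larger of their magnitudes. The absolute floor matters near
// zero, where a purely relative test would demand exact equality.
inline bool almostEqualRelative(double a, double b, double relEps, double absEps) {
    assert(relEps >= 0.0 && absEps >= 0.0);
    const double diff = std::fabs(a - b);
    if (diff <= absEps)
        return true;
    return diff <= relEps * std::max(std::fabs(a), std::fabs(b));
}
```

With relEps = 1e-8, 1000000.0 and 1000000.001 compare equal, while 1.0 and 1.001 do not, even though the absolute difference is about the same in both cases.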


Something like this?

I'm not so sure you'll find it useful.

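// StripDoubleBits.cpp : Defines the entry point for the console application.
// Assumes a little-endian platform such as x86, where word [0] holds the low 32 bits.

#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    int nBits = 4;                      // must be 0..31 for the 32-bit shift below
    double OriginalValue = 1.0 / 9.0;
    double StrippedValue = OriginalValue;

    // View each double as two 32-bit words; memcpy avoids strict-aliasing problems.
    std::uint32_t orig[2], stripped[2];
    std::memcpy(orig, &OriginalValue, sizeof orig);
    std::memcpy(stripped, &StrippedValue, sizeof stripped);

    stripped[0] &= (0xFFFFFFFFu << nBits); // clear the nBits least significant bits
    std::memcpy(&StrippedValue, stripped, sizeof StrippedValue);

    // Print the high word first so the hex reads most-significant digit first.
    std::printf("Original Hex Value = %08X%08X\n", (unsigned)orig[1], (unsigned)orig[0]);
    std::printf("Stripped Hex Value = %08X%08X\n", (unsigned)stripped[1], (unsigned)stripped[0]);

    std::printf("Original Value = %0.17f\n", OriginalValue);
    std::printf("Stripped Value = %0.17f\n", StrippedValue);

    return 0;
}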
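As a rough estimate of how much clearing the low bits changes a value: for a normal (not subnormal) finite double x with binary exponent e, so that 2^e <= |x| < 2^(e+1), one unit in the last place (ulp) is 2^(e-52). Clearing the n lowest fraction bits, with 0 <= n <= 52, leaves the sign and exponent alone and lowers the magnitude by at most 2^n - 1 ulps. Writing x̃ for the truncated value:

```latex
\[
0 \le |x| - |\tilde{x}| \le (2^{n} - 1)\,2^{e-52} < 2^{n+e-52},
\qquad
\frac{|x| - |\tilde{x}|}{|x|} < \frac{2^{n+e-52}}{2^{e}} = 2^{n-52}.
\]
```

With n = 4, as in the program above, the relative change is below 2^(-48), roughly 3.6e-15, and the value always moves toward zero because IEEE 754 stores the sign separately from the magnitude.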


I'm assuming you're using the Microsoft compiler; if not, you may have to adjust the types. long long is the 64-bit integer type in the MS 32-bit compiler, which is why I use it here to match the size of a double, but the sizes of primitive types can vary between compilers.
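If you would rather have the compiler confirm these size assumptions than rely on them, a few C++11 static_assert declarations at file scope will do it. A minimal sketch; kFractionBits is a hypothetical name:

```cpp
#include <climits>
#include <limits>

// Compile-time checks for the assumptions behind reinterpreting a double's bits
// as a long long. If one fails, the bit-stripping code must be adapted.
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEEE 754");
static_assert(sizeof(double) * CHAR_BIT == 64, "double is not 64 bits wide");
static_assert(sizeof(long long) == sizeof(double), "long long and double differ in size");

// Number of stored fraction (mantissa) bits; stripping more bits than this
// starts clearing exponent bits.
constexpr int kFractionBits = std::numeric_limits<double>::digits - 1;
```

On a compiler where these checks pass, kFractionBits is 52, so an n above 52 no longer just reduces precision: it changes the exponent as well.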

#include <cstring>

double stripBits(int n, double value) {
    unsigned long long intVal;
    std::memcpy(&intVal, &value, sizeof intVal); // copy the value's bits into the 64-bit integer type

    if (n <= 0)
        return value; // nothing to strip

    // mask of all 1's with the low n bits set to 0
    // (shifting a 64-bit value by 64 or more is undefined, so n >= 64 gets an all-0 mask)
    unsigned long long mask = (n >= 64) ? 0ULL : (~0ULL << n);

    intVal &= mask; // apply mask

    std::memcpy(&value, &intVal, sizeof value); // reinterpret the masked bits as a double
    return value;
}

