double-double precision floating point as sum of two doubles

Problem description

After following papers and source code on double-double arithmetic for some time, I still can't figure out exactly how a dd_real number (defined as struct dd_real { double x[2]; ... }) is split into two doubles. Say I initialize it with a string, dd_real pi = "3.14159265358979323846264338327950"; what will pi.x[0] and pi.x[1] be? I need to understand this and then write a (hopefully small) Python function that does it.

The reason I don't just call into the QD library is that I'd prefer to reimplement the correct split in Python, so that I can send my 35-digit constants (given as strings) as double2 to CUDA code, where they will be treated as double-double reals by the GQD library, which seems to be the only library for extended-precision calculations in CUDA. That unfortunately rules out mpmath on the Python side, too.

Solution

Say that you initialize your double-double with the binary number:

1.011010101111111010101010101010000000101010110110000111011111101010010101010
  < ---                 52 binary digits         --- >< --- more digits --- >

Then one double will be 1.0110101011111110101010101010100000001010101101100001 and the other will be 1.1011111101010010101010 * 2^-53
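The split described above can be checked with exact rational arithmetic from Python's standard library (fractions.Fraction), which also respects the asker's constraint of not using mpmath. The sketch below parses the example binary string and keeps the first 52 fraction bits as the high double. It keeps the remainder as the low double and confirms that the two add up exactly. The helper names are hypothetical. To keep the exponent out of the way, the sketch assumes 1 ≤ x < 2.

```python
import math
from fractions import Fraction

# The example value from the answer: "1." followed by 75 fraction bits.
BITS = "1.011010101111111010101010101010000000101010110110000111011111101010010101010"


def binary_to_fraction(s: str) -> Fraction:
    """Exact rational value of a binary string such as '1.0110...'."""
    int_part, _, frac_part = s.partition(".")
    value = Fraction(int(int_part, 2))
    if frac_part:
        value += Fraction(int(frac_part, 2), 2 ** len(frac_part))
    return value


def truncating_split(x: Fraction) -> tuple[float, float]:
    """Split 1 <= x < 2 by chopping after 52 fraction bits, as in the answer.

    Restricted to [1, 2) so the exponent is 0 and "52 fraction bits" means
    "multiples of 2**-52"; a general x would need its exponent handled first.
    """
    if not (1 <= x < 2):
        raise ValueError("this sketch only handles 1 <= x < 2")
    hi = Fraction(math.floor(x * 2**52), 2**52)  # keep the first 52 fraction bits
    lo = x - hi                                  # everything that was chopped off
    hi_f, lo_f = float(hi), float(lo)
    # Lossless only if both parts fit in a double's 53 significant bits.
    if Fraction(hi_f) != hi or Fraction(lo_f) != lo:
        raise ValueError("remainder does not fit in one double")
    return hi_f, lo_f


x = binary_to_fraction(BITS)
hi, lo = truncating_split(x)
print(hi.hex(), lo.hex())
print(Fraction(hi) + Fraction(lo) == x)  # True: the sum is exactly the initial value
```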

When you add these two numbers (as reals), the sum is the initial value. The first one packs as many bits as possible into its 52-bit stored mantissa (53 significant bits, counting the implicit leading 1). The second one contains the remaining bits, with the appropriate exponent.
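For the asker's actual use case (a decimal string in, two doubles out), here is a minimal sketch that uses only the standard library. It assumes the usual double-double normalization. Under that convention, hi is the value rounded to the nearest double and lo is the rounded remainder, so |lo| ≤ ulp(hi)/2 and lo can be negative.

This differs from the truncated split in the binary example above. There, bit 53 is 1 and is followed by further nonzero bits, so a normalized hi would be rounded up by 2^-52 and lo would be negative.

The function names dd_from_string and as_cuda_double2 are hypothetical. The assumption that GQD reads .x as the high part and .y as the low part should be checked against GQD's own headers. The result is exactly rounded, which is not necessarily bit-identical to what QD's own string parser produces.

```python
from fractions import Fraction


def dd_from_string(s: str) -> tuple[float, float]:
    """Split a decimal string into a normalized (hi, lo) pair of doubles.

    hi is the double nearest to the exact decimal value; lo is the double
    nearest to the exact remainder, so hi + lo matches the input to roughly
    106 bits and lo may be negative.
    """
    exact = Fraction(s)                # exact rational value of the decimal string
    hi = float(exact)                  # correctly rounded to the nearest double
    lo = float(exact - Fraction(hi))   # remainder is exact; rounding happens only here
    return hi, lo


def as_cuda_double2(s: str) -> str:
    """Format the pair as a CUDA initializer (assumes .x = hi, .y = lo)."""
    hi, lo = dd_from_string(s)
    # repr() gives the shortest decimal string that round-trips to the same double.
    return f"make_double2({hi!r}, {lo!r})"


pi_hi, pi_lo = dd_from_string("3.14159265358979323846264338327950")
print(pi_hi, pi_lo)
print(as_cuda_double2("3.14159265358979323846264338327950"))
```

Because repr() of a Python float is the shortest decimal string that round-trips, pasting the two literals into CUDA source should reproduce the same doubles, provided the compiler rounds decimal literals correctly.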
