What causes significant loss of FP precision when compiling for 64-bit?

Posted: 2016/9/20 10:25:06 | Tags: c#, visual-studio-2013, floating-point

Question

Platform: C# using Visual Studio 2013.

I had a Windows Application running on a 64-bit Haswell CPU that was working correctly with ‘Prefer 32-bit’ enabled. I decided to upgrade to ‘Prefer 64-bit’ by de-selecting ‘Prefer 32-bit’ and the Application’s arithmetic changed suddenly to incorrect values. I LOST 29 BITS OF ARITHMETIC PRECISION (that’s my estimate of the difference in size of a Double-Precision Floating-Point mantissa and a Single-Precision Floating-Point mantissa). The difference in arithmetic precision here is massive!

C# code (test case):

using System;
class lngfltdbl
{
    static void Main()
    {
        long   lng = 2026872;
        float  flt = 0.3F;
        double dbl = lng + flt;
        Console.WriteLine(dbl);
    }
}

Expected result (seen when ‘Prefer 32-bit’ is selected):

dbl == 2026872.30000001
(PERFECT! CORRECT to 14 decimal places)

Obtained Result (seen when ‘Prefer 32-bit’ is de-selected):

dbl == 2026872.25
(ERROR!  CORRECT to 7 DECIMAL PLACES ONLY!)

Please note: in the past I have been comfortable with implicit casts since 'Prefer 32-bit' always understood how to combine correctly values of differing precision.

Recommended answer

Where the error lies:

With expert assistance, we observed that the assembly code produced with ‘Prefer 32-bit’ deselected is indeed using Single Precision instructions (cvtsi2ss; subss) for computation, then the result is converted to double-precision (cvtss2sd : Convert Scalar Single-Precision FP value to Scalar Double-Precision FP value) and finally the result is stored in a Double Precision variable (movsd). This matches exactly with the symptoms of the detected error and explains the loss of 29 bits of arithmetic precision.

I escalated this to Microsoft and finally got through to someone in the JIT-compiler team. It turned out to be intentional behavior, i.e. if using Double-Precision Floating-Point arithmetic with implicit type casts, the chances are you MUST MODIFY your C# code. Up to now I believed that arithmetic precision relies solely on the length of variables and any explicit/implicit conversions (within the rules of floating point computation defined by IEEE, of course). Furthermore I believed that the choice to compile a working 32-bit application as 64-bit would not change the application behaviour.

I am indebted to Microsoft for sending me the following response…

What you are seeing is the expected behavior for the specific test case you provided. The key here is the expression

lng + flt

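The C# compiler generates IL to evaluate this expression. It does not take into account what you are assigning the expression to. The implicit casts inserted into the expression depend on the expression itself, not on the assignment. The C# compiler has rules that specify how it adds implicit casts to an expression when it generates IL for it. In this case, the implicit cast added by the C# compiler is this: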

((float)lng + flt)

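This expression tells the JIT compiler that it should generate code for a single-precision floating-point ADD operation. So, given the IL that the JIT compiler was handed, the code generated for the 64-bit target is entirely appropriate. It was told (by the IL) to compute a 32-bit float result, and that is what it did, as you observed.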
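The value 2026872.25 follows directly from the spacing of 32-bit floats. A float has a 24-bit significand, and 2026872 lies between 2^20 and 2^21, so neighbouring floats in that range are 2^(20-23) = 0.125 apart. The nearest representable value to 2026872.3 is therefore 2026872.25. The sketch below illustrates this; the class and the `AddAsFloat` helper are hypothetical names introduced only for this example, and the explicit casts are there so the sum is rounded to float regardless of which JIT runs it.

```csharp
using System;

static class FloatSpacingSketch
{
    // Hypothetical helper: mirrors the IL's conv.r4 followed by add, with an
    // explicit outer cast so the sum is rounded to a 32-bit float.
    public static float AddAsFloat(long lng, float flt)
    {
        return (float)((float)lng + flt);
    }

    static void Main()
    {
        long lng = 2026872;
        float flt = 0.3F;

        // 2^20 <= 2026872 < 2^21, and a float carries a 24-bit significand,
        // so adjacent floats in this range differ by 2^(20-23).
        double spacing = Math.Pow(2, 20 - 23);

        Console.WriteLine(spacing);                        // spacing of floats near lng
        Console.WriteLine((double)AddAsFloat(lng, flt));   // sum rounded to a float
    }
}
```

With a culture that uses '.' as the decimal separator, this should print 0.125 and then 2026872.25, matching the result seen with ‘Prefer 32-bit’ deselected.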

Here is the IL for this method:

.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       26 (0x1a)
  .maxstack  2
  .locals init (int64 V_0,
           float32 V_1,
           float64 V_2)
  IL_0000:  ldc.i4     0x1eed78
  IL_0005:  conv.i8
  IL_0006:  stloc.0
  IL_0007:  ldc.r4     0.30000001
  IL_000c:  stloc.1
  IL_000d:  ldloc.0
  IL_000e:  conv.r4    // Force the conversion of ‘lng’ into a 32-bit float ‘r4’
  IL_000f:  ldloc.1
  IL_0010:  add
  IL_0011:  conv.r8
  IL_0012:  stloc.2
  IL_0013:  ldloc.2
  IL_0014:  call void [mscorlib]System.Console::WriteLine(float64)
  IL_0019:  ret
} // end of method lngfltdbl::Main

So the question becomes: why did the 32-bit target JIT produce a different (more precise) result?

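The answer here is that the older 32-bit JIT uses the older x87-style instructions, and we have always stated that the JIT compiler can compute intermediate floating-point values used in expressions at a higher precision. The 32-bit JIT compiler does in fact compute 32-bit float expressions at a higher precision. It does this because that is the natural behavior of the available instructions when using the old x87-style instructions. We do this because there is a fairly large performance penalty for performing a 32-bit float computation using the x87-style instructions. And we do document that if you need a 32-bit float result for an intermediate calculation, you can add an explicit type cast, and the JIT is required to change the precision to a 32-bit float when it sees the explicit type cast.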

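In your case, you need to add an explicit type cast to double on either of the two operands of the ADD so that the C# compiler generates IL that adds two 64-bit floats.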

Either of these source expressions will compute the result you wanted:

((double)lng + flt)
(lng + (double)flt)
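As a minimal sketch of the difference, the program below evaluates the sum both ways. The class and helper names are hypothetical and exist only for this illustration; `AddAsFloat` spells out the casts the C# compiler inserts for `lng + flt`, while `AddInDouble` uses the first of the two suggested expressions.

```csharp
using System;

static class PromoteToDoubleSketch
{
    // Hypothetical helper: what `lng + flt` means once the implicit cast is
    // written out, with the sum rounded to a 32-bit float.
    public static double AddAsFloat(long lng, float flt)
    {
        return (float)((float)lng + flt);
    }

    // Hypothetical helper: casting one operand to double makes the compiler
    // promote the other operand as well, so the add happens in 64-bit floats.
    public static double AddInDouble(long lng, float flt)
    {
        return (double)lng + flt;
    }

    static void Main()
    {
        long lng = 2026872;
        float flt = 0.3F;

        Console.WriteLine(AddAsFloat(lng, flt));   // single-precision sum
        Console.WriteLine(AddInDouble(lng, flt));  // double-precision sum
    }
}
```

The first line should show 2026872.25 and the second a value close to 2026872.30000001; the exact digits printed depend on the runtime's default double formatting. The trailing ...00000001 is not an arithmetic error: it comes from 0.3F itself, which is the nearest float to 0.3 and is slightly larger than 0.3.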
