Mathematical explanation why Decimal's conversion to Double is broken and Decimal.GetHashCode separates equal instances


Question

I am not sure if this non-standard way of stating a Stack Overflow question is good or bad, but here goes:

What is the best (mathematical or otherwise technical) explanation why the code:

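// Requires: using System; using System.Collections.Generic;
//           using System.Globalization; using System.Linq;
static void Main()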
{
  decimal[] arr =
  {
    42m,
    42.0m,
    42.00m,
    42.000m,
    42.0000m,
    42.00000m,
    42.000000m,
    42.0000000m,
    42.00000000m,
    42.000000000m,
    42.0000000000m,
    42.00000000000m,
    42.000000000000m,
    42.0000000000000m,
    42.00000000000000m,
    42.000000000000000m,
    42.0000000000000000m,
    42.00000000000000000m,
    42.000000000000000000m,
    42.0000000000000000000m,
    42.00000000000000000000m,
    42.000000000000000000000m,
    42.0000000000000000000000m,
    42.00000000000000000000000m,
    42.000000000000000000000000m,
    42.0000000000000000000000000m,
    42.00000000000000000000000000m,
    42.000000000000000000000000000m,
  };

  foreach (var m in arr)
  {
    Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
      "{0,-32}{1,-20:R}{2:X8}", m, (double)m, m.GetHashCode()
      ));
  }

  Console.WriteLine("Funny consequences:");
  var h1 = new HashSet<decimal>(arr);
  Console.WriteLine(h1.Count);
  var h2 = new HashSet<double>(arr.Select(m => (double)m));
  Console.WriteLine(h2.Count);
}



gives the following "funny" (apparently incorrect) output:

42                              42                  40450000
42.0                            42                  40450000
42.00                           42                  40450000
42.000                          42                  40450000
42.0000                         42                  40450000
42.00000                        42                  40450000
42.000000                       42                  40450000
42.0000000                      42                  40450000
42.00000000                     42                  40450000
42.000000000                    42                  40450000
42.0000000000                   42                  40450000
42.00000000000                  42                  40450000
42.000000000000                 42                  40450000
42.0000000000000                42                  40450000
42.00000000000000               42                  40450000
42.000000000000000              42                  40450000
42.0000000000000000             42                  40450000
42.00000000000000000            42                  40450000
42.000000000000000000           42                  40450000
42.0000000000000000000          42                  40450000
42.00000000000000000000         42                  40450000
42.000000000000000000000        41.999999999999993  BFBB000F
42.0000000000000000000000       42                  40450000
42.00000000000000000000000      42.000000000000007  40450000
42.000000000000000000000000     42                  40450000
42.0000000000000000000000000    42                  40450000
42.00000000000000000000000000   42                  40450000
42.000000000000000000000000000  42                  40450000
Funny consequences:
2
3

Tried this under .NET 4.5.2.

Recommended answer

In Decimal.cs, we can see that GetHashCode() is implemented as native code. Furthermore, we can see that the cast to double is implemented as a call to ToDouble(), which in turn is implemented as native code. So from there, we can't see a logical explanation for the behaviour.

In the old Shared Source CLI, we can find old implementations of these methods that hopefully shed some light, if they haven't changed too much. In comdecimal.cpp we find:

FCIMPL1(INT32, COMDecimal::GetHashCode, DECIMAL *d)
{
    WRAPPER_CONTRACT;
    STATIC_CONTRACT_SO_TOLERANT;

    ENSURE_OLEAUT32_LOADED();

    _ASSERTE(d != NULL);
    double dbl;
    VarR8FromDec(d, &dbl);
    if (dbl == 0.0) {
        // Ensure 0 and -0 have the same hash code
        return 0;
    }
    return ((int *)&dbl)[0] ^ ((int *)&dbl)[1];
}
FCIMPLEND

FCIMPL1(double, COMDecimal::ToDouble, DECIMAL d)
{
    WRAPPER_CONTRACT;
    STATIC_CONTRACT_SO_TOLERANT;

    ENSURE_OLEAUT32_LOADED();

    double result;
    VarR8FromDec(&d, &result);
    return result;
}
FCIMPLEND

We can see that the GetHashCode() implementation is based on the conversion to double: the hash code is computed from the bytes that result after a conversion to double. It relies on the assumption that equal decimal values convert to equal double values.
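To make the bit manipulation concrete, here is a minimal C++ sketch of that old hash formula applied directly to a double. It assumes IEEE 754 binary64 doubles. It skips the VarR8FromDec step and takes the double as input, and the names old_decimal_hash, one_ulp_below and one_ulp_above are made up for illustration; they do not appear in the CLI sources.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// XOR of the two 32-bit halves of the double's bit pattern, with 0 for +0 and -0.
// XOR is symmetric, so it does not matter which half sits at index 0.
// (The original returns INT32; an unsigned type is used here for easy hex comparison.)
std::uint32_t old_decimal_hash(double dbl) {
    if (dbl == 0.0) return 0;
    std::uint64_t bits;
    std::memcpy(&bits, &dbl, sizeof bits);            // well-defined way to read the bits
    auto lo = static_cast<std::uint32_t>(bits);       // low 32 bits of the mantissa
    auto hi = static_cast<std::uint32_t>(bits >> 32); // sign, exponent, high mantissa bits
    return lo ^ hi;
}

// The representable doubles immediately below and above x.
double one_ulp_below(double x) {
    return std::nextafter(x, -std::numeric_limits<double>::infinity());
}
double one_ulp_above(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity());
}
```

With this formula, old_decimal_hash(42.0) is 0x40450000, the value printed for most rows in the question, while the two neighbouring doubles hash to 0xBFBB0000 (one ulp below) and 0x40450001 (one ulp above). Any conversion that is off by a single ulp therefore changes the hash.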

So let's test the VarR8FromDec system call outside of .NET:

In Delphi (I'm actually using FreePascal), here's a short program to call the system functions directly to test their behaviour:

{$MODE Delphi}
program Test;
uses
  Windows,
  SysUtils,
  Variants;
type
  Decimal = TVarData;
function VarDecFromStr(const strIn: WideString; lcid: LCID; dwFlags: ULONG): Decimal; safecall; external 'oleaut32.dll';
function VarDecAdd(const decLeft, decRight: Decimal): Decimal; safecall; external 'oleaut32.dll';
function VarDecSub(const decLeft, decRight: Decimal): Decimal; safecall; external 'oleaut32.dll';
function VarDecDiv(const decLeft, decRight: Decimal): Decimal; safecall; external 'oleaut32.dll';
function VarBstrFromDec(const decIn: Decimal; lcid: LCID; dwFlags: ULONG): WideString; safecall; external 'oleaut32.dll';
function VarR8FromDec(const decIn: Decimal): Double; safecall; external 'oleaut32.dll';
var
  Zero, One, Ten, FortyTwo, Fraction: Decimal;
  I: Integer;
begin
  try
    Zero := VarDecFromStr('0', 0, 0);
    One := VarDecFromStr('1', 0, 0);
    Ten := VarDecFromStr('10', 0, 0);
    FortyTwo := VarDecFromStr('42', 0, 0);
    Fraction := One;
    for I := 1 to 40 do
    begin
      FortyTwo := VarDecSub(VarDecAdd(FortyTwo, Fraction), Fraction);
      Fraction := VarDecDiv(Fraction, Ten);
      Write(I: 2, ': ');
      if VarR8FromDec(FortyTwo) = 42 then WriteLn('ok') else WriteLn('not ok');
    end;
  except on E: Exception do
    WriteLn(E.Message);
  end;
end.

Note that since Delphi and FreePascal have no language support for any floating-point decimal type, I'm calling system functions to perform the calculations. I'm setting FortyTwo first to 42. I then add 1 and subtract 1. I then add 0.1 and subtract 0.1, and so on. This causes the precision of the decimal to be extended the same way as in .NET.
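The add-then-subtract trick can be illustrated with a toy model. The sketch below is not how oleaut32 implements DECIMAL: ToyDecimal and its helpers are hypothetical. It uses a 64-bit coefficient instead of the real 96-bit one, ignores the sign field, and has no overflow checks. It only shows why adding and subtracting a small fraction leaves the value unchanged but the representation rescaled.

```cpp
#include <algorithm>
#include <cstdint>

// Toy decimal: value = coeff / 10^scale. Hypothetical type for illustration only.
struct ToyDecimal {
    std::int64_t coeff;
    int scale;
};

// Rewrite d with a larger scale without changing its value.
ToyDecimal rescale(ToyDecimal d, int scale) {
    while (d.scale < scale) {
        d.coeff *= 10;
        ++d.scale;
    }
    return d;
}

// Addition and subtraction align both operands to the larger scale
// and keep that scale in the result.
ToyDecimal add(ToyDecimal a, ToyDecimal b) {
    int s = std::max(a.scale, b.scale);
    return {rescale(a, s).coeff + rescale(b, s).coeff, s};
}

ToyDecimal sub(ToyDecimal a, ToyDecimal b) {
    int s = std::max(a.scale, b.scale);
    return {rescale(a, s).coeff - rescale(b, s).coeff, s};
}

// Numerical equality: compare coefficients at a common scale.
bool equal(ToyDecimal a, ToyDecimal b) {
    int s = std::max(a.scale, b.scale);
    return rescale(a, s).coeff == rescale(b, s).coeff;
}
```

Starting from {42, 0} and adding then subtracting {1, 3} (that is, 0.001) yields {42000, 3}: equal in value to 42, but stored with a different coefficient and scale. These are the kind of equal-but-differently-represented inputs that the test program feeds to VarR8FromDec.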

And here's (part of) the output of the Pascal test program:


...
20: ok
21: ok
22: not ok
23: ok
24: not ok
25: ok
26: ok
...

The test program's output shows that this is indeed a long-standing problem in Windows that merely happens to be exposed by .NET. The system functions give different results for equal decimal values, so either they should be fixed, or .NET should be changed to not use the defective functions.

Now, in the new .NET Core, we can see code in its decimal.cpp that works around the problem:

FCIMPL1(INT32, COMDecimal::GetHashCode, DECIMAL *d)
{
    FCALL_CONTRACT;

    ENSURE_OLEAUT32_LOADED();

    _ASSERTE(d != NULL);
    double dbl;
    VarR8FromDec(d, &dbl);
    if (dbl == 0.0) {
        // Ensure 0 and -0 have the same hash code
        return 0;
    }
    // conversion to double is lossy and produces rounding errors so we mask off the lowest 4 bits
    // 
    // For example these two numerically equal decimals with different internal representations produce
    // slightly different results when converted to double:
    //
    // decimal a = new decimal(new int[] { 0x76969696, 0x2fdd49fa, 0x409783ff, 0x00160000 });
    //                     => (decimal)1999021.176470588235294117647000000000 => (double)1999021.176470588
    // decimal b = new decimal(new int[] { 0x3f0f0f0f, 0x1e62edcc, 0x06758d33, 0x00150000 }); 
    //                     => (decimal)1999021.176470588235294117647000000000 => (double)1999021.1764705882
    //
    return ((((int *)&dbl)[0]) & 0xFFFFFFF0) ^ ((int *)&dbl)[1];
}
FCIMPLEND

This appears to be implemented in the current .NET Framework too, judging by the fact that one of the wrong double values does give the same hash code, but it's not enough to completely fix the problem.
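The hash values in the question are consistent with the masked formula, and a short C++ sketch shows why the mask only helps in one direction. It assumes IEEE 754 binary64 doubles and the little-endian layout implied by the original code, where index 0 is the low word. The name masked_decimal_hash is made up for illustration, and the function takes the already converted double as input.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Same XOR of the two 32-bit halves as before, but with the lowest
// 4 bits of the low word cleared first, as in the .NET Core snippet.
std::uint32_t masked_decimal_hash(double dbl) {
    if (dbl == 0.0) return 0;
    std::uint64_t bits;
    std::memcpy(&bits, &dbl, sizeof bits);
    auto lo = static_cast<std::uint32_t>(bits);
    auto hi = static_cast<std::uint32_t>(bits >> 32);
    return (lo & 0xFFFFFFF0u) ^ hi;
}

// The representable doubles immediately below and above x.
double one_ulp_below(double x) {
    return std::nextafter(x, -std::numeric_limits<double>::infinity());
}
double one_ulp_above(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity());
}
```

The double 42 has the bit pattern 0x4045000000000000. One ulp above it (printed as 42.000000000000007) is 0x4045000000000001, which differs only in the lowest bit, so the mask removes the difference and the hash stays 0x40450000. One ulp below it (printed as 41.999999999999993) is 0x4044FFFFFFFFFFFF: the borrow flips every mantissa bit, including bits in the high word, so masking four bits cannot help and the hash comes out as 0xBFBB000F. Those are exactly the two hash values in the question's output.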
