Mathematical explanation why Decimal's conversion to Double is broken and Decimal.GetHashCode separates equal instances
Question
I am not sure if this non-standard way of stating a Stack Overflow question is good or bad, but here goes:
What is the best (mathematical or otherwise technical) explanation why the code:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class Program
{
    static void Main()
    {
        // The same value, 42, written with 0 to 27 trailing zeros (scale 0..27).
        decimal[] arr =
        {
            42m,
            42.0m,
            42.00m,
            42.000m,
            42.0000m,
            42.00000m,
            42.000000m,
            42.0000000m,
            42.00000000m,
            42.000000000m,
            42.0000000000m,
            42.00000000000m,
            42.000000000000m,
            42.0000000000000m,
            42.00000000000000m,
            42.000000000000000m,
            42.0000000000000000m,
            42.00000000000000000m,
            42.000000000000000000m,
            42.0000000000000000000m,
            42.00000000000000000000m,
            42.000000000000000000000m,
            42.0000000000000000000000m,
            42.00000000000000000000000m,
            42.000000000000000000000000m,
            42.0000000000000000000000000m,
            42.00000000000000000000000000m,
            42.000000000000000000000000000m,
        };
        // Print each decimal, its conversion to double, and its hash code.
        foreach (var m in arr)
        {
            Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
                "{0,-32}{1,-20:R}{2:X8}", m, (double)m, m.GetHashCode()
                ));
        }
        Console.WriteLine("Funny consequences:");
        var h1 = new HashSet<decimal>(arr);
        Console.WriteLine(h1.Count);
        var h2 = new HashSet<double>(arr.Select(m => (double)m));
        Console.WriteLine(h2.Count);
    }
}
gives the following "funny" (apparently incorrect) output:
42 42 40450000
42.0 42 40450000
42.00 42 40450000
42.000 42 40450000
42.0000 42 40450000
42.00000 42 40450000
42.000000 42 40450000
42.0000000 42 40450000
42.00000000 42 40450000
42.000000000 42 40450000
42.0000000000 42 40450000
42.00000000000 42 40450000
42.000000000000 42 40450000
42.0000000000000 42 40450000
42.00000000000000 42 40450000
42.000000000000000 42 40450000
42.0000000000000000 42 40450000
42.00000000000000000 42 40450000
42.000000000000000000 42 40450000
42.0000000000000000000 42 40450000
42.00000000000000000000 42 40450000
42.000000000000000000000 41.999999999999993 BFBB000F
42.0000000000000000000000 42 40450000
42.00000000000000000000000 42.000000000000007 40450000
42.000000000000000000000000 42 40450000
42.0000000000000000000000000 42 40450000
42.00000000000000000000000000 42 40450000
42.000000000000000000000000000 42 40450000
Funny consequences:
2
3
Tried this under .NET 4.5.2.
Recommended answer
In Decimal.cs, we can see that GetHashCode() is implemented as native code. Furthermore, we can see that the cast to double is implemented as a call to ToDouble(), which in turn is implemented as native code. So from there, we can't see a logical explanation for the behaviour.
In the old Shared Source CLI, we can find old implementations of these methods that hopefully shed some light, if they haven't changed too much. We can find in comdecimal.cpp:
FCIMPL1(INT32, COMDecimal::GetHashCode, DECIMAL *d)
{
    WRAPPER_CONTRACT;
    STATIC_CONTRACT_SO_TOLERANT;

    ENSURE_OLEAUT32_LOADED();

    _ASSERTE(d != NULL);
    double dbl;
    VarR8FromDec(d, &dbl);
    if (dbl == 0.0) {
        // Ensure 0 and -0 have the same hash code
        return 0;
    }
    return ((int *)&dbl)[0] ^ ((int *)&dbl)[1];
}
FCIMPLEND
and
FCIMPL1(double, COMDecimal::ToDouble, DECIMAL d)
{
    WRAPPER_CONTRACT;
    STATIC_CONTRACT_SO_TOLERANT;

    ENSURE_OLEAUT32_LOADED();

    double result;
    VarR8FromDec(&d, &result);
    return result;
}
FCIMPLEND
We can see that the GetHashCode() implementation is based on the conversion to double: the hash code is based on the bytes that result after a conversion to double. It is based on the assumption that equal decimal values convert to equal double values.
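To make that dependency concrete, here is a minimal C++ sketch of the same XOR-of-halves hash applied directly to a double. The name hash_from_double is hypothetical, memcpy stands in for the original pointer cast, and an IEEE 754 64-bit double is assumed; it is an illustration, not the CLR's code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper mirroring the Shared Source CLI logic:
// XOR the low and high 32-bit halves of the double's bit pattern.
inline std::int32_t hash_from_double(double value) {
    if (value == 0.0) {
        return 0;  // 0 and -0 must hash alike
    }
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // copy the raw bytes
    const std::uint32_t low = static_cast<std::uint32_t>(bits);
    const std::uint32_t high = static_cast<std::uint32_t>(bits >> 32);
    return static_cast<std::int32_t>(low ^ high);
}
```

Under that assumption, hash_from_double(42.0) is 0x40450000, the value printed for most rows in the question, while the neighbouring double 42.000000000000007 gives 0x40450001. A one-bit rounding difference in the conversion is therefore enough to change the hash.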
So let's test the VarR8FromDec system call outside of .NET:
In Delphi (I'm actually using FreePascal), here's a short program to call the system functions directly to test their behaviour:
{$MODE Delphi}
program Test;
uses
  Windows,
  SysUtils,
  Variants;
type
  Decimal = TVarData;
function VarDecFromStr(const strIn: WideString; lcid: LCID; dwFlags: ULONG): Decimal; safecall; external 'oleaut32.dll';
function VarDecAdd(const decLeft, decRight: Decimal): Decimal; safecall; external 'oleaut32.dll';
function VarDecSub(const decLeft, decRight: Decimal): Decimal; safecall; external 'oleaut32.dll';
function VarDecDiv(const decLeft, decRight: Decimal): Decimal; safecall; external 'oleaut32.dll';
function VarBstrFromDec(const decIn: Decimal; lcid: LCID; dwFlags: ULONG): WideString; safecall; external 'oleaut32.dll';
function VarR8FromDec(const decIn: Decimal): Double; safecall; external 'oleaut32.dll';
var
  Zero, One, Ten, FortyTwo, Fraction: Decimal;
  I: Integer;
begin
  try
    Zero := VarDecFromStr('0', 0, 0);
    One := VarDecFromStr('1', 0, 0);
    Ten := VarDecFromStr('10', 0, 0);
    FortyTwo := VarDecFromStr('42', 0, 0);
    Fraction := One;
    for I := 1 to 40 do
    begin
      FortyTwo := VarDecSub(VarDecAdd(FortyTwo, Fraction), Fraction);
      Fraction := VarDecDiv(Fraction, Ten);
      Write(I: 2, ': ');
      if VarR8FromDec(FortyTwo) = 42 then WriteLn('ok') else WriteLn('not ok');
    end;
  except on E: Exception do
    WriteLn(E.Message);
  end;
end.
Note that since Delphi and FreePascal have no language support for any floating-point decimal type, I'm calling system functions to perform the calculations. I'm setting FortyTwo first to 42. I then add 1 and subtract 1. I then add 0.1 and subtract 0.1. Et cetera. This causes the precision of the decimal to be extended the same way as in .NET.
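The add-then-subtract trick can be sketched with a toy model in C++. This is only an illustration under stated assumptions: ToyDecimal, rescale, add and sub are hypothetical names, the value is modelled as mantissa / 10^scale with a 64-bit mantissa (the real DECIMAL uses 96 bits), and overflow is not handled, so it only works for small scales.

```cpp
#include <algorithm>
#include <cstdint>

// Toy model, not the OLE Automation DECIMAL layout:
// the represented value is mantissa / 10^scale.
struct ToyDecimal {
    std::uint64_t mantissa;
    int scale;
};

// Multiply the mantissa by 10 until the value carries `scale` fractional digits.
inline ToyDecimal rescale(ToyDecimal d, int scale) {
    while (d.scale < scale) {
        d.mantissa *= 10;
        ++d.scale;
    }
    return d;
}

// Addition aligns both operands to the larger scale and keeps that scale.
inline ToyDecimal add(ToyDecimal a, ToyDecimal b) {
    const int scale = std::max(a.scale, b.scale);
    return ToyDecimal{rescale(a, scale).mantissa + rescale(b, scale).mantissa, scale};
}

// Subtraction does the same; the caller must ensure a >= b.
inline ToyDecimal sub(ToyDecimal a, ToyDecimal b) {
    const int scale = std::max(a.scale, b.scale);
    return ToyDecimal{rescale(a, scale).mantissa - rescale(b, scale).mantissa, scale};
}
```

Starting from {42, 0}, adding and subtracting {1, 1} (that is, 0.1) leaves {420, 1}: the same number, now stored as 42.0. Each round with a smaller fraction appends one more trailing zero, which is how the loop walks through the representations 42.0, 42.00, and so on.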
And here's (part of) the output:
...
20: ok
21: ok
22: not ok
23: ok
24: not ok
25: ok
26: ok
...
Thus showing that this is indeed a long-standing problem in Windows that merely happens to be exposed by .NET. It's system functions that are giving different results for equal decimal values, and either they should be fixed, or .NET should be changed to not use defective functions.
Now, in the new .NET Core, we can see code in its decimal.cpp that works around the problem:
FCIMPL1(INT32, COMDecimal::GetHashCode, DECIMAL *d)
{
    FCALL_CONTRACT;

    ENSURE_OLEAUT32_LOADED();

    _ASSERTE(d != NULL);
    double dbl;
    VarR8FromDec(d, &dbl);
    if (dbl == 0.0) {
        // Ensure 0 and -0 have the same hash code
        return 0;
    }
    // conversion to double is lossy and produces rounding errors so we mask off the lowest 4 bits
    //
    // For example these two numerically equal decimals with different internal representations produce
    // slightly different results when converted to double:
    //
    // decimal a = new decimal(new int[] { 0x76969696, 0x2fdd49fa, 0x409783ff, 0x00160000 });
    //                     => (decimal)1999021.176470588235294117647000000000 => (double)1999021.176470588
    // decimal b = new decimal(new int[] { 0x3f0f0f0f, 0x1e62edcc, 0x06758d33, 0x00150000 });
    //                     => (decimal)1999021.176470588235294117647000000000 => (double)1999021.1764705882
    //
    return ((((int *)&dbl)[0]) & 0xFFFFFFF0) ^ ((int *)&dbl)[1];
}
FCIMPLEND
This appears to be implemented in the current .NET Framework too, based on the fact that one of the wrong double values does give the same hash code, but it's not enough to completely fix the problem.
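Why the mask helps in one direction only can be checked with a small C++ sketch of the masked hash. As before, masked_hash_from_double is a hypothetical name, memcpy replaces the pointer cast, and an IEEE 754 64-bit double is assumed.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper mirroring the .NET Core workaround:
// clear the lowest 4 bits of the low half before XOR-ing it with the high half.
inline std::int32_t masked_hash_from_double(double value) {
    if (value == 0.0) {
        return 0;  // 0 and -0 must hash alike
    }
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // copy the raw bytes
    const std::uint32_t low = static_cast<std::uint32_t>(bits) & 0xFFFFFFF0u;
    const std::uint32_t high = static_cast<std::uint32_t>(bits >> 32);
    return static_cast<std::int32_t>(low ^ high);
}
```

The double 42 has the bit pattern 0x4045000000000000. The value 42.000000000000007 is one unit in the last place above it (0x4045000000000001), so the mask removes the difference and both hash to 0x40450000. The value 41.999999999999993 is one unit below (0x4044FFFFFFFFFFFF): the borrow ripples through every mantissa bit into the high half, and the masked hash comes out as 0xBFBB000F, exactly the odd hash code in the question's output. Masking a few low bits cannot absorb an error that crosses such a boundary.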