Floating point number conversion horror, is there a way out?


Problem description

    Background

    Recently my colleague added some new tests to our test project. One of them did not pass on our continuous integration system. Since we have around 800 tests and it takes an hour to run all of them, we often make the mistake of running only the tests we are currently implementing on our dev machines. This approach has its weakness: from time to time tests pass locally but then fail on the integration system. Of course, someone could say "that is not the mistake; tests should be independent of each other!".

    In an ideal world, sure, but not in my world. Not in a world in which you have a lot of singletons initialized in initialization sections, lots of global variables introduced by Delphi itself, an OTL thread pool initialized in the background, DevExpress methods hooked to your controls for painting purposes, and dozens of other things I am not aware of. So in the end one test can change the behavior of another test. (Which of course is bad in itself, and I am glad it happened, because hopefully I will be able to fix another dependency.)

    I ran the whole test package on my machine and got the same results as on the integration system. So far so good. Then I started turning off tests until I had narrowed it down to the one test that interfered with the recently added one. They have nothing in common. I dug deeper and narrowed the problem down to one single line. If I comment it out, the test passes; if not, the test fails.

    Problem

    We have the following code to convert text data into longitude coordinates (only the important part is included):

    procedure TTerminalNVCParserTest_Unit.TranslateGPS_ValidGPSString_ReturnsValidCoords;
    const
      CStrGPS = 'N5145.37936E01511.8029';
    var
      LLatitude, LLongitude: Integer;
      LLong: Double;
      LStrLong, LTmpStr: String;
      LFS: TFormatSettings;
    begin
      FillChar(LFS, SizeOf(LFS), 0);
      LFS.DecimalSeparator := '.';
    
      LStrLong := Copy(CStrGPS, Pos('E', CStrGPS)+1, 10);
      LTmpStr := Copy(LStrLong,1,3);
      LLong := StrToFloatDef( LTmpStr, 0, LFS );
      LTmpStr := Copy(LStrLong,4,10);
      LLong := LLong + StrToFloatDef( LTmpStr, 0, LFS)*1/60;
      LLongitude := Round(LLong * 100000);
    
      CheckEquals(1519671, LLongitude);
    end;
    

    The problem is that LLongitude is sometimes equal to 1519671 and sometimes 1519672. And whether it gives 1519672 or not depends on a totally unrelated piece of code in a different method in a different test:

    FormXtrMainImport.JvWizard1.SelectNextPage; 
    

    I've quadruple-checked the SelectNextPage method; it does not fire any event that could change how the FPU works. It does not change the value of RoundingMode, which is always set to rmNearest.

    Besides, shouldn't Delphi use banker's rounding here?

    LLongitude := Round(LLong * 100000); //LLong * 100000 = 1519671,5
    

    If banker's rounding is used, it should always give me 1519672, not 1519671.

    I guess that there must be some corrupted memory causing the problem, and the line with SelectNextPage only reveals it. However, the same problem occurs on three different machines.

    Could anyone give me an idea how to trace this problem? Or how to always ensure a stable conversion result?
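On the banker's rule point: under rmNearest, Round resolves exact ties to the even neighbour. The sketch below is only an illustration (the program name and the Ties array are made up for it). It uses values that are exactly representable in binary, so each one is a true tie:

```pascal
program BankersRoundDemo;
{$APPTYPE CONSOLE}
const
  // Each value ends in .5 and is exactly representable as a Double,
  // so it sits exactly halfway between two integers.
  Ties: array[0..3] of Double = (0.5, 1.5, 2.5, 1519671.5);
var
  I: Integer;
begin
  // With the default rmNearest mode, ties go to the even integer:
  // 0.5 -> 0, 1.5 -> 2, 2.5 -> 2, 1519671.5 -> 1519672
  for I := Low(Ties) to High(Ties) do
    Writeln(Ties[I]:0:1, ' -> ', Round(Ties[I]));
end.
```

So 1519672 is the expected result only if the argument passed to Round really is exactly 1519671.5. A value a hair below the tie rounds down to 1519671, even if the debugger displays it as 1519671,5.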

    To those who misunderstood my question

    1. I've checked the RoundingMode, as I mentioned earlier: "I've quadruple-checked the SelectNextPage method; it does not fire any event that could change how the FPU works. It does not change the value of RoundingMode, which is always set to rmNearest." RoundingMode is always rmNearest before any rounding occurs in the above code.

    2. This is not the real test. This is only the code to show where problem occurs.

    Video description added.

    So, in an effort to improve my question, I've decided to add a video that shows my bizarre problem. This is the production code; I've only added assertions for checking RoundingMode. In the first part of the video I show the original test (@Sir Rufo, @Craig Young), the method responsible for the conversion and the correct result I am getting. In the second part I show that when I add another unrelated test I get an incorrect result. Video can be found here

    Reproducible example added

    It all boils down to the code below:

    procedure FloatingPointNumberHorror;
    const
      CStrGPS = 'N5145.37936E01511.8029';
    var
      LLongitude: Integer;
      LFloatLon: Double;
      adcConnection: TADOConnection;
      qrySelect: TADOQuery;
      LCSVStringList: TStringList;
    begin
      //Tested on Delphi 2007, 2009, XE 5 -  Windows 7 64 bit
      adcConnection := TADOConnection.Create(nil);
      qrySelect := TADOQuery.Create(adcConnection);
      LCSVStringList := TStringList.Create;
      try
        //Prepare on the fly csv file required by ADOQuery
        LCSVStringList.Add('Col1;Col2;');
        LCSVStringList.Add('aaaa;1234;');
        LCSVStringList.SaveToFile(ExtractFilePath(ParamStr(0)) + 'test.csv');
    
        qrySelect.CursorType := ctStatic;
        qrySelect.Connection := adcConnection;
        adcConnection.ConnectionString := 'Provider=Microsoft.Jet.OLEDB.4.0;Data Source='
          + ExtractFilePath(ParamStr(0)) + ';Extended Properties="text;HDR=yes;FMT=Delimited(;)"';
    
        // Real stuff begins here, above we have only preparation of environment.
        LFloatLon := 15 + 11.8029*1/60;
        LLongitude := Round(LFloatLon * 100000);
        Assert(LLongitude = 1519671, 'Assertion 1'); //Here you will NOT receive an error.
    
        //This line changes the FPU control word from $1372 to $1272.
        //This causes the change of Precision Control Field (PC) from 3 which means
        //64bit precision to 2 which means 53 bit precision thus resulting in improper rounding?
        //--> ADODB.TParameters.InternalRefresh->RefreshFromOleDB -> CommandPrepare.Prepare(0)
        qrySelect.SQL.Text := 'select * from [test.csv] WHERE 1=1';
    
        LFloatLon := 15 + 11.8029*1/60;
        LLongitude := Round(LFloatLon * 100000);
        Assert(LLongitude = 1519671, 'Assertion 2'); //Here you will receive an error.
    
      finally
        adcConnection.Free;
        LCSVStringList.Free;
      end;
    end;
    

    Just copy and paste this procedure and add ADODB to the uses clause. It seems that the problem is caused by some Microsoft COM object used by Delphi's ADO wrapper. This object changes the FPU control word, but it does not change the rounding mode. It changes the precision control.

    Here is the FPU screenshot before and after firing up the ADO-related method:

    The only solution that comes to my mind is to call Get8087CW before using ADO code and then Set8087CW to restore the previously stored control word.
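To make the difference between the two control words easier to see, here is a small sketch that decodes the relevant fields. It assumes the standard x87 control word layout (precision control in bits 8-9, rounding control in bits 10-11). The helper names PrecisionControl, RoundingControl and Describe are hypothetical, not part of the RTL:

```pascal
program DecodeCW;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: bits 8-9 hold precision control
// (0 = 24-bit, 2 = 53-bit, 3 = 64-bit significand).
function PrecisionControl(CW: Word): Integer;
begin
  Result := (CW shr 8) and 3;
end;

// Hypothetical helper: bits 10-11 hold rounding control (0 = nearest).
function RoundingControl(CW: Word): Integer;
begin
  Result := (CW shr 10) and 3;
end;

procedure Describe(CW: Word);
begin
  Writeln('$', IntToHex(CW, 4), ': PC=', PrecisionControl(CW),
    ' RC=', RoundingControl(CW));
end;

begin
  Describe($1372); // prints $1372: PC=3 RC=0
  Describe($1272); // prints $1272: PC=2 RC=0
end.
```

Both values have rounding control 0 (nearest), which is why checking RoundingMode alone does not reveal the change. Only the precision control field differs.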

    Solution

    The problem is most likely because something else in your code is changing the floating point rounding mode. Have a look at this program:

    {$APPTYPE CONSOLE}
    
    {$R *.res}
    
    uses
      SysUtils, Math;
    
    const
      CStrGPS = 'N5145.37936E01511.8029';
    var
      LLatitude, LLongitude: Integer;
      LLong: Double;
      LStrLong, LTmpStr: String;
      LFS: TFormatSettings;
    
    begin
      FillChar(LFS, SizeOf(LFS), 0);
      LFS.DecimalSeparator := '.';
    
      LStrLong := Copy(CStrGPS, Pos('E', CStrGPS)+1, 10);
      LTmpStr := Copy(LStrLong,1,3);
      LLong := StrToFloatDef( LTmpStr, 0, LFS );
      LTmpStr := Copy(LStrLong,4,10);
      LLong := LLong + StrToFloatDef( LTmpStr, 0, LFS)*1/60;
    
      Writeln(FloatToStr(LLong));
      Writeln(FloatToStr(LLong*100000));
    
      SetRoundMode(rmNearest);
      LLongitude := Round(LLong * 100000);
      Writeln(LLongitude);
    
      SetRoundMode(rmDown);
      LLongitude := Round(LLong * 100000);
      Writeln(LLongitude);
    
      SetRoundMode(rmUp);
      LLongitude := Round(LLong * 100000);
      Writeln(LLongitude);
    
      SetRoundMode(rmTruncate);
      LLongitude := Round(LLong * 100000);
      Writeln(LLongitude);
    
      Readln;
    end.
    

    The output is:

    15.196715
    1519671.5
    1519671
    1519671
    1519672
    1519671
    

    Clearly your particular calculation depends on the floating point rounding mode as well as the actual input value and the code. Indeed the documentation does make this point:

    Note: The behavior of Round can be affected by the Set8087CW procedure or System.Math.SetRoundMode function.

    So you need to first of all find whatever else in your program is modifying the floating point control word. And then you must make sure that you set it back to the desired value whenever that mis-behaving code executes.
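One way to hunt for the offending code is to record the control word once at start-up and compare against it around suspect calls. The following is only a sketch for a 32-bit target. BaselineCW and CheckCW are hypothetical names, and the Set8087CW call in the middle merely stands in for a library call that lowers the precision control:

```pascal
program TraceCW;
{$APPTYPE CONSOLE}
uses
  SysUtils;

var
  BaselineCW: Word; // control word captured at start-up (hypothetical name)

// Hypothetical helper: report when the control word differs from the baseline.
procedure CheckCW(const Where: string);
var
  Current: Word;
begin
  Current := Get8087CW;
  if Current <> BaselineCW then
    Writeln(Where, ': control word changed from $', IntToHex(BaselineCW, 4),
      ' to $', IntToHex(Current, 4));
end;

begin
  BaselineCW := Get8087CW;
  CheckCW('before suspect call');
  // Stand-in for the misbehaving code: force precision control to 2 (53-bit).
  Set8087CW((BaselineCW and $FCFF) or $0200);
  CheckCW('after suspect call');
  Set8087CW(BaselineCW); // put the baseline back
end.
```

In a test suite, a check like this could be called from the common setup and teardown so that the first test leaving a modified control word behind is reported by name.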


    Congratulations on debugging this further. In fact it is actually the multiplication

    LLong*100000
    

    which is influenced by the precision control.

    To see that this is so, look at this program:

    {$APPTYPE CONSOLE}
    var
      d: Double;
      e1, e2: Extended;
    begin
      d := 15.196715;
      Set8087CW($1272);
      e1 := d * 100000;
      Set8087CW($1372);
      e2 := d * 100000;
      Writeln(e1=e2);
      Readln;
    end.
    

    Output

    FALSE
    

    So, precision control influences the results of the multiplication, at least in the 80 bit registers of the 8087 unit.

    The compiler doesn't store the result of that multiplication to a variable and it remains in the FPU, so this difference flows on to the Round.

    Project1.dpr.9: Writeln(Round(LLong*100000));
    004060E8 DD05A0AB4000     fld qword ptr [$0040aba0]
    004060EE D80D84614000     fmul dword ptr [$00406184]
    004060F4 E8BBCDFFFF       call @ROUND
    004060F9 52               push edx
    004060FA 50               push eax
    004060FB A1107A4000       mov eax,[$00407a10]
    00406100 E827F0FFFF       call @Write0Int64
    00406105 E87ADEFFFF       call @WriteLn
    0040610A E851CCFFFF       call @_IOTest
    

    Notice how the result of the multiplication is left in ST(0) because that's exactly where Round expects its parameter.

    In fact, if you pull the multiplication into a separate statement, and assign it to a variable, then the behaviour becomes consistent again:

    tmp := LLong*100000;
    LLongitude := Round(tmp);
    

    The above code produces the same output for both $1272 and $1372.
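As a complete program, the variant with the temporary might look like the sketch below. It assumes a 32-bit target where the x87 unit is used, and it declares tmp as Double, so that the store rounds the product to 53 bits whatever the precision control says:

```pascal
program StoreThenRound;
{$APPTYPE CONSOLE}
var
  d, tmp: Double;
begin
  d := 15.196715;

  Set8087CW($1272);   // 53-bit precision control
  tmp := d * 100000;  // storing to a Double rounds the product to 53 bits
  Writeln(Round(tmp));

  Set8087CW($1372);   // 64-bit precision control
  tmp := d * 100000;  // the extra bits are discarded by the store
  Writeln(Round(tmp));
end.
```

The point of the design is that Round now sees a value that has already been rounded to Double, so any extra precision held in the FPU register no longer reaches it.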

    The basic issue remains, though. You have lost control of the floating point control state. To deal with this you'll need to keep control of your FP control state. Whenever you call into a library that may modify it, store it away before calling, and then restore it when the call returns. If you want to have anything like repeatable, reliable and robust floating point code, this sort of game is, unfortunately, inevitable.

    Here is my code to do that:

    type
      TFPControlState = record
        _8087CW: Word;
        MXCSR: UInt32;
      end;
    
    function GetFPControlState: TFPControlState;
    begin
      Result._8087CW := Get8087CW;
      Result.MXCSR := GetMXCSR;
    end;
    
    procedure RestoreFPControlState(const State: TFPControlState);
    begin
      Set8087CW(State._8087CW);
      SetMXCSR(State.MXCSR);
    end;
    
    var
      FPControlState: TFPControlState;
    ....
    FPControlState := GetFPControlState;
    try
      // call into external library that changes FP control state
    finally
      RestoreFPControlState(FPControlState);
    end;
    

    Note that this code handles both floating point units and so is ready for 64-bit which uses the SSE unit rather than the 8087 unit.


    For what it is worth, here is my SSCCE:

    {$APPTYPE CONSOLE}
    var
      d: Double;
    begin
      d := 15.196715;
      Set8087CW($1272);
      Writeln(Round(d * 100000));
      Set8087CW($1372);
      Writeln(Round(d * 100000));
      Readln;
    end.
    

    Output

    1519672
    1519671
    
