Inconsistent behaviour in identical code


Question


An error trap trips after running a physics simulation for about 20 minutes. Realising this would be a pain to debug, I duplicated the relevant subroutine in a new project, and called it with hard-coded copies of the original input data at the moment the error occurred. But the error trap did not trip! After two days of tedious work to isolate the exact point where the behaviours of the two instances of the subroutine diverge, I’ve traced the problem to a very simple function for computing a Cross_Product.

This Cross_Product function is identical in both programs. I have even checked the disassembly and made sure that the compiler is producing identical code. The function is also receiving identical input data in both cases. I have even explicitly checked the rounding mode inside the functions and they are identical. Yet, they are returning slightly different results. Specifically, the LSB is different for two out of three of the returned vector components. Even the debugger itself confirms that these two out of three variables are not equal to the expressions they’ve been explicitly assigned. (See screenshot below.)

Debugger screenshot: https://i.stack.imgur.com/Agcch.png

In the original program, the debugger shows "true" in all of the last three lines of the watch list instead of only the last one.

I’m using Code::Blocks 13.12 with the GCC Compiler on XP, with an AMD Athlon 64 CPU. However, I recompiled and ran the test program in Code::Blocks 16.01 on a much more modern Windows 10 machine, with an Intel Core i5 CPU, and the results were identical.

Here is my minimal, complete, and verifiable code to reproduce the bizarre result which disagrees with my original program AND with the debugger itself (unfortunately, I can’t include the original physics program because it’s HUGE):

extern "C" {
    __declspec(dllimport) int __stdcall IsDebuggerPresent(void);
    __declspec(dllimport) void __stdcall DebugBreak(void);
}

struct POLY_Triplet {
   double XYZ[3];
};

POLY_Triplet Cross_Product(POLY_Triplet Vector1, POLY_Triplet Vector2) {
   POLY_Triplet Result;

   Result.XYZ[0] = Vector1.XYZ[1] * Vector2.XYZ[2] - Vector1.XYZ[2] * Vector2.XYZ[1];
   Result.XYZ[1] = Vector1.XYZ[2] * Vector2.XYZ[0] - Vector1.XYZ[0] * Vector2.XYZ[2];
   Result.XYZ[2] = Vector1.XYZ[0] * Vector2.XYZ[1] - Vector1.XYZ[1] * Vector2.XYZ[0];

   return Result;
}

int main() {
   POLY_Triplet Triplet1;

   POLY_Triplet Collision_Axis_Vector;

   POLY_Triplet Boundary_normal;

   // Reproduce the exact inputs from the failing run by writing their raw
   // 64-bit bit patterns straight into the doubles.
   *(long long int *)(&Collision_Axis_Vector.XYZ[0]) = 4594681439063077250;
   *(long long int *)(&Collision_Axis_Vector.XYZ[1]) = 4603161398996347097;
   *(long long int *)(&Collision_Axis_Vector.XYZ[2]) = 4605548671330989714;

   *(long long int *)(&Triplet1.XYZ[0]) = -4626277815076045984;
   *(long long int *)(&Triplet1.XYZ[1]) = -4637257536736295424;
   *(long long int *)(&Triplet1.XYZ[2]) = 4589609575355367200;

   if (IsDebuggerPresent()) {
      DebugBreak();
   }

   Boundary_normal = Cross_Product(Collision_Axis_Vector, Triplet1);

   return 0;
}
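The test case above loads its inputs by casting a `double*` to `long long int*`, which works on the compilers mentioned but formally violates the strict-aliasing rules. A minimal sketch of an alternative, assuming 64-bit IEEE 754 doubles; the helper names `double_from_bits` and `bits_from_double` are hypothetical and not part of the original program:

```cpp
#include <cstdint>
#include <cstring>

// Build a double from a raw 64-bit pattern without pointer-cast type punning.
double double_from_bits(std::int64_t bits) {
    static_assert(sizeof(double) == sizeof(std::int64_t),
                  "this sketch assumes 64-bit doubles");
    double value;
    std::memcpy(&value, &bits, sizeof value);  // byte-wise copy, no aliasing issue
    return value;
}

// Inverse helper: expose the raw bit pattern of a double, e.g. to compare
// two results that differ only in their least significant bit.
std::int64_t bits_from_double(double value) {
    std::int64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

With these helpers, an input could be set with `Triplet1.XYZ[2] = double_from_bits(4589609575355367200);`, and `bits_from_double(a) ^ bits_from_double(b)` shows exactly which bits of two results differ.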

For convenience, here are the relevant lines for the watch list, as seen in the screenshot:

(Result.XYZ[0] == Vector1.XYZ[1] * Vector2.XYZ[2] - Vector1.XYZ[2] * Vector2.XYZ[1])
(Result.XYZ[1] == Vector1.XYZ[2] * Vector2.XYZ[0] - Vector1.XYZ[0] * Vector2.XYZ[2])
(Result.XYZ[2] == Vector1.XYZ[0] * Vector2.XYZ[1] - Vector1.XYZ[1] * Vector2.XYZ[0])

Can anyone please explain this behaviour?

Solution

I can confirm that the problematic output you’re getting can be caused by a change in x87 precision. The precision setting is stored in the x87 FPU control word, and once changed, it persists for the lifetime of your thread, affecting all x87 code that runs on that thread.
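To check whether the two programs really run with different precision settings, the control word can be read and decoded directly. The following is a sketch, not a tested diagnostic for the original program: the function names are hypothetical, and the inline assembly assumes GCC or Clang on x86 (elsewhere the read function simply reports that it is unsupported).

```cpp
#include <cstdint>

// Decode the precision-control field (bits 8 and 9) of an x87 control word
// into the mantissa length in bits. Returns -1 for the reserved encoding.
int precision_bits_from_control_word(std::uint16_t control_word) {
    switch ((control_word >> 8) & 0x3) {
        case 0x0: return 24;   // single precision
        case 0x2: return 53;   // double precision
        case 0x3: return 64;   // extended precision
        default:  return -1;   // 0x1 is reserved
    }
}

// Read the calling thread's x87 control word.
// Returns false on compilers or CPUs where this sketch has no implementation.
bool read_x87_control_word(std::uint16_t& control_word) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__("fnstcw %0" : "=m"(control_word));
    return true;
#else
    (void)control_word;
    return false;
#endif
}
```

Logging the decoded value immediately before the call to Cross_Product in both the original program and the test program would show whether the precision setting is what differs between them.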

Apparently, some other component of your huge program (or an external library you use) sometimes changes the mantissa length from 53 bits (which is the default) to 64 bits (which means using the full precision of the 80-bit x87 registers).
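The effect of the mantissa length on one component of the cross product can be imitated without touching the control word. This is only an illustrative sketch: it assumes a platform where `long double` is the 80-bit x87 format (for example GCC on x86; with Visual C++ `long double` is the same as `double`, so both functions agree), and the function names are hypothetical.

```cpp
// One cross-product component, a*b - c*d, with every intermediate rounded
// to a 53-bit mantissa. The volatile temporaries force each product to be
// stored as a 64-bit double before the subtraction.
double cross_term_double(double a, double b, double c, double d) {
    volatile double ab = a * b;
    volatile double cd = c * d;
    return ab - cd;
}

// The same expression with intermediates kept in long double and rounded
// to double only once, at the end. Where long double is the 80-bit format,
// this mimics x87 code running with a 64-bit mantissa.
double cross_term_extended(double a, double b, double c, double d) {
    long double ab = static_cast<long double>(a) * b;
    long double cd = static_cast<long double>(c) * d;
    return static_cast<double>(ab - cd);
}
```

For inputs whose products are exactly representable the two functions agree; for general inputs they can differ in the last bit or so, which is the kind of LSB discrepancy described in the question.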

The best fix is to switch your compiler from the x87 target to SSE2. SSE always uses either 32-bit or 64-bit floats (depending on the instructions used); it doesn’t have 80-bit registers at all. Even your 2003 Athlon 64 already supports that instruction set. As a side effect, your code will become somewhat faster.
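With GCC, the switch is typically made with the options `-msse2 -mfpmath=sse`. One way to confirm which evaluation mode a build uses is the standard `FLT_EVAL_METHOD` macro from `<cfloat>`. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the macro reflects the compile-time target, not the run-time value of the x87 control word.

```cpp
#include <cfloat>
#include <string>

// Translate a FLT_EVAL_METHOD value into a human-readable description.
std::string describe_eval_method(int method) {
    switch (method) {
        case 0:  return "each operation is evaluated in its own type";
        case 1:  return "float and double are evaluated as double";
        case 2:  return "all operations are evaluated as long double";
        default: return "indeterminate or implementation-defined";
    }
}

// Describe the evaluation method of the current build.
std::string current_eval_method() {
    return describe_eval_method(FLT_EVAL_METHOD);
}
```

A 32-bit GCC build that still targets the x87 unit usually reports method 2, while an SSE2 build usually reports method 0.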

Update: If you don’t want to switch to SSE2, you can reset the precision to whatever value you like. Here’s how to do that in Visual C++:

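#include <float.h>
unsigned int control; // receives the control word as it is after the call
_controlfp_s( &control, _PC_53, _MCW_PC ); // or _PC_64 for the full 64-bit mantissa of the 80-bit format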

For GCC, it’s something like this (untested; <fpu_control.h> is a glibc header, so it may not be available with MinGW on Windows):

#include <fpu_control.h>
#define _FPU_PRECISION ( _FPU_SINGLE | _FPU_DOUBLE | _FPU_EXTENDED )
fpu_control_t prev, curr;
_FPU_GETCW( prev );
curr = ( prev & ~_FPU_PRECISION ) | _FPU_DOUBLE; // or _FPU_EXTENDED for 80 bit
_FPU_SETCW( curr );
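If the precision is being changed by a library you cannot modify, one option is to force the setting around your own calculations and restore it afterwards. Below is a sketch of a scope guard built on the same glibc macros; the class and function names are hypothetical, and on platforms without glibc on x86 the guard compiles to a no-op.

```cpp
#include <cstdint>  // a libc header, so that __GLIBC__ is defined where applicable

#if defined(__GLIBC__) && (defined(__i386__) || defined(__x86_64__))
#include <fpu_control.h>
#define HAVE_GLIBC_X87_CONTROL 1
#else
#define HAVE_GLIBC_X87_CONTROL 0
#endif

// Return the precision-control bits of the x87 control word,
// or -1 where this sketch has no implementation.
int current_x87_precision_field() {
#if HAVE_GLIBC_X87_CONTROL
    fpu_control_t cw;
    _FPU_GETCW(cw);
    return cw & _FPU_EXTENDED;  // _FPU_EXTENDED covers both precision bits
#else
    return -1;
#endif
}

// Sets the x87 precision to 53 bits for the lifetime of the object and
// restores the previous control word on destruction.
class ScopedX87DoublePrecision {
public:
    ScopedX87DoublePrecision() {
#if HAVE_GLIBC_X87_CONTROL
        _FPU_GETCW(saved_);
        fpu_control_t cw = (saved_ & ~_FPU_EXTENDED) | _FPU_DOUBLE;
        _FPU_SETCW(cw);
#endif
    }
    ~ScopedX87DoublePrecision() {
#if HAVE_GLIBC_X87_CONTROL
        _FPU_SETCW(saved_);
#endif
    }
    ScopedX87DoublePrecision(const ScopedX87DoublePrecision&) = delete;
    ScopedX87DoublePrecision& operator=(const ScopedX87DoublePrecision&) = delete;

private:
#if HAVE_GLIBC_X87_CONTROL
    fpu_control_t saved_;
#endif
};
```

Declaring `ScopedX87DoublePrecision guard;` at the top of a function such as Cross_Product would make its x87 arithmetic independent of whatever precision the rest of the program has left in the control word.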
