Debugging/bypassing BSOD without source code


Problem description

    Hello and good day to you.

    Need a bit of assistance here:

    Situation:
    I have an obscure DirectX 9 application (its name and details are irrelevant to the question) that causes a blue screen of death on all NVIDIA cards (GeForce 8400GS and up) since a certain driver version. I believe the problem is indirectly caused by a DirectX 9 call or a flag that triggers a driver bug.

    Goal:
    I'd like to track down the offending flag/function call (for fun, this isn't my job/homework) and bypass the error condition by writing a proxy dll. I already have a finished proxy dll that provides wrappers for IDirect3D9, IDirect3DDevice9, IDirect3DVertexBuffer9 and IDirect3DIndexBuffer9 and does basic logging/tracing of Direct3D calls. However, I can't pinpoint the function that causes the crash.
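
For readers unfamiliar with the proxy approach described above, here is a minimal sketch of the wrapping pattern. It is not the asker's dll: `IDevice`, `LoggingDevice` and `FakeDevice` are hypothetical stand-ins, because the real IDirect3DDevice9 is a COM interface with many more methods and needs the DirectX headers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a COM interface such as IDirect3DDevice9.
// The real interface derives from IUnknown and has far more methods.
struct IDevice {
    virtual ~IDevice() = default;
    virtual long DrawPrimitive(unsigned type, unsigned start, unsigned count) = 0;
};

// Proxy: records every call, then forwards it to the wrapped object.
class LoggingDevice : public IDevice {
public:
    LoggingDevice(IDevice* real, std::vector<std::string>* log)
        : real_(real), log_(log) {
        assert(real_ != nullptr && log_ != nullptr);
    }

    long DrawPrimitive(unsigned type, unsigned start, unsigned count) override {
        // Log before forwarding, so the entry exists even if the call never returns.
        log_->push_back("DrawPrimitive(" + std::to_string(type) + ", " +
                        std::to_string(start) + ", " + std::to_string(count) + ")");
        return real_->DrawPrimitive(type, start, count);
    }

private:
    IDevice* real_;                  // the object the application thinks it is talking to
    std::vector<std::string>* log_;  // where call records go (assumed: an in-memory list)
};

// Fake "real" device, present only to keep the sketch self-contained.
struct FakeDevice : IDevice {
    long DrawPrimitive(unsigned, unsigned, unsigned) override { return 0; }
};
```

Logging before the forwarded call matters here: if the driver bugchecks inside the call, an entry written afterwards would never be recorded.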

    Problems:

    1. No source code or technical support is available. There will be no assistance, and nobody else will fix the problem.
    2. The memory dump produced by the kernel wasn't helpful. Apparently an access violation happens within nv4_disp.dll, but I can't use the stack trace to get back to the IDirect3DDevice9 method call, and there's a chance that the bug happens asynchronously.
    3. (Main problem) Because of the large number of IDirect3DDevice9 method calls, I can't reliably log them to a file or over the network:

      1. Logging to a file causes significant slowdown even without flushing, and without flushing the last entries of the log are lost when the system BSODs.
      2. Logging over the network (using UDP and Winsock's sendto) also causes significant slowdown and must not be done asynchronously (asynchronous packets are lost on BSOD). Even when sent synchronously, the packets around the crash are sometimes lost.
      3. When the application is slowed down by the logging routines, the BSOD is less likely to happen, which makes tracking it down harder.

    Question:
    I normally don't write drivers and don't do this level of debugging, so I have the impression that I'm missing something important and that there's a more trivial way to track down the problem than writing an IDirect3DDevice9 proxy dll with a custom logging mechanism. What is it? What is the standard way of diagnosing/handling/fixing a problem like this (no source code, COM interface method triggers BSOD)?

    Minidump analysis (WinDBG):

    Loading User Symbols
    Loading unloaded module list
    ...........
    Unable to load image nv4_disp.dll, Win32 error 0n2
    *** WARNING: Unable to verify timestamp for nv4_disp.dll
    *** ERROR: Module load completed but symbols could not be loaded for nv4_disp.dll
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    
    Use !analyze -v to get detailed debugging information.
    
    BugCheck 1000008E, {c0000005, bd0a2fd0, b0562b40, 0}
    
    Probably caused by : nv4_disp.dll ( nv4_disp+90fd0 )
    
    Followup: MachineOwner
    ---------
    
    0: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    
    KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)
    This is a very common bugcheck.  Usually the exception address pinpoints
    the driver/function that caused the problem.  Always note this address
    as well as the link date of the driver/image that contains this address.
    Some common problems are exception code 0x80000003.  This means a hard
    coded breakpoint or assertion was hit, but this system was booted
    /NODEBUG.  This is not supposed to happen as developers should never have
    hardcoded breakpoints in retail code, but ...
    If this happens, make sure a debugger gets connected, and the
    system is booted /DEBUG.  This will let us see why this breakpoint is
    happening.
    Arguments:
    Arg1: c0000005, The exception code that was not handled
    Arg2: bd0a2fd0, The address that the exception occurred at
    Arg3: b0562b40, Trap Frame
    Arg4: 00000000
    
    Debugging Details:
    ------------------
    
    
    EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".
    
    FAULTING_IP: 
    nv4_disp+90fd0
    bd0a2fd0 39b8f8000000    cmp     dword ptr [eax+0F8h],edi
    
    TRAP_FRAME:  b0562b40 -- (.trap 0xffffffffb0562b40)
    ErrCode = 00000000
    eax=00000808 ebx=e37f8200 ecx=e4ae1c68 edx=e37f8328 esi=e37f8400 edi=00000000
    eip=bd0a2fd0 esp=b0562bb4 ebp=e37e09c0 iopl=0         nv up ei pl nz na po nc
    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010202
    nv4_disp+0x90fd0:
    bd0a2fd0 39b8f8000000    cmp     dword ptr [eax+0F8h],edi ds:0023:00000900=????????
    Resetting default scope
    
    CUSTOMER_CRASH_COUNT:  3
    
    DEFAULT_BUCKET_ID:  DRIVER_FAULT
    
    BUGCHECK_STR:  0x8E
    
    LAST_CONTROL_TRANSFER:  from bd0a2e33 to bd0a2fd0
    
    STACK_TEXT:  
    WARNING: Stack unwind information not available. Following frames may be wrong.
    b0562bc4 bd0a2e33 e37f8200 e37f8200 e4ae1c68 nv4_disp+0x90fd0
    b0562c3c bf8edd6b b0562cfc e2601714 e4ae1c58 nv4_disp+0x90e33
    b0562c74 bd009530 b0562cfc bf8ede06 e2601714 win32k!WatchdogDdDestroySurface+0x38
    b0562d30 bd00b3a4 e2601008 e4ae1c58 b0562d50 dxg!vDdDisableSurfaceObject+0x294
    b0562d54 8054161c e2601008 00000001 0012c518 dxg!DxDdDestroySurface+0x42
    b0562d54 7c90e4f4 e2601008 00000001 0012c518 nt!KiFastCallEntry+0xfc
    0012c518 00000000 00000000 00000000 00000000 0x7c90e4f4
    
    
    STACK_COMMAND:  kb
    
    FOLLOWUP_IP: 
    nv4_disp+90fd0
    bd0a2fd0 39b8f8000000    cmp     dword ptr [eax+0F8h],edi
    
    SYMBOL_STACK_INDEX:  0
    
    SYMBOL_NAME:  nv4_disp+90fd0
    
    FOLLOWUP_NAME:  MachineOwner
    
    MODULE_NAME: nv4_disp
    
    IMAGE_NAME:  nv4_disp.dll
    
    DEBUG_FLR_IMAGE_TIMESTAMP:  4e390d56
    
    FAILURE_BUCKET_ID:  0x8E_nv4_disp+90fd0
    
    BUCKET_ID:  0x8E_nv4_disp+90fd0
    
    Followup: MachineOwner
    

    Solution

    Found a solution.

    Problem:
    Logging is unreliable: messages written to a file disappear during the BSOD, packets are sometimes lost when logging over the network, and the logging itself slows the application down.

    Solution:
    Instead of logging to a file or over the network, configure the system to produce a full physical memory dump on BSOD and log all messages into a memory buffer. It'll be faster. Once the system crashes, it'll dump the entire memory into a file, and you can then either view the contents of the log buffer with WinDBG's dt command (if you have debug symbols) or search for and locate the log stored in memory using the "memory" view.

    I used a circular buffer of std::strings to store the messages, plus a separate array of const char* to make things easier to read in WinDBG, but you could simply create a huge array of char and store all messages in it as plain text.
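
As an illustration of the circular buffer just described, here is a minimal sketch. It is not the original code: the names `TraceLog` and `g_traceLog` and the capacity are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical in-memory trace log. Nothing is written to disk or network;
// the data only needs to survive until the complete memory dump is taken.
// Not thread-safe: a real proxy may need a lock around add().
struct TraceLog {
    static constexpr std::size_t kCapacity = 4096;  // assumed size, tune as needed

    std::array<std::string, kCapacity> messages;    // owns the message text
    std::array<const char*, kCapacity> views{};     // raw pointers, easier to read in a debugger
    std::size_t next = 0;                           // slot the next message goes into
    std::size_t total = 0;                          // number of messages logged so far

    void add(const std::string& text) {
        messages[next] = text;                 // overwrites the oldest entry once full
        views[next] = messages[next].c_str();  // refresh the pointer for this slot
        next = (next + 1) % kCapacity;
        ++total;
    }

    // Index of the most recent message; valid only after at least one add().
    std::size_t last() const {
        assert(total > 0);
        return (next + kCapacity - 1) % kCapacity;
    }
};

// A global, so the buffer can be located by symbol name in the dump.
TraceLog g_traceLog;
```

With symbols loaded, `next` tells you where the newest entries end, so reading `views` backwards from `last()` gives the calls made just before the crash.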

    Details:
    Entire process on winxp:

    1. Ensure that the minimum page file size is at least the total amount of RAM plus 1 megabyte. (Right-click "My Computer"->Properties->Advanced->Performance->Advanced->Change)
    2. Configure the system to produce a complete memory dump on BSOD (Right-click "My Computer"->Properties->Advanced->Startup and Recovery->Settings->Write Debugging Information. Select "Complete memory dump" and specify the path you want).
    3. Ensure that the disk where the dump file will be written has the required amount of free space (the total amount of RAM on your system).
    4. Build the app/dll (the one that does the logging) with debug symbols, and trigger the BSOD.
    5. Wait till the memory dump is finished, then reboot. Feel free to swear at the driver developer while the system writes the memory dump and reboots.
    6. Copy the MEMORY.DMP the system produced to a safe place, so you won't lose everything if the system crashes again.
    7. Launch WinDBG.
    8. Open the memory dump (File->Open Crash Dump).
    9. If you want to see what happened, use the !analyze -v command.
    10. Access the memory buffer that stores the logged messages using one of these methods:

      1. To see the contents of a global variable, use dt module!variable where "module" is the name of your library (without .dll) and "variable" is the name of the variable. You can use wildcards. You can also use an address instead of module!variable.
      2. To see the contents of one field of a global variable (if the global variable is a struct), use dt module!variable field where "field" is the member name.
      3. To see more details about a variable (contents of arrays and substructures), use dt -b module!variable field or dt -b module!variable.
      4. If you don't have symbols, you'll need to search for your "logfile" using the memory window.

    At this point you'll be able to see the contents of the log that was stored in memory, plus you'll have a snapshot of the entire system at the moment it crashed.
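
For the case without symbols, one way to make the buffer easier to find is to start it with a distinctive marker string. The sketch below assumes the "huge array of char" variant mentioned earlier; the marker text, the buffer size and all names are hypothetical. `findMarker` only imitates, on an ordinary byte range, the kind of search you would run over the dump (for example with WinDBG's `s -a` command or the memory window).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical marker; any distinctive ASCII string works.
static const char kMarker[] = "##D3D9LOG##";
constexpr std::size_t kMarkerLen = sizeof(kMarker) - 1;  // without the trailing '\0'
constexpr std::size_t kLogSize = 1 << 20;                // assumed: 1 MiB of log text

struct FlatLog {
    char data[kLogSize];  // marker followed by newline-separated messages
    std::size_t used;     // bytes in use, including the marker
};

FlatLog g_flatLog;  // global, so it is part of the dumped process image

void initFlatLog(FlatLog& log) {
    std::memset(log.data, 0, sizeof(log.data));
    std::memcpy(log.data, kMarker, kMarkerLen);
    log.used = kMarkerLen;
}

void append(FlatLog& log, const char* text) {
    const std::size_t len = std::strlen(text);
    if (len + 1 > kLogSize - kMarkerLen) return;  // message can never fit, drop it
    if (log.used + len + 1 > kLogSize) {
        log.used = kMarkerLen;                    // wrap around, keep the marker
    }
    std::memcpy(log.data + log.used, text, len);
    log.data[log.used + len] = '\n';
    log.used += len + 1;
}

// Stand-in for searching a dump: returns the first marker hit or nullptr.
// In a real dump the marker literal itself also shows up once, so expect
// two hits and pick the one followed by log text.
const char* findMarker(const char* begin, const char* end) {
    const char* hit = std::search(begin, end, kMarker, kMarker + kMarkerLen);
    return hit == end ? nullptr : hit;
}
```

A plain char array also stays readable in a raw memory view, whereas std::string contents may live in separate heap allocations.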

    Also...

    1. To see info about the process that crashed the system, use !process.
    2. To see the loaded modules, use lm.
    3. For info about a thread there's !thread id, where id is the hexadecimal id you saw in the !process output.
