Why can't 64-bit Windows unwind user-kernel-user exceptions?

Question

Why can't 64-bit Windows unwind the stack during an exception, if the stack crosses the kernel boundary - when 32-bit Windows can?

The context of this entire question comes from:

The case of the disappearing OnLoad exception – user-mode callback exceptions in x64

Background

In 32-bit Windows, if I throw an exception in my user-mode code that was called back from kernel-mode code, which was itself called from my user-mode code, e.g.:

User mode                     Kernel Mode
------------------            -------------------
CreateWindow(...);   ------>  NtCreateWindow(...)
                                   |
WindowProc   <---------------------+                                   

then Structured Exception Handling (SEH) in Windows can unwind the stack back through kernel mode and into my user code, where I can handle the exception and see a valid stack trace.

But not in 64-bit Windows

64-bit editions of Windows cannot do this:

For complicated reasons, we cannot propagate the exception back on 64-bit operating systems (amd64 and IA64). This has been the case ever since the first 64-bit release of Server 2003. On x86, this isn’t the case – the exception gets propagated through the kernel boundary and would end up walking the frames back

And since there's no way to walk back a reliable stack trace in this case, they had to make a decision: let you see the nonsensical exception, or hide it altogether:

The kernel architects at the time decided to take the conservative AppCompat-friendly approach – hide the exception, and hope for the best.

The article goes on to explain that all 64-bit Windows operating systems behaved this way:

  • Windows XP 64-bit
  • Windows Server 2003 64-bit
  • Windows Vista 64-bit
  • Windows Server 2008 64-bit

But starting with Windows 7 (and Windows Server 2008 R2), the architects changed their minds, sort of. For 64-bit applications only (not 32-bit applications), they would (by default) stop suppressing these user-kernel-user exceptions. So, by default, on:

  • Windows 7 64-bit
  • Windows Server 2008 R2

all 64-bit applications will see these exceptions, which they never used to see.

In Windows 7, when a native x64 application crashes in this fashion, the Program Compatibility Assistant is notified. If the application doesn’t have a Windows 7 Manifest, we show a dialog telling you that PCA has applied an Application Compatibility shim. What does this mean? This means, that the next time you run your application, Windows will emulate the Server 2003 behavior and make the exception disappear. Keep in mind, that PCA doesn’t exist on Server 2008 R2, so this advice doesn’t apply.

So the question

The question is: why is 64-bit Windows unable to unwind a stack back through a kernel transition, while 32-bit editions of Windows can?

The only hint is:

For complicated reasons, we cannot propagate the exception back on 64-bit operating systems (amd64 and IA64).

The hint is it's complicated.

I may not understand the explanation, as I'm not an operating system developer, but I'd like a shot at knowing why.


Update: Hotfix to stop suppressing exceptions in 32-bit apps

Microsoft has released a hotfix that stops these exceptions from being suppressed in 32-bit applications as well:

KB976038: Exceptions that are thrown from an application that runs in a 64-bit version of Windows are ignored

  • An exception is thrown in a callback routine that runs in user mode.

In this scenario, this exception does not cause the application to crash. Instead, the application enters into an inconsistent state. Then, the application throws a different exception and crashes.

A user mode callback function is typically an application-defined function that is called by a kernel mode component. Examples of user mode callback functions are Windows procedures and hook procedures. These functions are called by Windows to process Windows messages or to process Windows hook events.
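The failure mode the KB describes can be illustrated with a toy model. The Python below is not Windows code and does not use SEH; every name in it (`Window`, `on_create`, `kernel_dispatch`, `create_window`, `use_window`) is hypothetical. A stand-in "kernel" routine invokes a user-mode callback. With `suppress=False` the callback's exception keeps unwinding to the original caller, roughly the x86 behavior described above; with `suppress=True` it is swallowed at the boundary, the callback's work is left half done, and a different error surfaces later.

```python
# Toy model of a user -> kernel -> user callback round trip.
# NOT real Windows/SEH code; every name here is hypothetical.


class Window:
    def __init__(self) -> None:
        self.title = None
        self.ready = False


def on_create(win: Window) -> None:
    """Hypothetical user-mode callback, e.g. a WindowProc handling creation."""
    win.title = "main"             # first half of the initialization succeeds
    settings: dict = {}
    win.ready = settings["ready"]  # bug: raises KeyError, so ready stays False


def kernel_dispatch(callback, win: Window, suppress: bool) -> None:
    """Stand-in for the kernel side that calls back into user mode."""
    try:
        callback(win)
    except Exception:
        if not suppress:
            raise  # propagate: the exception keeps unwinding to the caller
        # suppress: the exception silently disappears at the "kernel boundary"


def create_window(suppress: bool) -> Window:
    """The original user-mode caller (think CreateWindow)."""
    win = Window()
    kernel_dispatch(on_create, win, suppress)
    return win


def use_window(win: Window) -> None:
    """Later code that trusts the window was fully initialized."""
    if not win.ready:
        raise RuntimeError("window used in an inconsistent state")
```

With `suppress=False`, `create_window` raises the callback's `KeyError`, so the caller can handle it. With `suppress=True`, it returns a window whose `title` is set but whose `ready` flag is still `False`, and the problem only shows up later as a `RuntimeError` from `use_window`.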

The hotfix then lets you stop Windows from eating the exceptions globally:

HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options
DisableUserModeCallbackFilter: DWORD = 1

or per-application:

HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\Notepad.exe
DisableUserModeCallbackFilter: DWORD = 1
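To apply the setting from a script rather than by hand, one option is to generate a `.reg` file. This is a minimal sketch, not something that ships with the hotfix: `make_callback_filter_reg` is a hypothetical helper name, and only the key path and the `DisableUserModeCallbackFilter` value come from the text above. The `HKLM` abbreviation is spelled out as `HKEY_LOCAL_MACHINE`, the form regedit uses in the files it exports.

```python
# Sketch: build .reg text that sets DisableUserModeCallbackFilter = 1,
# either globally or for one executable. Helper name is hypothetical.
from typing import Optional

IFEO_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
    r"\CurrentVersion\Image File Execution Options"
)


def make_callback_filter_reg(exe_name: Optional[str] = None) -> str:
    """Return .reg file text; exe_name=None targets the global IFEO key."""
    key = IFEO_KEY if exe_name is None else IFEO_KEY + "\\" + exe_name
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        '"DisableUserModeCallbackFilter"=dword:00000001',
        "",  # trailing empty entry gives the file a final line break
    ]
    return "\r\n".join(lines)
```

For example, `make_callback_filter_reg("Notepad.exe")` produces the per-application variant. The text could be saved as, say, `callback-filter.reg` and imported with `reg import callback-filter.reg` from an elevated prompt; since the key lives under HKLM, administrator rights are needed either way.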

The behavior was also documented for XP and Server 2003 in KB973460.


A hint

I found another hint while investigating how to use xperf to capture stack traces on 64-bit Windows:

Stack Walking in Xperf

Disable Paging Executive

In order for tracing to work on 64-bit Windows you need to set the DisablePagingExecutive registry key. This tells the operating system not to page kernel mode drivers and system code to disk, which is a prerequisite for getting 64-bit call stacks using xperf, because 64-bit stack walking depends on metadata in the executable images, and in some situations the xperf stack walk code is not allowed to touch paged out pages. Running the following command from an elevated command prompt will set this registry key for you.

 REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f

After setting this registry key you will need to reboot your system before you can record call stacks. Having this flag set means that the Windows kernel locks more pages into RAM, so this will probably consume about 10 MB of additional physical memory.

This gives the impression that in 64-bit Windows (and only in 64-bit Windows), you are not allowed to walk kernel stacks, because the pages the walk depends on might be paged out to disk.
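For context on the "metadata in the executable images" mentioned in the quote: on x64 Windows each image carries a table of `RUNTIME_FUNCTION` entries in its `.pdata` section, sorted by start address, and an unwinder locates the entry for an instruction address by binary search (Windows exposes this lookup as `RtlLookupFunctionEntry`) before reading the referenced unwind information. The sketch below models only that lookup. The addresses are made up, `lookup_function_entry` is a hypothetical name, and treating `EndAddress` as exclusive is an assumption of this sketch. If the pages holding such a table were paged out, a walker that must not touch paged-out memory would have nothing to search, which fits the `DisablePagingExecutive` requirement.

```python
# Sketch of the x64 unwind-table lookup; addresses are made-up RVAs,
# not taken from a real image.
from typing import NamedTuple, Optional, Sequence


class RuntimeFunction(NamedTuple):
    begin: int        # BeginAddress: RVA of the function start
    end: int          # EndAddress: treated as exclusive in this sketch
    unwind_info: int  # UnwindInfoAddress: RVA of the UNWIND_INFO record


def lookup_function_entry(
    table: Sequence[RuntimeFunction], rva: int
) -> Optional[RuntimeFunction]:
    """Binary-search a table of non-overlapping entries sorted by begin.

    Returns None when no entry covers rva, e.g. for a leaf function,
    which may legitimately have no .pdata entry at all.
    """
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        entry = table[mid]
        if rva < entry.begin:
            hi = mid - 1
        elif rva >= entry.end:
            lo = mid + 1
        else:
            return entry
    return None
```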

Solution

I'm the developer who wrote this hotfix, as well as the blog post, a loooooooong time ago. The main reason is that, for performance reasons, the full register file isn't always captured when you transition into kernel space.

If you make a normal syscall, the x64 Application Binary Interface (ABI) only requires you to preserve the non-volatile registers (similar to making a normal function call). However, correctly unwinding the exception requires you to have all the registers, so a correct unwind back through that transition isn't possible. Basically, this was a choice between performance in a critical scenario (i.e., a scenario that potentially happens thousands of times per second) and 100% correct handling of a pathological scenario (a crash).
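The answer's argument can be made concrete with a small model of the register classes in the Microsoft x64 calling convention: RAX, RCX, RDX, R8 to R11 and XMM0 to XMM5 are volatile, while RBX, RBP, RDI, RSI, RSP, R12 to R15 and XMM6 to XMM15 are non-volatile. The sketch assumes, as a simplification of the answer, that the transition records only what it is obliged to preserve; it ignores RIP, RFLAGS, segment registers and the upper halves of the YMM registers, and the function names are hypothetical. Whatever `missing_for_full_context` returns is state an exception dispatcher would have no reliable value for.

```python
# Simplified model of the Microsoft x64 register classes (RIP, RFLAGS,
# segment registers and upper YMM halves are deliberately left out).
VOLATILE = {"rax", "rcx", "rdx", "r8", "r9", "r10", "r11"} | {
    f"xmm{i}" for i in range(0, 6)
}
NONVOLATILE = {"rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15"} | {
    f"xmm{i}" for i in range(6, 16)
}


def captured_at_syscall(regs: dict[str, int]) -> dict[str, int]:
    """Assumption: the transition keeps only registers it must preserve."""
    return {name: value for name, value in regs.items() if name in NONVOLATILE}


def missing_for_full_context(captured: dict[str, int]) -> set[str]:
    """Registers a complete context would need but the capture lacks."""
    return (VOLATILE | NONVOLATILE) - set(captured)
```

Passing `captured_at_syscall` a dictionary with a value for every register, and handing the result to `missing_for_full_context`, yields exactly the 13 volatile registers: the part of the thread's state that a normal system call never promised to keep.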
