Windows: avoid pushing full x86 context on stack

Problem description

I have implemented PARLANSE, a language under MS Windows that uses cactus stacks to implement parallel programs. The stack chunks are allocated on a per-function basis and are just the right size to handle local variables, expression temp pushes/pops, and calls to libraries (including stack space for the library routines to work in). Such stack frames can be as small as 32 bytes in practice and often are.

This all works great unless the code does something stupid and causes a hardware trap... at which point Windows appears to insist on pushing the entire x86 machine context "on the stack". This is some 500+ bytes if you include the FP/MMX/etc. registers, which it does. Naturally, a 500 byte push on a 32 byte stack smashes things it should not. (The hardware pushes a few words on a trap, but not the entire context).

Can I get Windows to store the exception context block someplace else (e.g., to a location specific to a thread)? Then the software could take the exception hit on the thread and process it without overflowing my small stack frames.

I don't think this is possible, but I thought I'd ask a much larger audience. Is there an OS standard call/interface that can cause this to happen?

It would be trivial to do in the OS, if I could con MS into letting my process optionally define a context storage location, "contextp", which is initialized to enable the current legacy behavior by default. Then replace the interrupt/trap vector code:

  hardwareint:   push  context
                mov   contextp, esp

... with ...

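  hardwareint:  mov   <somereg>, contextp    ; load the per-process save location
                test  <somereg>, <somereg>   ; zero selects the legacy behavior
                jnz   $2
                push  context                ; legacy: save the context on the stack
                mov   contextp, esp
                jmp   $1
         $2:    store context @ somereg      ; new: save the context off the stack
         $1:    equ   *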

with the obvious changes required to save somereg, etc.
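The choice the proposed handler makes can be modeled in portable C. This is only a user-space sketch of the idea, not kernel code: the `Thread` and `Context` types, the 512-byte context size, and the function `save_trap_context` are all hypothetical names introduced for illustration, and the sketch returns the save location instead of overwriting `contextp` so that the opt-in setting survives repeated traps.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: a 512-byte context, in line with the "500+ bytes" estimate. */
enum { CONTEXT_BYTES = 512 };

typedef struct {
    unsigned char bytes[CONTEXT_BYTES];
} Context;

/* Hypothetical per-thread state for the model. */
typedef struct {
    unsigned char *sp;   /* simulated stack pointer; the stack grows downward */
    Context *contextp;   /* NULL selects the legacy "push on the stack" path */
} Thread;

/* Saves the live context and returns the address it was saved at. */
Context *save_trap_context(Thread *t, const Context *live)
{
    if (t->contextp != NULL) {
        /* Opt-in path: store off-stack, leaving the small frame untouched. */
        memcpy(t->contextp, live, sizeof *live);
        return t->contextp;
    }
    /* Legacy path: make room on the stack and copy the context there. */
    t->sp -= sizeof *live;
    memcpy(t->sp, live, sizeof *live);
    return (Context *)t->sp;
}
```

With `contextp` left NULL the simulated stack pointer drops by the full context size, which is exactly the overflow a 32-byte frame cannot absorb; with `contextp` set, the stack pointer does not move.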

[What I do now is: check the generated code for each function. If it has a chance of generating a trap (e.g., divide by zero), or we are debugging (possible bad pointer deref, etc.), add enough space to the stack frame for the FP context. Stack frames now end up being roughly 500-1000 bytes in size, so programs can't recurse as far, which is sometimes a real problem for the applications we are writing. So we have a workable solution, but it complicates debugging.]
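The frame-sizing policy just described might look roughly like the following in C. This is a sketch under stated assumptions, not the PARLANSE compiler's actual code: `FunctionInfo`, `frame_size`, and the 1024-byte `TRAP_CONTEXT_RESERVE` are hypothetical, with the reserve picked from the upper end of the 500-1000 byte range mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: space reserved for the context Windows pushes on a trap. */
enum { TRAP_CONTEXT_RESERVE = 1024 };

/* Hypothetical per-function summary produced by the code generator. */
typedef struct {
    size_t locals_bytes;   /* local variables */
    size_t temps_bytes;    /* expression temporaries pushed and popped */
    size_t library_bytes;  /* stack space needed by called library routines */
    bool   may_trap;       /* e.g. contains a divide or an FP operation */
} FunctionInfo;

/* Returns the stack frame size to allocate for one function activation. */
size_t frame_size(const FunctionInfo *f, bool debug_build)
{
    size_t size = f->locals_bytes + f->temps_bytes + f->library_bytes;
    if (f->may_trap || debug_build) {
        /* Only functions that can trap pay for the large reserve. */
        size += TRAP_CONTEXT_RESERVE;
    }
    return size;
}
```

A function needing 16 + 8 + 8 bytes keeps its 32-byte frame in a release build when it cannot trap, and grows to 1056 bytes when it can trap or when debugging is enabled.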

EDIT Aug 25: I've managed to get this story to a Microsoft internal engineer who apparently has the authority to find out who in MS might actually care. There might be faint hope for a solution.

EDIT Sept 14: An MS Kernel Group architect has heard the story and is sympathetic. He said MS will consider a solution (like the one proposed), but it is unlikely to be in a service pack. Might have to wait for the next version of Windows. (Sigh... I might grow old...)

EDIT Sept 13, 2010 (1 year later): No action on Microsoft's part. My latest nightmare: does taking a trap while running a 32-bit process on Windows X64 push the entire X64 context on the stack before the interrupt handler fakes pushing a 32-bit context? That would be even larger (twice as many integer registers, each twice as wide, and twice as many SSE registers(?)).

EDIT February 25, 2012 (1.5 years have gone by...): No reaction on Microsoft's part. I guess they just don't care about my kind of parallelism. I think this is a disservice to the community; the "big stack model" used by MS under normal circumstances limits the amount of parallel computation one can have alive at any one instant by eating vast amounts of VM. The PARLANSE model will let one have an application with a million live "grains" in various states of running/waiting; this really occurs in some of our applications where a 100 million node graph is processed "in parallel". The PARLANSE scheme can do this with about 1 GB of RAM, which is pretty manageable. If you tried that with MS 1 MB "big stacks" you'd need 10^12 bytes of VM just for the stack space, and I'm pretty sure Windows won't let you manage a million threads.
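The arithmetic behind those two figures can be checked with a few lines of C. The function name `stack_space_bytes` is hypothetical, and the 1000-byte average per grain is an assumption back-derived from "about 1 GB for a million grains"; it is not a measured PARLANSE number.

```c
#include <stdint.h>

/* Total stack space needed when every grain or thread gets its own stack. */
uint64_t stack_space_bytes(uint64_t count, uint64_t bytes_per_stack)
{
    return count * bytes_per_stack;
}

/* One million concurrent activities, as in the example above. */
static const uint64_t GRAINS = 1000000;

/* A 1 MB "big stack" per thread (taken here as 2^20 bytes). */
static const uint64_t BIG_STACK_BYTES = 1048576;

/* Assumption: average footprint per grain implied by "about 1 GB". */
static const uint64_t GRAIN_BYTES = 1000;
```

`stack_space_bytes(GRAINS, BIG_STACK_BYTES)` comes to 1,048,576,000,000 bytes, on the order of 10^12, while `stack_space_bytes(GRAINS, GRAIN_BYTES)` comes to 10^9 bytes, a factor of roughly a thousand less.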

EDIT April 29, 2014 (4 years have gone by): I guess MS just doesn't read SO. I've done enough engineering on PARLANSE that we only pay the price of large stack frames during debugging or when there are FP operations going on, so we've managed to find very practical ways to live with this. MS has continued to disappoint; the amount of stuff pushed on the stack by various versions of Windows seems to vary considerably, and egregiously above and beyond the need for just the hardware context. There's some hint that some of this variability is caused by non-MS products (e.g. antivirus) sticking their nose in the exception handling chain; why can't they do that from outside my address space? Anyway, we handle all this by simply adding a large slop factor for FP/debug traps, and waiting for the inevitable MS system in the field that exceeds that amount.

Recommended answer

Basically you would need to re-implement many interrupt handlers, i.e. hook yourself into the Interrupt Descriptor Table (IDT). The problem is that you would also need to re-implement a kernelmode -> usermode callback (for SEH this callback resides in ntdll.dll and is named KiUserExceptionDispatcher; it triggers all the SEH logic). The point is that the rest of the system relies upon SEH working the way it does right now, and your solution would break things because you would be doing it system wide. Maybe you could check which process you are in at the time of the interrupt. However, the overall concept is prone to errors and very badly affects system stability imho.

These are actually rootkit-like techniques.

EDIT:

Some more details: the reason why you would need to re-implement interrupt handlers is that exceptions (e.g. divide by zero) are essentially software interrupts, and those always go through the IDT. When the exception has been thrown, the kernel collects the context and signals the exception back to usermode (through the aforementioned KiUserExceptionDispatcher in ntdll). You'd need to interfere at this point, and therefore you would also need to provide a mechanism to get back to user mode. (There is a function in ntdll which is used as the entry point from kernel mode - I don't remember the name but it's something with KiUserACP.....)
