What happens at the CPU level if you dereference a null pointer?

Question

Let's say I have the following program:

#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

static void myHandler(int sig){
        abort();
}

int main(void){
        signal(SIGSEGV,myHandler);
        char* ptr=NULL;
        *ptr='a';
        return 0;
}

As you can see, I register a signal handler and, a few lines later, dereference a null pointer, which triggers SIGSEGV. But how is it triggered? If I run it under strace (output trimmed):

//Set signal handler (In glibc signal simply wraps a call to sigaction)
rt_sigaction(SIGSEGV, {sa_handler=0x563b125e1060, sa_mask=[SEGV], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7ffbe4fe0d30}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
//SIGSEGV is raised
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [SEGV], 8) = 0

But something is missing: how does the signal get from the CPU to the program? My understanding:

[Dereferences null pointer] -> [CPU raises an exception] -> [??? (How does it go from the CPU to the kernel?) ] -> [The kernel is notified, and sends the signal to the process] -> [??? (How does the process know, that a signal is raised?)] -> [The matching signal handler is called].

What happens at the two places marked with "???"?

Recommended answer

A NULL pointer in most (but not all) C implementations is address 0. Normally this address is not in a valid (mapped) page.

Any access to a virtual page that's not mapped by the HW page tables results in a page-fault exception, e.g. #PF on x86.

This invokes the OS's page-fault exception handler to resolve the situation. On x86-64 for example, the CPU pushes exception-return info on the kernel stack and loads a CS:RIP from the IDT (Interrupt Descriptor Table) entry that corresponds to that exception number. Just like any other exception triggered by user-space, e.g. integer divide by zero (#DE), or a General Protection fault #GP (trying to run a privileged instruction in user-space, or a misaligned SIMD instruction that required alignment, or many other possible things).

The page-fault handler can find out what address user-space tried to access. e.g. on x86, there's a control register (CR2) that holds the linear (virtual) address that caused the fault. The OS can get a copy of that into a general-purpose register with mov rax, cr2.
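User space never reads CR2 itself, but on Linux the kernel forwards the faulting address to a handler installed with SA_SIGINFO, in the si_addr field of siginfo_t. The sketch below is one way to observe that. The names probe_write, on_segv, reported_addr and reported_code are illustrative and not part of the question's program, and jumping out of the handler with siglongjmp is a choice made here so the probe can return instead of re-faulting.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf recover_point;             /* where to resume after a fault */
static void *volatile reported_addr;         /* si_addr seen by the handler */
static volatile sig_atomic_t reported_code;  /* si_code seen by the handler */

/* Hypothetical handler: record what the kernel reported, then bail out. */
static void on_segv(int sig, siginfo_t *info, void *ucontext) {
    (void)sig;
    (void)ucontext;
    reported_addr = info->si_addr;
    reported_code = info->si_code;
    /* Jump past the faulting store instead of returning and retrying it. */
    siglongjmp(recover_point, 1);
}

/* Returns 1 if storing to `target` raised SIGSEGV, 0 if the store
   succeeded, and -1 if the handler could not be installed. */
static int probe_write(volatile char *target) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    /* Reading the pointer back through a volatile object keeps the
       compiler from reasoning about a store to a known-NULL address. */
    volatile char *volatile opaque = target;
    int faulted;
    if (sigsetjmp(recover_point, 1) == 0) {
        *opaque = 'a';
        faulted = 0;
    } else {
        faulted = 1;
    }
    sigaction(SIGSEGV, &old, NULL);  /* restore the previous disposition */
    return faulted;
}
```

On a typical Linux system, probe_write(NULL) should return 1 with reported_addr equal to NULL and reported_code equal to SEGV_MAPERR, which matches the si_code and si_addr fields in the strace line from the question.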

Other ISAs have other mechanisms for the OS to tell the CPU where its page-fault handler is, and for that handler to find out what address user-space was trying to access. But it's pretty universal for systems with virtual memory to have essentially equivalent mechanisms.

The access is not yet known to be invalid. There are several reasons why an OS might not have bothered to "wire" a process's allocated memory into the hardware page tables. This is what paging is all about: letting the OS correct the situation, like copy-on-write, lazy allocation, or bringing a page back in from swap space.

Page faults come in three categories (copied from my answer on another question; Wikipedia's page-fault article says similar things):

  • valid (the process logically has the memory mapped, but the OS was lazy or playing tricks like copy-on-write):
    • hard: the page needs to be paged in from disk, either from swap space or from a disk file (e.g. a memory-mapped file, like a page of an executable or shared library). Usually the OS will schedule another task while waiting for I/O: this is the key difference between hard (major) and soft (minor).
    • soft: no disk access required, for example just allocating + zeroing a new physical page to back a virtual page that user-space just tried to write. Or copy-on-write of a writeable page that multiple processes had mapped, but where changes by one shouldn't be visible to the other (like mmap(MAP_PRIVATE)). This turns a shared page into a private dirty page.
  • invalid: the process has no logical mapping for that page at all. A POSIX OS like Linux delivers SIGSEGV to the offending process/thread.
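The soft (minor) case can be observed from user space. The sketch below maps fresh anonymous memory, writes to each page once, and reports how many minor faults getrusage() attributes to the process over that interval. The function name minor_faults_from_touching is made up for illustration, and the exact count depends on the kernel and its configuration (huge pages, for instance, can cover several touched pages with one fault), so treat the number as indicative only.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Returns the number of minor faults recorded while touching `pages`
   fresh anonymous pages, or -1 if any call fails. */
static long minor_faults_from_touching(size_t pages) {
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || pages == 0)
        return -1;
    size_t len = pages * (size_t)page_size;

    /* mmap only records the mapping; typically no physical pages
       back it until the first access. */
    volatile char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    struct rusage before, after;
    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap((void *)mem, len);
        return -1;
    }

    /* The first write to each page is what the OS has to resolve. */
    for (size_t i = 0; i < pages; i++)
        mem[i * (size_t)page_size] = 1;

    int rc = getrusage(RUSAGE_SELF, &after);
    munmap((void *)mem, len);
    if (rc != 0)
        return -1;
    return after.ru_minflt - before.ru_minflt;
}
```

None of these faults produce a signal: the kernel finds the address inside a mapping the process owns, allocates a zeroed page, and resumes the store.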

So only after the OS consults its own data structures to see which virtual addresses a process is supposed to own can it be sure that the memory access was invalid.

Deciding whether a page fault is invalid or not is completely up to software. As I wrote in my answer to "Why page faults are usually handled by the OS, not hardware?": if the HW could figure everything out, it wouldn't need to trap to the OS.

Fun fact: on Linux it's possible to configure the system so virtual address 0 is (or can be) valid. Setting mmap_min_addr = 0 allows processes to mmap there. e.g. WINE needs this for emulating a 16-bit Windows memory layout.

Since that wouldn't change the internal object-representation of a NULL pointer to be other than 0, doing that would mean that a NULL dereference would no longer fault. That makes debugging harder, which is why mmap_min_addr typically defaults to 64k.
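To check how a given Linux machine is configured, a process can read the setting from /proc/sys/vm/mmap_min_addr and simply try to map page 0. This is a minimal sketch; the helper names read_mmap_min_addr and can_map_page_zero are invented for this example. A privileged process (for example one with CAP_SYS_RAWIO) may be allowed to map page 0 even when the setting is non-zero, so the two results do not have to agree.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns vm.mmap_min_addr in bytes, or -1 if it cannot be read. */
static long read_mmap_min_addr(void) {
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");
    if (f == NULL)
        return -1;
    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Returns 1 if this process is allowed to map the page at address 0,
   otherwise 0. The page is unmapped again before returning. */
static int can_map_page_zero(void) {
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return 0;
    /* MAP_FIXED with a NULL address asks for a mapping that starts
       exactly at address 0 rather than letting the kernel choose. */
    void *p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return 0;
    munmap(p, (size_t)page_size);
    return 1;
}
```

Note the comparison against MAP_FAILED rather than NULL: when the mapping succeeds here, the returned pointer is itself address 0.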

On a simpler system without virtual memory, the OS might still be able to configure an MMU to trap on memory access to certain regions of address space. The OS's trap handler doesn't have to check anything; it knows any access that triggered it was invalid. (Unless it's also emulating something for some regions of address space...)

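The last step, how the process learns that a signal was raised, is pure software. Delivering SIGSEGV is no different from delivering SIGALRM, or a SIGTERM sent by another process. On Linux, before returning to user space the kernel rewrites the thread's saved user-space context so that execution resumes in the registered handler instead of at the faulting instruction.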

Of course, a user-space process that just returns from a SIGSEGV handler without fixing the problem will make the main thread re-run the same faulting instruction. (The OS would return to the instruction that raised the page-fault exception.)

This is why the default action for SIGSEGV is to terminate, and why it doesn't make sense to set the behaviour to "ignore".
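The retry behaviour also means a handler that does fix the problem can simply return. The sketch below maps a page with PROT_NONE, lets a store fault on it, and has the handler make the page writable before returning, so the re-executed store succeeds. All names (store_with_fixup, fix_and_return, guarded_page) are hypothetical. Calling mprotect from a signal handler works on Linux but is not on POSIX's list of async-signal-safe functions, so this is an illustration rather than a portable pattern.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded_page;            /* page that starts out inaccessible */
static size_t guarded_len;
static volatile sig_atomic_t fixups;  /* faults the handler has repaired */

/* Hypothetical handler: repair faults on the guarded page, give up on others. */
static void fix_and_return(int sig, siginfo_t *info, void *ucontext) {
    (void)sig;
    (void)ucontext;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)guarded_page;
    if (addr >= base && addr < base + guarded_len &&
        mprotect(guarded_page, guarded_len, PROT_READ | PROT_WRITE) == 0) {
        fixups++;
        return;  /* the kernel restarts the faulting store, which now succeeds */
    }
    _exit(1);    /* unrelated fault: returning would only fault again */
}

/* Returns the number of repaired faults, or -1 on failure. */
static int store_with_fixup(void) {
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;
    guarded_len = (size_t)page_size;
    void *mem = mmap(NULL, guarded_len, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    guarded_page = mem;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fix_and_return;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(mem, guarded_len);
        return -1;
    }

    fixups = 0;
    volatile char *p = guarded_page;
    *p = 'a';  /* faults on the PROT_NONE page, then is re-executed */
    int result = (*p == 'a') ? (int)fixups : -1;

    sigaction(SIGSEGV, &old, NULL);  /* restore the previous disposition */
    munmap(mem, guarded_len);
    return result;
}
```

The store appears only once in the source, yet the CPU executes it twice: once to raise the fault and once after the handler returns. If the handler returned without calling mprotect, the same store would fault again indefinitely.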
