Page table in Linux kernel space during boot


Question


I feel confused about page table management in the Linux kernel.

In Linux kernel space, before the page table is turned on, the kernel runs with a 1-1 mapping between virtual and physical addresses. After the page table is turned on, the kernel has to consult page tables to translate a virtual address into a physical memory address. My questions are:

  1. At this time, after turning on the page table, is kernel space still 1GB (from 0xC0000000 to 0xFFFFFFFF)?

  2. And in the page tables of a kernel process, are only the page table entries (PTEs) in the range 0xC0000000 - 0xFFFFFFFF mapped? Are PTEs outside this range left unmapped because kernel code never jumps there?

  3. Is the mapping the same before and after turning on the page table?

    E.g. before turning on the page table, the virtual address 0xC00000FF is mapped to the physical address 0x000000FF; after turning on the page table, that mapping does not change: the virtual address 0xC00000FF is still mapped to the physical address 0x000000FF. The only difference is that after turning on the page table, the CPU has to consult the page table to translate virtual addresses into physical ones, which it did not need to do before.

  4. Is the page table in kernel space global, shared by every process in the system, including user processes?

  5. Is this mechanism the same on 32-bit x86 and ARM?

Solution

The following discussion is based on 32-bit ARM Linux; the kernel source version is 3.9.
All your questions can be addressed if you go through the procedure of setting up the initial page table (which will be overwritten later by the function paging_init) and turning on the MMU.

When the kernel is first launched by the bootloader, the assembly function stext (in arch/arm/kernel/head.S) is the first function to run. Note that the MMU has not been turned on yet at this moment.

Among other things, the important jobs done by this function stext are to:

  • create the initial page table (which will be overwritten later by the function paging_init)
  • turn on the MMU
  • jump to the C part of the kernel initialization code and carry on

Before delving into your questions, it is beneficial to know:

  • Before the MMU is turned on, every address issued by the CPU is a physical address
  • After the MMU is turned on, every address issued by the CPU is a virtual address
  • A proper page table must be set up before turning on the MMU, otherwise your code will simply be "blown away"
  • By convention, the Linux kernel uses the higher 1GB of the virtual address space and user land uses the lower 3GB
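The fixed-offset relationship between kernel virtual and physical addresses implied by the last bullet can be sketched as plain arithmetic (a minimal model; the PHYS_OFFSET of 0x00000000 is an assumption here, since the real value is board-specific):

```python
# Minimal model of the 32-bit ARM lowmem mapping.
PAGE_OFFSET = 0xC0000000  # conventional kernel virtual base
PHYS_OFFSET = 0x00000000  # assumed RAM start (board-specific in reality)

def virt_to_phys(vaddr):
    # A lowmem kernel virtual address differs from its physical
    # address by a constant offset.
    return vaddr - PAGE_OFFSET + PHYS_OFFSET

def phys_to_virt(paddr):
    # The inverse translation, back to the kernel virtual alias.
    return paddr - PHYS_OFFSET + PAGE_OFFSET
```

For example, virt_to_phys(0xC00000FF) yields 0x000000FF, matching the example mapping used in the question.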

Now the tricky part:
First trick: using position-independent code. The assembly function stext is linked at the address "PAGE_OFFSET + TEXT_OFFSET" (0xCxxxxxxx), which is a virtual address; however, since the MMU has not been turned on yet, the actual address at which stext runs is "PHYS_OFFSET + TEXT_OFFSET" (the actual value depends on your hardware), which is a physical address.

So, here is the thing: the code of the function stext "thinks" it is running at an address like 0xCxxxxxxx, but it is actually running at (0x00000000 + some_offset) (say your hardware configures 0x00000000 as the starting point of RAM). So before turning on the MMU, the assembly code must be written very carefully to make sure that nothing goes wrong during execution. In fact, a technique called position-independent code (PIC) is used.
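The link-address versus run-address split described above can be made concrete with a little arithmetic (a sketch; the TEXT_OFFSET value of 0x8000 is an assumption, chosen because it is a common ARM configuration):

```python
# Where the code was linked vs. where it actually executes
# before the MMU is on.
PAGE_OFFSET = 0xC0000000   # virtual base the kernel is linked against
PHYS_OFFSET = 0x00000000   # assumed start of RAM (board-specific)
TEXT_OFFSET = 0x00008000   # assumed kernel text offset

link_addr = PAGE_OFFSET + TEXT_OFFSET  # where stext "thinks" it runs
run_addr  = PHYS_OFFSET + TEXT_OFFSET  # where it actually runs

# Position-independent code works because every symbol is off by
# the same constant delta until the MMU is turned on:
delta = link_addr - run_addr
```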

To further explain the above, I have extracted a few assembly code snippets:

ldr r13, =__mmap_switched    @ address to jump to after MMU has been enabled

b   __enable_mmu             @ jump to function "__enable_mmu" to turn on MMU

Note that the "ldr" instruction above is a pseudo-instruction meaning "get the (virtual) address of the function __mmap_switched and put it into r13".

The function __enable_mmu in turn calls the function __turn_mmu_on. (Note that I removed several instructions from __turn_mmu_on that are essential to the function but not of interest here.)

ENTRY(__turn_mmu_on)
    mcr p15, 0, r0, c1, c0, 0   @ write control reg to enable MMU ====> this is where the MMU is turned on; after this instruction, every address issued by the CPU is a "virtual address" translated by the MMU
    mov r3, r13   @ r13 stores the (virtual) address to jump to after MMU has been enabled, which is (0xC0000000 + some_offset)
    mov pc, r3    @ a long jump
ENDPROC(__turn_mmu_on)

Second trick: identity mapping when setting up the initial page table, before turning on the MMU. More specifically, the address range in which the kernel code is running is mapped twice.

  • The first mapping, as expected, maps the address range 0x00000000 (again, this address depends on the hardware configuration) through (0x00000000 + offset) to 0xCxxxxxxx through (0xCxxxxxxx + offset)
  • The second mapping, interestingly, maps the address range 0x00000000 through (0x00000000 + offset) to itself (i.e. 0x00000000 --> (0x00000000 + offset))

Why do that? Remember that before the MMU is turned on, every address issued by the CPU is a physical address (starting at 0x00000000), and after the MMU is turned on, every address issued by the CPU is a virtual address (starting at 0xC0000000).
Because ARM is pipelined, at the moment the MMU is turned on there are still instructions in the pipeline using (physical) addresses generated before the MMU was turned on! To keep those instructions from blowing up, an identity mapping has to be set up to cater to them.
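The double mapping can be illustrated with a toy first-level table (a simulation of the idea, not kernel code; the 4MB kernel image size and the PHYS_OFFSET of 0x00000000 are assumptions):

```python
SECTION_SHIFT = 20              # ARM first-level entries cover 1MB sections
PAGE_OFFSET = 0xC0000000
PHYS_OFFSET = 0x00000000        # assumed RAM start
KERNEL_SECTIONS = 4             # assume a 4MB kernel image

# Model the PGD as a dict: virtual section index -> physical section base.
pgd = {}
for i in range(KERNEL_SECTIONS):
    phys = PHYS_OFFSET + (i << SECTION_SHIFT)
    # Mapping 1: the expected kernel alias at 0xC0000000 + ...
    pgd[(PAGE_OFFSET >> SECTION_SHIFT) + i] = phys
    # Mapping 2: the identity mapping, so instructions still in the
    # pipeline keep resolving after the MMU comes on.
    pgd[(PHYS_OFFSET >> SECTION_SHIFT) + i] = phys

def translate(vaddr):
    # Section-granularity lookup: base from the table, offset kept.
    base = pgd[vaddr >> SECTION_SHIFT]
    return base + (vaddr & ((1 << SECTION_SHIFT) - 1))
```

After this setup, translate(0xC0000100) and translate(0x00000100) both resolve to physical 0x00000100, which is exactly why the pipeline survives the switch.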

Now returning to your questions:

  1. At this time, after turning on the page table, is kernel space still 1GB (from 0xC0000000 to 0xFFFFFFFF)?

A: I guess you mean turning on the MMU. The answer is yes: kernel space is 1GB (actually it also occupies several megabytes below 0xC0000000, but that is not of interest here).

  2. And in the page tables of a kernel process, are only the page table entries (PTEs) in the range 0xC0000000 - 0xFFFFFFFF mapped? Are PTEs outside this range left unmapped because kernel code never jumps there?

A: The answer to this question is quite complicated, because it involves many details of specific kernel configurations.
To answer it fully, you would need to read the part of the kernel source that sets up the initial page table (the assembly function __create_page_tables) and the function that sets up the final page table (the C function paging_init).
To put it simply, there are two levels of page tables on ARM. The first-level table is the PGD, which occupies 16KB. The kernel first zeros out this PGD during initialization and does the initial mapping in the assembly function __create_page_tables, where only a very small portion of the address space is mapped.
After that, the final page table is set up in the function paging_init, where a much larger portion of the address space is mapped. Say you have only 512MB of RAM: for the most common configurations, that 512MB is mapped by the kernel section by section (one section is 1MB). If your RAM is large (say 2GB), only a portion of it will be directly mapped. (I will stop here, because there are too many details behind question 2.)
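The sizing in the paragraph above can be checked with quick arithmetic (assuming 1MB sections and the 3GB/1GB split described earlier):

```python
SECTION_SIZE = 1 << 20          # each first-level entry maps 1MB
ram_size = 512 << 20            # the 512MB example from the text

# First-level entries needed to section-map all of this RAM:
sections_needed = ram_size // SECTION_SIZE

# With a 3GB/1GB split, at most about 1GB of RAM can be direct-mapped
# (less in practice, since vmalloc and friends also need virtual space):
lowmem_limit = 1 << 30
direct_mapped = min(ram_size, lowmem_limit)
```

For the 512MB case this gives 512 section entries, with all RAM direct-mapped; for a 2GB machine, direct_mapped would be capped near the 1GB limit, matching the "only a portion" remark above.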

  3. Is the mapping the same before and after turning on the page table?

A: I think I have already answered this in the explanation of the second trick: identity mapping when setting up the initial page table before turning on the MMU.

  4. Is the page table in kernel space global, shared by every process in the system, including user processes?

A: Yes and no. Yes, because all processes share the same content for the kernel part (the higher 1GB) of the page table. No, because each process uses its own 16KB of memory to store that table (although the kernel-part content is identical in every process's table).
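The "yes and no" can be sketched as follows: every process owns a private 16KB first-level table, but its kernel half is copied from a master table (a simulation of the idea behind ARM's pgd_alloc, not the real code; the entry values are placeholders):

```python
PGD_ENTRIES = 4096                  # 16KB / 4 bytes per first-level entry
KERNEL_FIRST = 0xC0000000 >> 20     # index of the first kernel section (3072)

# swapper_pg_dir stands in for the master kernel page table.
swapper_pg_dir = [0] * PGD_ENTRIES
for i in range(KERNEL_FIRST, PGD_ENTRIES):
    swapper_pg_dir[i] = i << 20     # placeholder kernel mappings

def pgd_alloc():
    # Each process gets its own table: user half empty,
    # kernel half copied from the master table.
    pgd = [0] * PGD_ENTRIES
    pgd[KERNEL_FIRST:] = swapper_pg_dir[KERNEL_FIRST:]
    return pgd

p1, p2 = pgd_alloc(), pgd_alloc()
```

p1 and p2 are distinct tables (each process pays its own 16KB), yet their kernel halves are identical, which is the sense in which the kernel mapping is "shared".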

  5. Is this mechanism the same on 32-bit x86 and ARM?

A: Different architectures use different mechanisms.
