Page table in Linux kernel space during boot


Question

I am confused about page table management in the Linux kernel.

In Linux kernel space, before the page table is turned on, the kernel runs in virtual memory with a 1-1 mapping mechanism. After the page table is turned on, the kernel has to consult the page tables to translate a virtual address into a physical memory address. My questions are:

  1. At this point, after the page table is turned on, is kernel space still 1GB (from 0xC0000000 to 0xFFFFFFFF)?

  2. In the page tables of a kernel process, are only the page table entries (PTEs) in the range 0xC0000000 to 0xFFFFFFFF mapped? Are PTEs outside this range left unmapped because kernel code never jumps there?

  3. Is the address mapping the same before and after the page table is turned on?

For example, before the page table is turned on, virtual address 0xC00000FF is mapped to physical address 0x000000FF. After the page table is turned on, this mapping does not change: virtual address 0xC00000FF is still mapped to physical address 0x000000FF. The only difference is that after the page table is turned on, the CPU has to consult the page table to translate a virtual address into a physical address, which it did not need to do before.

  4. Is the page table in kernel space global, and is it shared across all processes in the system, including user processes?

  5. Is this mechanism the same on 32-bit x86 and on ARM?

Recommended answer

The following discussion is based on 32-bit ARM Linux, kernel source version 3.9.
All your questions can be addressed by going through the procedure of setting up the initial page table (which is later overwritten by the function paging_init) and turning on the MMU.

When the kernel is first launched by the bootloader, the assembly function stext (in arch/arm/kernel/head.S) is the first function to run. Note that the MMU has not been turned on yet at this moment.

Among other things, the important jobs done by stext are:

  • create the initial page table (which is later overwritten by the function paging_init)
  • turn on the MMU
  • jump to the C part of the kernel initialization code and carry on

Before delving into your questions, it is beneficial to know:

  • Before the MMU is turned on, every address issued by the CPU is a physical address
  • After the MMU is turned on, every address issued by the CPU is a virtual address
  • A proper page table must be set up before turning on the MMU, otherwise your code will simply "be blown away"
  • By convention, the Linux kernel uses the upper 1GB of the virtual address space and user land uses the lower 3GB

Now the tricky part:
First trick: using position-independent code. The assembly function stext is linked at address "PAGE_OFFSET + TEXT_OFFSET" (0xCxxxxxxx), which is a virtual address. However, since the MMU has not been turned on yet, the address where stext actually runs is "PHYS_OFFSET + TEXT_OFFSET" (the actual value depends on your hardware), which is a physical address.

So here is the thing: the code of stext "thinks" that it is running at an address like 0xCxxxxxxx, but it is actually running at address (0x00000000 + some_offset) (say your hardware configures 0x00000000 as the starting point of RAM). So before the MMU is turned on, the assembly code needs to be written very carefully to make sure that nothing goes wrong during execution. In fact, a technique called position-independent code (PIC) is used.
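The relationship between the link address and the run address can be sketched with a little arithmetic. This is only an illustration: the values of PHYS_OFFSET and TEXT_OFFSET below are assumptions (they depend on the platform and kernel configuration), and the helper names virt_to_phys and phys_to_virt are hypothetical stand-ins for the linear conversion, not the kernel's own macros.

```python
# Illustrative constants; PHYS_OFFSET and TEXT_OFFSET are assumed values.
PAGE_OFFSET = 0xC0000000  # start of kernel virtual space with a 3GB/1GB split
PHYS_OFFSET = 0x00000000  # assumed physical start of RAM
TEXT_OFFSET = 0x00008000  # assumed offset of the kernel image from the start of RAM


def virt_to_phys(vaddr):
    """Convert a linearly mapped kernel virtual address to a physical address."""
    return vaddr - PAGE_OFFSET + PHYS_OFFSET


def phys_to_virt(paddr):
    """Convert a physical RAM address to its linearly mapped kernel virtual address."""
    return paddr - PHYS_OFFSET + PAGE_OFFSET


link_addr = PAGE_OFFSET + TEXT_OFFSET  # where the linker placed stext
run_addr = PHYS_OFFSET + TEXT_OFFSET   # where stext executes while the MMU is off

print(f"linked at {link_addr:#010x}, running at {run_addr:#010x}")
```

Any instruction that used the absolute link address before the MMU is on would touch the wrong location, which is why the early code has to compute addresses relative to where it is actually running.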

To further explain the above, here are several assembly code snippets:

ldr r13, =__mmap_switched    @ address to jump to after MMU has been enabled

b   __enable_mmu             @ jump to function "__enable_mmu" to turn on MMU

Note that the above "ldr" instruction is a pseudo-instruction which means "get the (virtual) address of function __mmap_switched and put it into r13".

Function __enable_mmu in turn calls function __turn_mmu_on. (Note that I removed several instructions from __turn_mmu_on which are essential to the function but not of interest here.)

ENTRY(__turn_mmu_on)
    @ Write the control register to enable the MMU. After this instruction,
    @ every address issued by the CPU is a virtual address translated by the MMU.
    mcr p15, 0, r0, c1, c0, 0
    @ r13 holds the (virtual) address to jump to after the MMU has been
    @ enabled, which is (0xC0000000 + some_offset).
    mov r3, r13
    mov pc, r3    @ a long jump
ENDPROC(__turn_mmu_on)

Second trick: identity mapping when setting up the initial page table before turning on the MMU. More specifically, the address range where the kernel code is running is mapped twice.

  • The first mapping, as expected, maps the physical range 0x00000000 (again, this address depends on the hardware configuration) through (0x00000000 + offset) to the virtual range 0xCxxxxxxx through (0xCxxxxxxx + offset)
  • The second mapping, interestingly, maps the range 0x00000000 through (0x00000000 + offset) to itself (i.e. each virtual address in that range translates to the same physical address)

Why do that? Remember that before the MMU is turned on, every address issued by the CPU is a physical address (starting at 0x00000000), and after the MMU is turned on, every address issued by the CPU is a virtual address (starting at 0xC0000000).
Because ARM is pipelined, at the moment the MMU is turned on there are still instructions in the pipeline that use (physical) addresses generated by the CPU before the MMU was turned on, and the program counter itself still holds a physical address until the long jump executes. To keep these instructions from being blown away, an identity mapping has to be set up to cater for them.
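The double mapping can be modelled with a toy first-level table that maps 1MB sections. This is a simplified sketch of the idea described above, not the code in __create_page_tables: the table is a plain dictionary, KERNEL_SECTIONS and PHYS_OFFSET are assumed values, and the function names are hypothetical.

```python
SECTION_SHIFT = 20                       # one section covers 1MB
SECTION_MASK = (1 << SECTION_SHIFT) - 1

PAGE_OFFSET = 0xC0000000
PHYS_OFFSET = 0x00000000  # assumed physical start of RAM
KERNEL_SECTIONS = 4       # assumed: the running kernel code fits in 4MB


def build_initial_table():
    """Map the kernel's physical range twice: at PAGE_OFFSET and onto itself."""
    table = {}
    for i in range(KERNEL_SECTIONS):
        phys_base = PHYS_OFFSET + (i << SECTION_SHIFT)
        # First mapping: kernel virtual address -> physical address.
        table[(PAGE_OFFSET >> SECTION_SHIFT) + i] = phys_base
        # Second mapping: identity, virtual address == physical address.
        table[phys_base >> SECTION_SHIFT] = phys_base
    return table


def translate(table, vaddr):
    """Look up the section for vaddr, as the MMU would once it is enabled."""
    base = table.get(vaddr >> SECTION_SHIFT)
    if base is None:
        raise LookupError(f"translation fault at {vaddr:#010x}")
    return base | (vaddr & SECTION_MASK)


table = build_initial_table()
# The instruction right after the MMU is enabled is fetched from a
# physical-looking address; the identity mapping keeps that fetch valid.
print(hex(translate(table, 0x00008010)))
# After the long jump, code runs from the kernel virtual range.
print(hex(translate(table, 0xC0008010)))
```

Both lookups land on the same physical address. Without the identity entry, the first lookup would fault before the jump to the virtual address could execute.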

Now back to your questions:

  1. At this point, after the page table is turned on, is kernel space still 1GB (from 0xC0000000 to 0xFFFFFFFF)?

A: I guess you mean turning on the MMU. The answer is yes, kernel space is 1GB (actually it also occupies several megabytes below 0xC0000000, but that is not of interest here).

  2. In the page tables of a kernel process, are only the page table entries (PTEs) in the range 0xC0000000 to 0xFFFFFFFF mapped? Are PTEs outside this range left unmapped because kernel code never jumps there?

A: The answer to this question is quite complicated because it involves a lot of details regarding specific kernel configurations.
To fully answer it, you need to read the part of the kernel source code that sets up the initial page table (assembly function __create_page_tables) and the function that sets up the final page table (C function paging_init).
To put it simply, there are two levels of page table in ARM. The first-level table is the PGD, which occupies 16KB. The kernel first zeros out this PGD during initialization and then does the initial mapping in the assembly function __create_page_tables. In __create_page_tables, only a very small portion of the address space is mapped.
After that, the final page table is set up in paging_init, and in this function quite a large portion of the address space is mapped. Say you only have 512MB of RAM: for most common configurations, this 512MB of RAM would be mapped by the kernel section by section (1 section is 1MB). If your RAM is quite large (say 2GB), only a portion of it will be directly mapped. (I will stop here because there are too many details regarding Question 2.)
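The sizes quoted above follow from simple arithmetic, sketched here under the assumption of a 32-bit address space, 1MB sections and 4-byte first-level entries (the classic ARM layout without LPAE). The names are hypothetical and chosen for illustration only.

```python
SECTION_SIZE = 1 << 20        # 1MB per section
ADDRESS_SPACE = 1 << 32       # 32-bit virtual address space
ENTRY_BYTES = 4               # assumed size of one first-level entry

PGD_ENTRIES = ADDRESS_SPACE // SECTION_SIZE  # one entry per 1MB of address space
PGD_BYTES = PGD_ENTRIES * ENTRY_BYTES        # total size of the first-level table


def pgd_index(vaddr):
    """Index of the first-level entry that covers vaddr."""
    return vaddr >> 20


def sections_needed(ram_bytes):
    """Number of 1MB sections needed to map ram_bytes (rounded up)."""
    return (ram_bytes + SECTION_SIZE - 1) // SECTION_SIZE


print(PGD_ENTRIES, PGD_BYTES // 1024)      # entries, size in KB
print(pgd_index(0xC0000000))               # first entry of the kernel's 1GB
print(sections_needed(512 * 1024 * 1024))  # sections for 512MB of RAM
```

With these assumptions the table has 4096 entries and takes 16KB, the kernel's 1GB starts at entry 3072, and 512MB of RAM needs 512 section entries.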

  3. Is the address mapping the same before and after the page table is turned on?

A: I think I have already answered this question in my explanation of "Second trick: identity mapping when setting up the initial page table before turning on the MMU."

  4. Is the page table in kernel space global, and is it shared across all processes in the system, including user processes?

A: Yes and no. Yes, because all processes share the same content of the kernel page table (the upper 1GB part). No, because each process uses its own 16KB of memory to store its page table, including the kernel part (although the content of the page table for the upper 1GB is identical for every process).
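A toy model may make the "yes and no" concrete. It assumes a 4096-entry first-level table per process, with the kernel's entries starting at index 3072 (0xC0000000 >> 20). The function new_process_pgd and the list-based tables are hypothetical illustrations, not the kernel's allocation code.

```python
PGD_ENTRIES = 4096
KERNEL_START_INDEX = 0xC0000000 >> 20  # first entry covering the kernel's 1GB


def new_process_pgd(kernel_pgd):
    """Allocate a separate table and copy in the kernel's entries."""
    pgd = [0] * PGD_ENTRIES  # user part starts out empty
    pgd[KERNEL_START_INDEX:] = kernel_pgd[KERNEL_START_INDEX:]
    return pgd


# A reference table whose kernel part maps the first 4MB of RAM (assumed layout).
init_pgd = [0] * PGD_ENTRIES
for i in range(4):
    init_pgd[KERNEL_START_INDEX + i] = i << 20

proc_a = new_process_pgd(init_pgd)
proc_b = new_process_pgd(init_pgd)

# "No": each process has its own table in memory.
print(proc_a is proc_b)
# "Yes": the kernel entries in both tables have identical content.
print(proc_a[KERNEL_START_INDEX:] == proc_b[KERNEL_START_INDEX:])
```

The first check shows two distinct tables, and the second shows that their kernel halves match entry for entry.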

  5. Is this mechanism the same on 32-bit x86 and on ARM?

A: Different architectures use different mechanisms.
