KVM shadow page table handling on the x86 platform

Question

From what I understand, on processors that do not have hardware support for guest-virtual to host-physical address translation, KVM uses a shadow page table.

The shadow page table is built and updated when the guest OS modifies its page tables. Are there special instructions in the hardware (let's take x86 for reference) for modifying the page table? Unless there are special instructions, there won't be a trap to the VMM. Isn't the page table maintained in software by the Linux kernel just another data structure? Why would it need special instructions to update it?

Thanks!

Recommended answer

I work with a VMM other than KVM, so I don't know the details of KVM, but the principle is the same for all VMMs. The way it works is that there are two sets of page-tables.

There are no special instructions to manage page-tables, aside from the special register for the page-table base address [and some assorted bits in other registers to do with configuring the processor in general, but that's typically a "one-off" setup]. Page-tables are just bits of memory that are written to with regular instructions - you can do add, subtract, and, or, multiply, etc. if you really want [it'll most likely cause problems unless you absolutely know what you are doing!], but the typical operation is a "mov" (store) or an "xchg" (exchange).
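To illustrate the point that a page-table entry is ordinary data, here is a minimal Python sketch. The flag positions (present, read/write, user/supervisor in bits 0 to 2, frame address in bits 12 to 51) follow the x86-64 entry layout; the helper names `make_pte` and `pte_frame` and the list standing in for page-table memory are made up for this example.

```python
# Sketch only: a page-table entry modelled as a plain 64-bit integer.
PTE_PRESENT = 1 << 0    # P bit
PTE_WRITABLE = 1 << 1   # R/W bit
PTE_USER = 1 << 2       # U/S bit
PAGE_SHIFT = 12
ADDR_MASK = 0x000F_FFFF_FFFF_F000  # bits 12..51 hold the frame address


def make_pte(frame_number: int, flags: int) -> int:
    """Hypothetical helper: combine a frame number and flag bits."""
    return ((frame_number << PAGE_SHIFT) & ADDR_MASK) | flags


def pte_frame(pte: int) -> int:
    """Hypothetical helper: extract the frame number from an entry."""
    return (pte & ADDR_MASK) >> PAGE_SHIFT


# The "page table" is just memory: here, a list of 512 entries.
page_table = [0] * 512

# An ordinary store (what a "mov" would do).
page_table[3] = make_pte(0x1234, PTE_PRESENT | PTE_WRITABLE)

# An ordinary bitwise AND to drop write permission.
page_table[3] &= ~PTE_WRITABLE

# An exchange (what "xchg" would do): read the old entry, store a new one.
old_entry, page_table[3] = page_table[3], 0
```

Nothing here is privileged or special, which is exactly why the VMM needs some other mechanism to notice these writes.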

The first page-table is the one actually written by the OS. The VMM sets this up as read-only memory, so whenever there is a write to it, the processor page-faults. Since KVM uses the hardware virtualization extensions in the processor (SVM on AMD processors or VMX on Intel processors), the page-fault is captured by the VMM (KVM in this case), which inspects the write operation to see if it's a "page-table write". If so, the write is translated into the second, shadow page-table. This is how the VMM makes the VM believe that memory starts at 0 and goes to 1GB, when in reality we've taken a bunch of pages from all over the place and put together 1GB of memory that appears to be a flat, consecutive set of pages. Of course, since the VMM is "lying" to the OS inside the VM, we can't let the OS write the REAL page-tables, since it wouldn't know the "true" page-table value to write there. [But we do also need to let the OS have its own page-tables, in case it reads from the page-table; it would be utterly confused if the contents weren't what it expects.]
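The trap-and-mirror flow described above can be sketched as a toy simulation in Python. This is not how KVM is implemented; `ToyVMM`, `GuestPageTable`, `PageFault`, and the frame numbers are all invented for illustration, and a Python exception stands in for the hardware page-fault.

```python
class PageFault(Exception):
    """Stands in for the fault raised on a write to read-only memory."""


class GuestPageTable:
    """Guest-owned table: guest virtual page -> guest 'physical' frame."""

    def __init__(self):
        self.entries = {}
        self.write_protected = True  # the VMM maps this memory read-only

    def guest_write(self, vpn, gfn):
        # What the guest OS attempts with an ordinary store.
        if self.write_protected:
            raise PageFault((vpn, gfn))
        self.entries[vpn] = gfn


class ToyVMM:
    def __init__(self, gfn_to_hfn):
        self.gfn_to_hfn = gfn_to_hfn  # where guest frames really live
        self.guest_pt = GuestPageTable()
        self.shadow_pt = {}           # guest virtual page -> host frame

    def run_guest_write(self, vpn, gfn):
        try:
            self.guest_pt.guest_write(vpn, gfn)
        except PageFault:
            self.handle_page_table_write(vpn, gfn)

    def handle_page_table_write(self, vpn, gfn):
        # Emulate the store so the guest later reads back what it wrote.
        self.guest_pt.entries[vpn] = gfn
        # Mirror it into the shadow table with the true host frame.
        self.shadow_pt[vpn] = self.gfn_to_hfn[gfn]


# The guest believes it owns frames 0, 1, 2; the host scattered them.
vmm = ToyVMM({0: 0x7A3, 1: 0x112, 2: 0x9F0})
vmm.run_guest_write(vpn=0x10, gfn=1)
```

After the write, the guest's own table maps page 0x10 to frame 1, while the shadow table (the one the hardware would actually walk) maps it to host frame 0x112.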

The processor's "real CR3" is set by the VMM, and points at the shadow page-table.

The VMM will trap on writes to CR3 (the page-table base address), so that it can track where the page-tables live (and keep track of which "real CR3" to use). However, the VMM doesn't need to know about reads of CR3, so they are usually allowed to happen directly in the VM without being intercepted.
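As a rough sketch of the bookkeeping a CR3-write intercept might do, the Python below keeps one shadow root per guest page-table base and switches the "real CR3" accordingly. The class `Cr3Tracker`, its method names, and the addresses are hypothetical; a real VMM's handler does considerably more work.

```python
class Cr3Tracker:
    """Toy model of what a VMM remembers about guest CR3 writes."""

    def __init__(self):
        self.guest_cr3 = None        # the value the guest thinks is loaded
        self.real_cr3 = None         # the shadow root the hardware would use
        self.shadow_roots = {}       # guest CR3 value -> shadow root address
        self._next_free = 0x100000   # made-up host address for new roots

    def _alloc_shadow_root(self):
        addr = self._next_free
        self._next_free += 0x1000    # one 4 KiB page per root
        return addr

    def on_guest_cr3_write(self, value):
        """Called when a guest write to CR3 is intercepted."""
        if value not in self.shadow_roots:
            # First time this guest page-table is seen: give it a shadow.
            self.shadow_roots[value] = self._alloc_shadow_root()
        self.guest_cr3 = value
        self.real_cr3 = self.shadow_roots[value]


tracker = Cr3Tracker()
tracker.on_guest_cr3_write(0x2000)   # guest switches to process A
root_a = tracker.real_cr3
tracker.on_guest_cr3_write(0x5000)   # guest switches to process B
root_b = tracker.real_cr3
tracker.on_guest_cr3_write(0x2000)   # back to process A
```

Switching back to an address the VMM has already seen reuses the same shadow root, so the shadow table does not have to be rebuilt from scratch in this toy model.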

The whole point of the VMM extensions in the processors is to support this sort of intercepting of special instructions, while still running most of the privileged instructions in the VM as "regular" instructions - you wouldn't, for example, want to jump into the VMM for every write to the flags register to enable/disable interrupts, etc. - let that happen in the VM as if it were a real piece of hardware. But it is critical that the VMM can control some registers.

Obviously, when there is hardware support for the page-tables, there are two layers of page-tables: one that translates the "0-1GB" into "scattered all over the place", and the other being the actual page-table that the OS maintains. In this case, there is no need to intercept any of the page-table writes, page-faults or CR3 updates - the OS can do what it likes within its allowed sections of memory, which are mapped by the underlying page-tables, and if the VM walks outside the allowed section, the VMM will catch that as a "VMM page-table fault". This of course makes the whole thing quite a bit more efficient.
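The two-layer lookup can be sketched in Python as follows. This flattens each layer to a single dictionary (real hardware walks multi-level tables for both layers), and `translate`, `NestedFault`, and the sample frame numbers are assumptions made for the example.

```python
PAGE_SIZE = 4096


class NestedFault(Exception):
    """Guest touched a guest-physical frame the VMM never mapped."""


def translate(gva, guest_pt, nested_pt):
    """Translate a guest virtual address to a host physical address.

    guest_pt:  guest virtual page  -> guest physical frame (owned by the OS)
    nested_pt: guest physical frame -> host physical frame (owned by the VMM)
    """
    vpn, offset = divmod(gva, PAGE_SIZE)
    gfn = guest_pt[vpn]            # layer 1: no VMM involvement needed
    if gfn not in nested_pt:
        raise NestedFault(gfn)     # layer 2 miss: the VMM gets control
    return nested_pt[gfn] * PAGE_SIZE + offset


# The VMM hands the guest two frames, scattered in host memory.
nested_pt = {0: 0x7A3, 1: 0x112}
# The guest OS edits its own table freely; frame 5 is outside its memory.
guest_pt = {0x10: 1, 0x11: 5}

host_addr = translate(0x10 * PAGE_SIZE + 0x34, guest_pt, nested_pt)
```

The guest can change `guest_pt` however it likes without trapping; only an access that resolves to a frame missing from `nested_pt` (such as page 0x11 above) reaches the VMM.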

I hope this makes sense.
