Is the address checked by the memory alignment check mechanism an effective address, a linear address or a physical address?


Problem description

I am studying alignment checking, but I don't know whether the processor checks the effective address, the linear address, the physical address, or all of them.

For example, suppose the effective address of a datum is aligned, but the linear address formed by adding the base address from the segment descriptor is no longer aligned: does the processor raise an #AC exception in that case?

Solution

TL;DR

I think it's the linear address.

Keep reading for the test methodology and the test code.


It's not the effective address (aka the offset)

To test this it suffices to use a segment with a base that is not aligned.
In my test, I've used a 32-bit data segment with a base of 1.

The test is a "simple" legacy (i.e. non-UEFI) bootloader that will create said descriptor and test accessing the offsets 0x7000 and 0x7003 with DWORD width.
The former will generate an #AC, the latter won't.

This demonstrates that it's not the offset alone that is checked, because 0x7000 is an aligned offset that still faults with a base of 1.

This is expected.

My tests traditionally produce minimal output, so an explanation is in order.

First, six blue As are written in six consecutive rows in the VGA buffer.
Then, before each load is executed, a pointer is set to the corresponding A.
The #AC handler will increment the pointed-to byte.
So, if a row contains a B, the access generated an #AC.

The first four rows are used for:

  1. Access using a segment with base 0 and offset 0x7000. As expected, no #AC.
  2. Access using a segment with base 0 and offset 0x7001 (the offset used in the test code). As expected, #AC.
  3. Access using a segment with base 1 and offset 0x7000. This does generate an #AC, thereby demonstrating that it's either the linear or the physical address that's checked.
  4. Access using a segment with base 1 and offset 0x7003. This doesn't generate an #AC, confirming point 3.
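
The arithmetic behind these four rows can be sketched in a few lines of Python. This is only a model of the address computation, not part of the bootloader; the helper names `linear_address` and `is_aligned` are made up for illustration, and the offsets are the ones used in the test code.

```python
def linear_address(base: int, offset: int) -> int:
    """Linear address = segment base + effective address (offset), wrapped to 32 bits."""
    return (base + offset) & 0xFFFFFFFF


def is_aligned(address: int, size: int) -> bool:
    """True if address is a multiple of the access size in bytes."""
    return address % size == 0


# (segment base, offset) for rows 1-4, as used in the test code
ROWS = [(0, 0x7000), (0, 0x7001), (1, 0x7000), (1, 0x7003)]

if __name__ == "__main__":
    for number, (base, offset) in enumerate(ROWS, start=1):
        lin = linear_address(base, offset)
        offset_ok = is_aligned(offset, 4)   # what an offset-only check would see
        linear_ok = is_aligned(lin, 4)      # what a linear-address check would see
        print(f"row {number}: offset aligned={offset_ok}, linear aligned={linear_ok}")
```

Rows 3 and 4 are the interesting ones: the offset-only verdict and the linear-address verdict disagree there, and the observed behaviour follows the linear-address verdict.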

The next two rows are used to check the linear address vs the physical address.

It's not the physical address: #AC instead of #PF

The #AC check only tests alignments up to 16 bytes, but a linear address and the physical address it maps to share the same alignment up to at least 4KiB.
We would need a memory access that requires a data structure aligned on, at least, 8KiB to test if it's the physical or the linear address that's used for the check.

Unfortunately, there is no such access (yet).
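
To see why the two addresses cannot be told apart by alignment alone, here is a small Python sketch of 4KiB paging, where translation replaces the page number and keeps the low 12 bits. The helpers `translate` and `same_alignment` are hypothetical names; the 0x9000 to 0xB8000 mapping is the one the test sets up for the VGA buffer.

```python
PAGE_SIZE = 4096  # 4KiB pages, as in the test's page table


def translate(linear: int, frame_base: int) -> int:
    """Keep the page offset of the linear address and swap in the physical frame."""
    if frame_base % PAGE_SIZE != 0:
        raise ValueError("frame base must be page aligned")
    return frame_base | (linear % PAGE_SIZE)


def same_alignment(linear: int, physical: int, size: int) -> bool:
    """True if both addresses agree on being (mis)aligned to size bytes."""
    return (linear % size == 0) == (physical % size == 0)


if __name__ == "__main__":
    linear = 0x9000
    physical = translate(linear, 0xB8000)
    for size in (4, 16, 4096, 8192):
        print(f"{size:5d}-byte alignment agrees: {same_alignment(linear, physical, size)}")
```

For every size up to 4096 the two addresses agree; only at 8192 bytes can they differ, which is why an access with an 8KiB alignment requirement would be needed for a direct test.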

I thought I could still gather some insight by checking which exception is generated when a misaligned load targets an unmapped page.
If a #PF is generated, the CPU translates the linear address first and checks the alignment afterwards. Conversely, if an #AC is generated, the CPU checks before translating (remember that the page is not mapped).

I modified the test to enable paging, map the minimum number of pages and handle a #PF by incrementing the byte under the pointer by two.

When a load is executed, the corresponding A will either become a B if an #AC is generated or a C if a #PF is generated.
Note that both are faults (eip on the stack points to the offending instruction) but both handlers resume from the next instruction (so each load is executed only once).
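
The marker scheme can be modelled as follows. A VGA text cell is a 16-bit word with the character in the low byte and the attribute in the high byte, and the handlers only touch the character byte. This is a Python sketch with made-up helper names (`vga_cell`, `bump_char`), not code from the test.

```python
def vga_cell(char: str, attribute: int) -> int:
    """Build a 16-bit VGA text cell: attribute in the high byte, character in the low byte."""
    return (attribute << 8) | ord(char)


def bump_char(cell: int, amount: int) -> int:
    """Model the handlers: add `amount` to the character byte, leave the attribute alone."""
    return (cell & 0xFF00) | ((cell + amount) & 0xFF)


MARKER = vga_cell("A", 0x09)  # same value as the 0941h written by the test

if __name__ == "__main__":
    after_ac = bump_char(MARKER, 1)  # what the #AC handler does (inc)
    after_pf = bump_char(MARKER, 2)  # what the #PF handler does (add 2)
    print(chr(MARKER & 0xFF), chr(after_ac & 0xFF), chr(after_pf & 0xFF))
```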

This is the meaning of the last two rows (offsets in the 0x8xxx page, which is not mapped):

  5. Access to an unmapped page using a segment with base 1 and offset 0x8003. This generates a #PF as expected (the access is aligned, so the only exception possible here is a #PF).
  6. Access to an unmapped page using a segment with base 1 and offset 0x8000. This generates an #AC, therefore the CPU checks the alignment before attempting to translate the address.

Point 6 seems to suggest that the CPU performs the check on the linear address, since no access to the page table is made.
In point 6 both exceptions could be generated; the fact that #PF is not generated means that the CPU hasn't attempted to translate the address when the alignment check is performed. (Or that #AC logically takes precedence. But the hardware likely wouldn't do a page walk before taking the #AC exception, even if it did probe the TLB after doing the base+offset calculation.)
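
The two candidate orderings can be compared with a toy model in Python. It is a deliberately simplified sketch: it assumes only pages 7 and 9 are present (as in the test's page table), ignores accesses that straddle a page boundary, and the function `access` and its `check_alignment_first` flag are invented for illustration.

```python
PRESENT_PAGES = {0x7, 0x9}  # pages the test maps; page 8 is left not present


def access(base: int, offset: int, size: int = 4, check_alignment_first: bool = True) -> str:
    """Return which exception (if any) a load raises under the chosen check order."""
    linear = (base + offset) & 0xFFFFFFFF
    misaligned = linear % size != 0
    unmapped = (linear >> 12) not in PRESENT_PAGES
    checks = [(misaligned, "#AC"), (unmapped, "#PF")]
    if not check_alignment_first:
        checks.reverse()
    for failed, exception in checks:
        if failed:
            return exception
    return "ok"


# (segment base, offset) for the six rows of the test
ROWS = [(0, 0x7000), (0, 0x7001), (1, 0x7000), (1, 0x7003), (1, 0x8003), (1, 0x8000)]
MARKERS = {"ok": "A", "#AC": "B", "#PF": "C"}

if __name__ == "__main__":
    for first in (True, False):
        column = "".join(MARKERS[access(b, o, check_alignment_first=first)] for b, o in ROWS)
        print(f"alignment checked first={first}: markers {column}")
```

Only row 6 distinguishes the two orderings, and the B observed there matches the "alignment first" model.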

Test code

The code is messy and more cumbersome than one may expect.
The main hindrance is #AC only working at CPL=3.
So we need to create the CPL=3 descriptors, plus a TSS segment and a TSS descriptor.
To handle the exception we need an IDT and we also need paging.
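
Since the GDT entries in the code are written as raw DWORD pairs, a small decoder helps to double-check them. The following Python sketch follows the standard 32-bit segment descriptor layout; `decode_descriptor` and `decode_selector` are illustrative helpers, not part of the test.

```python
def decode_descriptor(low: int, high: int) -> dict:
    """Split the two DWORDs of a 32-bit segment descriptor into its fields."""
    return {
        "base": (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF000000),
        "limit": (low & 0xFFFF) | (high & 0x000F0000),
        "type": (high >> 8) & 0xF,
        "code_or_data": bool((high >> 12) & 1),  # S bit: clear for system segments (TSS)
        "dpl": (high >> 13) & 3,
        "present": bool((high >> 15) & 1),
        "granularity_4k": bool((high >> 23) & 1),
    }


def decode_selector(selector: int) -> dict:
    """Split a segment selector into table index and requested privilege level."""
    return {"index": selector >> 3, "rpl": selector & 3}


if __name__ == "__main__":
    print(decode_descriptor(0x0001FFFF, 0x00CFF200))  # the "Base = 1" data segment
    print(decode_descriptor(0x7000FFFF, 0x00CF8900))  # the TSS descriptor
    print(decode_selector(0x2B))                      # selector loaded into ES
```

Decoding selector 2bh gives index 5 with RPL 3, i.e. the entry at GDT offset 28h, which is the data segment with a base of 1.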

BITS 16
ORG 7c00h

  ;Skip the BPB (my BIOS actively overwrites it)
  jmp SHORT __SKIP_BPB__

  ;I eyeballed the BPB size (at least the part that may be overwritten)
  TIMES 40h db 0

__SKIP_BPB__:
  ;Set up the segments (including CS); ES is needed later by rep stosd
  xor ax, ax
  mov ds, ax
  mov es, ax
  mov ss, ax
  xor sp, sp
  jmp 0:__START__

__START__:
  ;Clear and set the video mode (before we switch to PM)
  mov ax, 03h
  int 10h
  
  ;Disable the interrupts and load the GDT and IDT
  cli
  lgdt [GDT]
  lidt [IDT]
  
  ;Enable PM
  mov eax, cr0
  or al, 1
  mov cr0, eax
  

  ;Write a TSS segment: zero 104h DWORDs and set only the SS0:ESP0 fields
  mov di, 7000h
  mov cx, 104h
  xor eax, eax                ;stosd stores all of EAX, so clear the full register
  rep stosd
  
  mov DWORD [7004h], 7c00h    ;ESP0
  mov WORD [7008h], 10h       ;SS0
  
  
  ;Set AC in EFLAGS
  pushfd
  or DWORD [esp], 1 << 18 
  popfd
  
  ;Set AM in CR0
  mov eax, cr0
  or eax, 1<<18
  mov cr0, eax

  ;OK, let's go in PM for real
  jmp 08h:__32__
  
__32__:
  BITS 32

  ;Set the stack and DS
  mov ax, 10h 
  mov ss, ax 
  mov esp, 7c00h
  mov ds, ax
  
  ;Set the #AC handler
  mov DWORD [IDT+8+17*8], ((AC_handler-$$+7c00h) & 0ffffh) | 00080000h
  mov DWORD [IDT+8+17*8+4], 8e00h | (((AC_handler-$$+7c00h) >> 16) << 16)
  ;Set the #PF handler
  mov DWORD [IDT+8+14*8], ((PF_handler-$$+7c00h) & 0ffffh) | 00080000h
  mov DWORD [IDT+8+14*8+4], 8e00h | (((PF_handler-$$+7c00h) >> 16) << 16)

  ;Set the TSS
  mov ax, 30h
  ltr ax

  ;Paging is:
  ;7xxx -> Identity mapped (contains code and all the stacks and system structures)
  ;8xxx -> Not present
  ;9xxx -> Mapped to the VGA text buffer (0b8xxxh)
  ;Note that the paging structures are at 6000h and 5000h, this is OK as these are physical addresses

  ;Set the Page Directory at 6000h
  mov eax, 6000h
  mov cr3, eax
  ;Set the Page Directory Entry 0 (for 00000000h-003fffffh) to point to a Page Table at 5000h 
  mov DWORD [eax], 5007h
  ;Set the Page Table Entry 7 (for 00007xxxh) to identity map and Page Table Entry 8 (for 00008xxxh) to be not present
  mov eax, 5000h + 7*4
  mov DWORD [eax], 7007h
  mov DWORD [eax+4], 8006h
  ;Map page 9000h to 0b8000h
  mov DWORD [eax+8],  0b801fh

  ;Enable paging
  mov eax, cr0 
  or eax, 80000000h
  mov cr0, eax

  ;Change privilege (goto CPL=3)
  push DWORD 23h            ;SS3
  push DWORD 07a00h         ;ESP3
  push DWORD 1bh            ;CS3
  push DWORD __32user__     ;EIP3
  retf 

__32user__:

  ; 
  ;Here we are at CPL=3
  ;

  ;Set DS to segment with base 0 and ES to one with base 1
  mov ax, 23h
  mov ds, ax
  mov ax, 2bh
  mov es, ax

  ;Write six As in six consecutive rows (starting from the 4th)
  mov ecx, 6
  mov ebx, 9000h + 80*2*3   ;Points to 4th row in the VGA text framebuffer
.init_markers:
  mov WORD [ebx], 0941h
  add bx, 80*2
  dec ecx 
  jnz .init_markers

  ;ebx points to the first A
  sub ebx, 80*2 * 6

  ;Base 0 + Offset 0 = 0, Should not fault (marker stays A)
  mov eax, DWORD [ds:7000h]

  ;Base 0 + Offset 1 = 1, Should fault (marker becomes B)
  add bx, 80*2
  mov eax, DWORD [ds:7001h]

  ;Base 1 + Offset 0 = 1, Should fault (marker becomes B)
  add bx, 80*2
  mov eax, DWORD [es:7000h]

  ;Base 1 + Offset 3 = 4, Should not fault (marker stays A)
  add bx, 80*2
  mov eax, DWORD [es:7003h]

  ;Base 1 + Offset 3 = 4 but page not mapped, Should #PF (marker becomes C)
  add bx, 80*2
  mov eax, DWORD [es:8003h]

  ;Base 1 + Offset 0 = 1 but page not mapped, if #PF the marker becomes C, if #AC the marker becomes B
  add bx, 80*2
  mov eax, DWORD [es:8000h]

  ;Loop forever (cannot use HLT at CPL=3)
  jmp $
  

;#PF handler
;Increment the byte pointed to by ebx by two
PF_handler:
  add esp, 04h        ;Remove the error code
  add DWORD [esp], 6  ;Skip the current instruction
  add BYTE [ebx], 2   ;Increment

  iret 

;#AC handler
;Same as the #PF handler but increment by one
AC_handler:
  add esp, 04h
  add DWORD [esp], 6
  inc BYTE [ebx]

  iret
  

  ;The GDT (entry 0 is used as the content for GDTR)
  GDT dw GDT.end-GDT - 1
      dd GDT
      dw 0
      
      dd 0000ffffh, 00cf9a00h   ;08 Code, 32, DPL 0
      dd 0000ffffh, 00cf9200h       ;10 Data, 32, DPL 0
      
      dd 0000ffffh, 00cffa00h       ;18 Code, 32, DPL 3
      dd 0000ffffh, 00cff200h       ;20 Data, 32, DPL 3
      dd 0001ffffh, 00cff200h       ;28 Data, 32, DPL 3, Base = 1

      dd 7000ffffh, 00cf8900h       ;30 TSS, 32-bit available, DPL 0, Base = 7000h

      .end: 

  ;The IDT, to save space the entries are set dynamically      
  IDT dw 18*8-1
      dd IDT+8
      dw 0
      

  ;Signature
  TIMES 510-($-$$) db 0
  dw 0aa55h
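
The three page table entries written by the code can be checked the same way. This Python sketch decodes the low flag bits of a 32-bit, 4KiB page table entry; `decode_pte` is an illustrative helper and only the five lowest flags are modelled.

```python
def decode_pte(entry: int) -> dict:
    """Decode the frame address and the low flag bits of a 32-bit page table entry."""
    return {
        "frame": entry & 0xFFFFF000,
        "present": bool(entry & 0x01),
        "writable": bool(entry & 0x02),
        "user": bool(entry & 0x04),
        "write_through": bool(entry & 0x08),
        "cache_disable": bool(entry & 0x10),
    }


# Entries for linear pages 7, 8 and 9, as written by the test code
ENTRIES = {0x7000: 0x7007, 0x8000: 0x8006, 0x9000: 0xB801F}

if __name__ == "__main__":
    for linear_page, entry in ENTRIES.items():
        print(f"{linear_page:#x}: {decode_pte(entry)}")
```

Page 8 has its present bit clear, so any access to 00008xxxh that reaches translation raises a #PF, while all three entries have the user bit set so that CPL=3 code may touch them.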

Does it make sense to check the linear address?

I don't think it's particularly relevant. As noted above, a linear and a physical address share the same alignment up to 4KiB.
So, for now, it doesn't matter at all.
Right now, accesses wider than 64 bytes would still need to be performed in chunks and this limit is set deep in the microarchitectures of the x86 CPUs.
