How to do computations with addresses at compile/linking time?

Question

I wrote some code for initializing the IDT, which stores 32-bit addresses in two non-adjacent 16-bit halves. The IDT can be stored anywhere, and you tell the CPU where by running the LIDT instruction.

This is the code for initializing the table:

void idt_init(void) {
    /* Unfortunately, we can't write this as loops. The first option,
     * initializing the IDT with the addresses, here looping over it, and
     * reinitializing the descriptors didn't work because assigning a
     * uintptr_t (from (uintptr_t) handler_func) to a descr (a.k.a.
     * uint64_t), according to the compiler, "isn't computable at load
     * time."
     * The second option, storing the addresses as a local array, simply is
     * inefficient (took 0.020ms more when profiling with the "time" command
     * line program!).
     * The third option, storing the addresses as a static local array,
     * consumes too much space (the array will probably never be used again
     * during the whole kernel runtime).
     * But IF my argument against the third option will be invalidated in
     * the future, THEN it's the best option I think. */

    /* Initialize descriptors of exception handlers. */
    idt[EX_DE_VEC] = idt_trap(ex_de);
    idt[EX_DB_VEC] = idt_trap(ex_db);
    idt[EX_NMI_VEC] = idt_trap(ex_nmi);
    idt[EX_BP_VEC] = idt_trap(ex_bp);
    idt[EX_OF_VEC] = idt_trap(ex_of);
    idt[EX_BR_VEC] = idt_trap(ex_br);
    idt[EX_UD_VEC] = idt_trap(ex_ud);
    idt[EX_NM_VEC] = idt_trap(ex_nm);
    idt[EX_DF_VEC] = idt_trap(ex_df);
    idt[9] = idt_trap(ex_res);  /* unused Coprocessor Segment Overrun */
    idt[EX_TS_VEC] = idt_trap(ex_ts);
    idt[EX_NP_VEC] = idt_trap(ex_np);
    idt[EX_SS_VEC] = idt_trap(ex_ss);
    idt[EX_GP_VEC] = idt_trap(ex_gp);
    idt[EX_PF_VEC] = idt_trap(ex_pf);
    idt[15] = idt_trap(ex_res);
    idt[EX_MF_VEC] = idt_trap(ex_mf);
    idt[EX_AC_VEC] = idt_trap(ex_ac);
    idt[EX_MC_VEC] = idt_trap(ex_mc);
    idt[EX_XM_VEC] = idt_trap(ex_xm);
    idt[EX_VE_VEC] = idt_trap(ex_ve);

    /* Initialize descriptors of reserved exceptions.
     * Thankfully we compile with -std=c11, so declarations within
     * for-loops are possible! */
    for (size_t i = 21; i < 32; ++i)
        idt[i] = idt_trap(ex_res);

    /* Initialize descriptors of hardware interrupt handlers (ISRs). */
    idt[INT_8253_VEC] = idt_int(int_8253);
    idt[INT_8042_VEC] = idt_int(int_8042);
    idt[INT_CASC_VEC] = idt_int(int_casc);
    idt[INT_SERIAL2_VEC] = idt_int(int_serial2);
    idt[INT_SERIAL1_VEC] = idt_int(int_serial1);
    idt[INT_PARALL2_VEC] = idt_int(int_parall2);
    idt[INT_FLOPPY_VEC] = idt_int(int_floppy);
    idt[INT_PARALL1_VEC] = idt_int(int_parall1);
    idt[INT_RTC_VEC] = idt_int(int_rtc);
    idt[INT_ACPI_VEC] = idt_int(int_acpi);
    idt[INT_OPEN2_VEC] = idt_int(int_open2);
    idt[INT_OPEN1_VEC] = idt_int(int_open1);
    idt[INT_MOUSE_VEC] = idt_int(int_mouse);
    idt[INT_FPU_VEC] = idt_int(int_fpu);
    idt[INT_PRIM_ATA_VEC] = idt_int(int_prim_ata);
    idt[INT_SEC_ATA_VEC] = idt_int(int_sec_ata);

    for (size_t i = 0x30; i < IDT_SIZE; ++i)
        idt[i] = idt_trap(ex_res);
}

idt_trap and idt_int are defined as follows:

#define idt_entry(off, type, priv) \
    (((descr) (uintptr_t) (off) & 0xffff) | \
     ((descr) (KERN_CODE & 0xff) << 0x10) | \
     ((descr) ((type) & 0x0f) << 0x28) | \
     ((descr) ((priv) & 0x03) << 0x2d) | \
     (descr) 0x800000000000 | \
     ((descr) ((uintptr_t) (off) & 0xffff0000) << 0x20))

#define idt_int(off) idt_entry(off, 0x0e, 0x00)
#define idt_trap(off) idt_entry(off, 0x0f, 0x00)

idt is an array of uint64_t, so these macros are implicitly cast to that type. uintptr_t is the type guaranteed to be capable of holding pointer values as integers and on 32-bit systems usually 32 bits wide. (A 64-bit IDT has 16-byte entries; this code is for 32-bit).
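
To make the bit layout concrete, here is the same packing written as a plain function. This is an illustrative sketch: pack_gate is a hypothetical name, and the 0x08 selector in the usage below is an assumed kernel code selector. Bits 0..15 hold the low offset, 16..31 the selector, 40..43 the gate type, 45..46 the DPL, bit 47 the present flag, and 48..63 the high offset. Because off & 0xffff0000 already sits at bits 16..31, it only needs a shift of 32 to land at bit 48.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t descr;

/* Pack one 32-bit (protected-mode) IDT gate descriptor, 8 bytes. */
static descr pack_gate(uint32_t off, uint16_t sel, uint8_t type, uint8_t dpl)
{
    return ((descr)(off & 0xffff))             /* offset 0..15  -> bits 0..15  */
         | ((descr)sel << 16)                  /* selector      -> bits 16..31 */
         | ((descr)(type & 0x0f) << 40)        /* gate type     -> bits 40..43 */
         | ((descr)(dpl & 0x03) << 45)         /* DPL           -> bits 45..46 */
         | ((descr)1 << 47)                    /* present bit   -> bit 47      */
         | ((descr)(off & 0xffff0000) << 32);  /* offset 16..31 -> bits 48..63 */
}
```

For example, pack_gate(0x00101234, 0x08, 0x0e, 0) gives 0x00108E0000081234: a present, DPL 0, 32-bit interrupt gate.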

I get the warning that the initializer element is not constant, because of the arithmetic performed on the address.
The address is certainly known at link time.
Is there anything I can do to make this work? Making the idt array automatic would work, but that would require the whole kernel to run in the context of one function, which I think would be quite a hassle.

I could make this work by some additional work at runtime (as Linux 0.01 also does) but it just annoys me that something technically feasible at linking time is actually infeasible.

Answer

The main problem is that function addresses are link-time constants, not strictly compile-time constants. The compiler can't just take 32-bit binary integers and stick them into the data segment in two separate pieces. Instead, it has to use the object file format to tell the linker where to fill in the final value (+ offset) of which symbol once linking is done. The common cases are an immediate operand to an instruction, a displacement in an effective address, or a value in the data section. (But in all those cases it's still just filling in a 32-bit absolute address, so all 3 use the same ELF relocation type, R_386_32. Relative displacements for jump/call offsets use a different relocation, R_386_PC32.)
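
A minimal sketch of that distinction, assuming GCC on an ELF target (isr_stub, table, handler_ptr, table_mid and low_half are placeholder names). Initializers that are just an address, or an address plus a constant offset, become a single symbol + addend relocation that the linker fills in. An initializer that masks or shifts an address has no relocation to express it, so the compiler rejects it; the same arithmetic is fine at run time.

```c
#include <assert.h>
#include <stdint.h>

void isr_stub(void) {}        /* stands in for an asm interrupt entry stub */
static char table[64];

/* Accepted: address constants. The object file records
 * "symbol + addend" and the linker patches in the final value. */
static void (*const handler_ptr)(void) = isr_stub;
static char *const table_mid = &table[16];   /* table + 16 */

/* Rejected: arithmetic on the address itself. GCC refuses these; the
 * exact message varies ("initializer element is not constant" or
 * "... is not computable at load time"):
 *
 *   static const uint16_t lo = (uintptr_t)isr_stub & 0xffff;
 *   static const uint16_t hi = (uintptr_t)isr_stub >> 16;
 */

/* The same split is ordinary code when done at run time. */
static uint16_t low_half(void)
{
    return (uint16_t)((uintptr_t)isr_stub & 0xffff);
}
```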

ELF could have been designed to store a symbol reference that the linker replaces with a more complex function of the address (or at least its high/low halves, like MIPS's %hi(symbol) / %lo(symbol) in lui $t0, %hi(symbol) / ori $t0, $t0, %lo(symbol), which build an address constant from two 16-bit immediates). But on x86, the only function allowed is addition/subtraction, for use in things like mov eax, [ext_symbol + 16].
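
As a toy model of what the linker is allowed to compute, here are the formulas of the two i386 relocations mentioned above, as given in the i386 System V psABI (S = final symbol address, A = addend stored in the object file, P = address of the field being patched). The function names are made up for illustration. The high half an IDT gate needs is a right shift of S, which no constant addend can turn into S + A for every S.

```c
#include <assert.h>
#include <stdint.h>

/* R_386_32: absolute address, e.g. mov eax, [ext_symbol + 16]. */
static uint32_t apply_r_386_32(uint32_t S, uint32_t A)
{
    return S + A;
}

/* R_386_PC32: PC-relative displacement, e.g. a call rel32. */
static uint32_t apply_r_386_pc32(uint32_t S, uint32_t A, uint32_t P)
{
    return S + A - P;
}

/* What a gate's offset_2 field needs instead: bits 16..31 of S.
 * This is a shift of S, not S plus a fixed addend. */
static uint16_t gate_offset_high(uint32_t S)
{
    return (uint16_t)(S >> 16);
}
```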

It is of course possible for your OS kernel binary to have a static IDT with fully resolved addresses at build time, so all you need to do at runtime is execute a single lidt instruction. However, the standard build toolchain is an obstacle. You probably can't achieve this without post-processing your executable.

E.g. you could write it this way, to produce a table with full-size 8-byte entries in the final binary, so the data can be shuffled in place:

#include <stdint.h>

#define PACKED __attribute__((packed))

// Note, this is the 32-bit format.  64-bit is larger    
typedef union idt_entry {

    // we will postprocess the linker output to have this format
    // (or convert at runtime)
    struct PACKED runtime {   // from OSdev wiki
       uint16_t offset_1; // offset bits 0..15
       uint16_t selector; // a code segment selector in GDT or LDT
       uint8_t zero;      // unused, set to 0
       uint8_t type_attr; // type and attributes, see below
       uint16_t offset_2; // offset bits 16..31
    } rt;

    // linker output will be in this format
    struct PACKED compiletime {
       void *ptr; // offset bits 0..31
       uint8_t zero;
       uint8_t type_attr;
       uint16_t selector; // to be swapped with the high16 of ptr
    } ct;
} idt_entry;

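// Kernel code-segment selector from your GDT (0x08 is an assumed, typical value).
#ifndef KERN_CODE
#define KERN_CODE 0x08
#endif

// Generate an entry in compile-time format.
// type_attr = present bit (0x80) | DPL << 5 | gate type. The CPU faults on a
// gate whose present bit is clear, and selector 0 is the null selector.
#define idt_ct_entry(off, type, dpl) \
    { .ct = { .ptr = (void *)(off), .zero = 0, \
              .type_attr = 0x80 | (((dpl) & 3) << 5) | ((type) & 0x0f), \
              .selector = KERN_CODE } }
#define idt_ct_trap(off) idt_ct_entry(off, 0x0f, 0)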

extern void ex_de(void);  // these are the raw interrupt handlers, written in ASM
extern void ex_db(void);  // they have to save/restore *all* registers and end with iret, rather than following the usual C ABI.

// it might be easier to use asm macros to create this static data, 
// just so it can be in the same file and you don't need cross-file prototypes / declarations
// (but all the same limitations about link-time constants apply)
static idt_entry idt[] = {
    idt_ct_trap(ex_de),
    idt_ct_trap(ex_db),
    // ...
};

// having this static probably takes less space than instructions to write it on the fly
// but not much more.  It would be easy to make a lidt function that took a struct pointer.
static const struct PACKED  idt_ptr {
  uint16_t len;  // encoded as bytes - 1, so 0xffff means 65536
  void *ptr;
} idt_ptr = { sizeof(idt) - 1, idt };


/****** functions *********/

// inline
void load_static_idt(void) {
  __asm__ __volatile__ ("lidt  %0"
               : // no outputs
               : "m" (idt_ptr));
  // A memory operand, instead of writing the addressing mode ourselves, allows a RIP-relative addressing mode in 64-bit mode.
  // It also works with or without -masm=intel, and __asm__ (unlike asm) is accepted under strict -std=c11.
}

// Do this once at run-time
// **OR** run this to pre-process the binary, after link time, as part of your build
void idt_convert_to_runtime(void) {
#ifdef DEBUG
  static char already_done = 0;  // make sure this only runs once
  if (already_done)
    __builtin_trap();  // a second swap would undo the first one, so stop loudly
  already_done = 1;
#endif
  const int count = sizeof idt / sizeof idt[0];
  for (int i=0 ; i<count ; i++) {
    uint16_t tmp1 = idt[i].rt.selector;
    uint16_t tmp2 = idt[i].rt.offset_2;
    idt[i].rt.offset_2 = tmp1;
    idt[i].rt.selector = tmp2;
    // or do this swap in fewer insns with SSE or MMX pshufw, but using vector instructions before setting up the IDT may be insane.
  }
}

This does compile. See a diff of the -m32 and -m64 asm output on the Godbolt compiler explorer. Look at the layout in the data section (note that .value is a synonym for .short, and is 16 bits). (But note that the IDT format is different in 64-bit mode.)
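
One way to check the layout without reading asm is to let the compiler verify that the two views of an entry overlap as intended. In this sketch, ptr is modeled as uint32_t (an assumption, so the checks also hold when compiling on a 64-bit host; the real table uses void * and only has this layout with -m32), and the struct names are made up. x86 is little-endian, so the high half of ptr occupies bytes 2..3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU view of a 32-bit gate. */
struct __attribute__((packed)) gate_rt {
    uint16_t offset_1;   /* bytes 0..1 */
    uint16_t selector;   /* bytes 2..3 */
    uint8_t  zero;       /* byte 4 */
    uint8_t  type_attr;  /* byte 5 */
    uint16_t offset_2;   /* bytes 6..7 */
};

/* Linker-output view: full address first, selector parked at the end. */
struct __attribute__((packed)) gate_ct {
    uint32_t ptr;        /* bytes 0..3 */
    uint8_t  zero;       /* byte 4 */
    uint8_t  type_attr;  /* byte 5 */
    uint16_t selector;   /* bytes 6..7 */
};

static_assert(sizeof(struct gate_rt) == 8, "a 32-bit IDT gate is 8 bytes");
static_assert(sizeof(struct gate_ct) == 8, "both views must be the same size");
static_assert(offsetof(struct gate_rt, type_attr) == offsetof(struct gate_ct, type_attr),
              "type_attr must not move during the conversion");
/* The conversion swaps bytes 2..3 (high half of ptr) with bytes 6..7 (selector). */
static_assert(offsetof(struct gate_rt, selector) == 2, "rt.selector overlays ptr's high half");
static_assert(offsetof(struct gate_ct, selector) == offsetof(struct gate_rt, offset_2),
              "ct.selector overlays rt.offset_2");
```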

I think I have the size calculation correct (bytes - 1), as documented in http://wiki.osdev.org/Interrupt_Descriptor_Table. The minimum is 100h bytes (32 exception vectors × 8 bytes), encoded as 0xff. See also https://en.wikibooks.org/wiki/X86_Assembly/Global_Descriptor_Table. (The lgdt size/pointer works the same way, although the table itself has a different format.)
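
The limit arithmetic spelled out, as a tiny helper (idt_limit is a hypothetical name): 32 entries × 8 bytes = 0x100 bytes gives a limit of 0xff, a full 256-vector 32-bit IDT is 0x800 bytes (limit 0x7ff), and with the 16-byte entries of 64-bit mode it is 0x1000 bytes (limit 0xfff).

```c
#include <assert.h>
#include <stdint.h>

/* LIDT/LGDT limit field: table size in bytes minus one.
 * Precondition: entries > 0 and the table is at most 64 KiB. */
static uint16_t idt_limit(unsigned entries, unsigned entry_size)
{
    return (uint16_t)(entries * entry_size - 1);
}
```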

The other option, instead of having the IDT static in the data section, is to have it in the bss section, with the data stored as immediate constants in a function that will initialize it (or in an array read by that function).

Either way, that function (and its data) can be in an .init section whose memory you reuse after it's done. (Linux does this to reclaim memory from code and data that's only needed once, at startup.) This gives you the optimal tradeoff: small binary size (since 32-bit addresses are smaller than 64-bit IDT entries) and no runtime memory wasted on code to set up the IDT. A small loop that runs once at startup takes negligible CPU time. (The version on Godbolt fully unrolls because I only have 2 entries, and it embeds the address into each instruction as a 32-bit immediate, even with -Os. With a large enough table (just copy/paste to duplicate a line) you get a compact loop even at -O3. The threshold is lower for -Os.)
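
A sketch of the bss-plus-init-function option, assuming GCC and a linker script that groups .init.text and .init.rodata so their pages can be freed after boot (similar in spirit to Linux's __init / __initconst annotations). INIT_TEXT, INIT_RODATA, isr_table, idt_setup and the 0x08 selector are assumptions for illustration, and ex_de / ex_db are empty placeholders for the real asm stubs. The image only stores one address per vector (4 bytes with -m32); the 8-byte gates live in .bss and are built by a loop. A real kernel would list all 256 vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section tags; the linker script must collect them. */
#define INIT_TEXT   __attribute__((section(".init.text")))
#define INIT_RODATA __attribute__((section(".init.rodata")))
#define KERN_CODE   0x08   /* assumed kernel code-segment selector */

/* 32-bit IDT gate in the layout the CPU expects. */
struct __attribute__((packed)) idt_gate {
    uint16_t offset_1;   /* handler address bits 0..15 */
    uint16_t selector;   /* code segment selector */
    uint8_t  zero;
    uint8_t  type_attr;  /* present | DPL | gate type */
    uint16_t offset_2;   /* handler address bits 16..31 */
};

typedef void isr_fn(void);
void ex_de(void) {}      /* placeholders for the real asm entry stubs */
void ex_db(void) {}

/* Only plain addresses, so this is a valid static initializer. */
static isr_fn *const isr_table[] INIT_RODATA = { ex_de, ex_db };
#define N_VECTORS (sizeof isr_table / sizeof isr_table[0])

/* No initializer: ends up in .bss and takes no space in the image. */
struct idt_gate idt[N_VECTORS];

INIT_TEXT void idt_setup(void)
{
    for (size_t i = 0; i < N_VECTORS; i++) {
        /* Lossless when built with -m32; truncates in a 64-bit host build. */
        uint32_t addr = (uint32_t)(uintptr_t)isr_table[i];
        idt[i].offset_1  = (uint16_t)(addr & 0xffff);
        idt[i].selector  = KERN_CODE;
        idt[i].zero      = 0;
        idt[i].type_attr = 0x8f;   /* present, DPL 0, 32-bit trap gate */
        idt[i].offset_2  = (uint16_t)(addr >> 16);
    }
}
```

After idt_setup() runs, a single lidt pointing at idt is all that is left, and the pages holding isr_table and idt_setup can be handed back to the allocator.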

Without memory-reuse haxx, a tight loop to rewrite the 64-bit entries in place is probably the way to go. Doing it at build time would be even better, but then you'd need a custom tool to run the transformation on the kernel binary.
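
The core of such a build-time tool could look like this minimal sketch. It assumes you have already located the table's bytes in the linked image (for example via the symbol table and section headers; that lookup is left out) and that the entries are in the compile-time layout used above. idt_fixup_bytes is a hypothetical name. On little-endian x86, converting an entry just swaps bytes 2..3 with bytes 6..7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert n_entries 8-byte gates in place from the compile-time layout
 *   [ptr lo16][ptr hi16][zero][type_attr][selector]
 * to the layout the CPU expects
 *   [ptr lo16][selector][zero][type_attr][ptr hi16]. */
static void idt_fixup_bytes(uint8_t *table, size_t n_entries)
{
    for (size_t i = 0; i < n_entries; i++) {
        uint8_t *e = table + 8 * i;
        for (size_t k = 2; k < 4; k++) {   /* swap bytes 2,3 with 6,7 */
            uint8_t tmp = e[k];
            e[k] = e[k + 4];
            e[k + 4] = tmp;
        }
    }
}
```

The swap is its own inverse, so running it twice silently restores the compile-time layout; a build step should make sure it runs exactly once.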

Having the data stored in immediates sounds good in theory, but the code for each entry would probably total more than the 64 bits (8 bytes) of the entry itself, because it couldn't loop. The code to split an address into two halves would have to be fully unrolled (or placed in a function and called). Even if you had a loop to store all the fields that are the same for every entry, each pointer would need a mov r32, imm32 to get the address into a register, then mov word [idt+i + 0], ax / shr eax, 16 / mov word [idt+i + 6], ax. That's a lot of machine-code bytes.
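
Counting bytes for one entry with that sequence, using the standard 32-bit encodings and assuming the assembler picks the short accumulator form (66 A3 moffs32) for the two 16-bit stores; a general ModRM store with a disp32 is one byte longer each:

```latex
\underbrace{5}_{\texttt{mov eax, imm32}}
+ \underbrace{6}_{\texttt{mov [moffs], ax}}
+ \underbrace{3}_{\texttt{shr eax, 16}}
+ \underbrace{6}_{\texttt{mov [moffs], ax}}
= 20 \text{ bytes of code per entry} \;>\; 8 \text{ bytes per IDT gate}
```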
