How do objects work in x86 at the assembly level?

Question

I'm trying to understand how objects work at the assembly level. How exactly are objects stored in memory, and how do member-functions access them?

(editor's note: the original version was way too broad, and had some confusion over how assembly and structs work in the first place.)

Recommended answer

Classes are stored exactly the same way as structs, except when they have virtual members. In that case, there's an implicit vtable pointer as the first member (see below).

A struct is stored as a contiguous block of memory (if the compiler doesn't optimize it away or keep the member values in registers). Within a struct object, the addresses of its elements increase in the order in which the members were defined (source: http://en.cppreference.com/w/c/language/struct). I linked the C definition because in C++ struct means class (with public: as the default instead of private:).

Think of a struct or class as a block of bytes that might be too big to fit in a register, but which is copied around as a "value". Assembly language doesn't have a type system; bytes in memory are just bytes, and it doesn't take any special instructions to store a double from a floating-point register and reload it into an integer register, or to do an unaligned load and get the last 3 bytes of one int and the first byte of the next. A struct is just part of building C's type system on top of blocks of memory, since blocks of memory are useful.
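The "bytes are just bytes" point can be sketched in C++ with memcpy, which is the well-defined way to reinterpret an object's bytes as another type. The helper name bits_of is hypothetical (not from the original answer), and the sketch assumes a 64-bit IEEE 754 double, which is what x86 uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy the 8 bytes of a double into a 64-bit integer.
// Nothing is converted; the same bit pattern is simply viewed as another type.
// Optimizing compilers typically reduce this to a register-to-register move.
inline std::uint64_t bits_of(double d) {
    static_assert(sizeof(std::uint64_t) == sizeof(double),
                  "this sketch assumes a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}
```

Under the IEEE 754 assumption, bits_of(1.0) yields 0x3FF0000000000000: sign 0, biased exponent 0x3FF, zero mantissa.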

These blocks of bytes can have static (global or static), dynamic (malloc or new), or automatic storage (local variable: temporary on the stack or in registers, in normal C/C++ implementations on normal CPUs). The layout within a block is the same regardless (unless the compiler optimizes away the actual memory for a struct local variable; see the example below of inlining a function that returns a struct.)

A struct or class is the same as any other object. In C and C++ terminology, even an int is an object (http://en.cppreference.com/w/c/language/object), i.e. a contiguous block of bytes that you can memcpy around (except for non-POD types in C++).

The ABI rules for the system you're compiling for specify when and where padding is inserted to make sure each member has sufficient alignment, even if you do something like struct { char a; int b; };. For example, the x86-64 System V ABI, used on Linux and other non-Windows systems, specifies that int is a 32-bit type that gets 4-byte alignment in memory. The ABI is what nails down some of the things that the C and C++ standards leave implementation-defined, so that all compilers for that ABI can make code that can call each other's functions.

Note that you can use offsetof(struct_name, member), from <stddef.h> in C or <cstddef> in C++, to find out about struct layout. See also alignof in C++11, or _Alignof in C11.
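As a minimal sketch of inspecting layout with offsetof and alignof, here is the struct { char a; int b; } example given a hypothetical name, CharThenInt. The helper padding_before_b is also made up for illustration. The exact numbers are ABI-dependent; on x86-64 System V, b lands at offset 4, so there are 3 padding bytes after a.

```cpp
#include <cassert>
#include <cstddef>

struct CharThenInt {   // hypothetical name for the example struct
    char a;
    int  b;
};

// These hold on any conforming implementation for a standard-layout struct.
static_assert(offsetof(CharThenInt, a) == 0, "first member starts at offset 0");
static_assert(offsetof(CharThenInt, b) % alignof(int) == 0, "b is aligned for int");
static_assert(sizeof(CharThenInt) % alignof(CharThenInt) == 0,
              "size is a multiple of the struct's alignment");

// Number of padding bytes the ABI inserted between a and b.
inline std::size_t padding_before_b() {
    return offsetof(CharThenInt, b) - sizeof(char);
}
```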

It's up to the programmer to order struct members well to avoid wasting space on padding, since C rules don't let the compiler sort your struct for you. (e.g. if you have some char members, put them in groups of at least 4, rather than alternating with wider members. Sorting from large to small is an easy rule, remembering that pointers may be 64 or 32-bit on common platforms.)
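A small sketch of the ordering advice, using two hypothetical structs with the same members in different orders. The sizes in the comments assume the x86-64 System V ABI (8-byte aligned double, 4-byte aligned int); other ABIs may give different numbers.

```cpp
#include <cassert>
#include <cstddef>

// Narrow members alternate with wide ones, so padding appears after each char.
struct Interleaved {
    char   c1;   // followed by padding up to the alignment of double
    double d;
    char   c2;   // followed by padding up to the alignment of int
    int    i;
};               // 24 bytes on x86-64 System V

// Same members sorted from large to small: the chars share one padded tail.
struct Sorted {
    double d;
    int    i;
    char   c1;
    char   c2;
};               // 16 bytes on x86-64 System V

inline std::size_t bytes_saved() {
    return sizeof(Interleaved) - sizeof(Sorted);
}
```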

More details of ABIs and so on can be found at https://stackoverflow.com/tags/x86/info. Agner Fog's excellent site includes an ABI guide, along with optimization guides.

class foo {
  int m_a;
  int m_b;
  void inc_a(void){ m_a++; }
  int inc_b(void);
};

int foo::inc_b(void) { return m_b++; }

In the compiler-generated asm for foo::inc_b, the this pointer is passed as an implicit first argument (in rdi, in the SysV AMD64 ABI). m_b is stored at 4 bytes from the start of the struct/class. Note the clever use of lea to implement the post-increment operator, leaving the old value in eax.
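The implicit this argument can be sketched in C++ itself by writing the same operation as a free function that takes the object pointer explicitly. Counter and Counter_inc_b are hypothetical names standing in for class foo. The asm in the comment is only an illustration consistent with the description above, not verified compiler output.

```cpp
#include <cassert>

struct Counter {        // hypothetical stand-in for class foo, members made public
    int m_a;
    int m_b;
    int inc_b();        // member function: `this` is passed implicitly
};

int Counter::inc_b() { return m_b++; }

// The same operation with `this` spelled out as an explicit first parameter.
// On x86-64 System V both versions receive the object address in rdi, and a
// compiler might emit something along these lines for either one:
//     mov eax, DWORD PTR [rdi+4]   ; eax = old m_b (the return value)
//     lea edx, [rax+1]             ; edx = old m_b + 1
//     mov DWORD PTR [rdi+4], edx   ; store the incremented value
//     ret
int Counter_inc_b(Counter *self) { return self->m_b++; }
```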

No code for inc_a is emitted, since it's defined inside the class declaration. It's treated the same as an inline non-member function. If it was really big and the compiler decided not to inline it, it could emit a stand-alone version of it.

Where C++ objects really differ from C structs is when virtual member functions are involved. Each copy of the object has to carry around an extra pointer (to the vtable for its actual type).
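The cost of that extra pointer shows up directly in sizeof. This sketch uses two hypothetical types, Plain and WithVirtual, that differ only in having a virtual function. How a vtable pointer is stored is implementation-defined; the sizes in the comments assume the Itanium C++ ABI used by gcc and clang on x86-64.

```cpp
#include <cassert>
#include <cstddef>

struct Plain {          // hypothetical: two ints, no virtual functions
    int m_a;
    int m_b;
};                      // 8 bytes

struct WithVirtual {    // hypothetical: same data plus one virtual function
    int m_a;
    int m_b;
    virtual void inc_v();
};                      // 16 bytes on x86-64: 8-byte vtable pointer + two ints

void WithVirtual::inc_v() { ++m_b; }

// Extra bytes per object caused by making the type polymorphic.
inline std::size_t vptr_overhead() {
    return sizeof(WithVirtual) - sizeof(Plain);
}
```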

class foo {
  public:
  int m_a;
  int m_b;
  void inc_a(void){ m_a++; }
  void inc_b(void);
  virtual void inc_v(void);
};

void foo::inc_b(void) { m_b++; }

class bar: public foo {
 public:
  virtual void inc_v(void);  // overrides foo::inc_v even for users that access it through a pointer to class foo
};

void foo::inc_v(void) { m_b++; }
void bar::inc_v(void) { m_a++; }

compiles to

  ; This time I made the functions return void, so the asm is simpler
  ; The in-memory layout of the class is now:
  ;   vtable ptr (8B)
  ;   m_a (4B)
  ;   m_b (4B)
foo::inc_v():
    add DWORD PTR [rdi+12], 1   # this_2(D)->m_b,
    ret
bar::inc_v():
    add DWORD PTR [rdi+8], 1    # this_2(D)->D.2657.m_a,
    ret

    # if you uncheck the hide-directives box, you'll see
    .globl  foo::inc_b()
    .set    foo::inc_b(),foo::inc_v()
    # since inc_b has the same definition as foo's inc_v, so gcc saves space by making one an alias for the other.

    # you can also see the directives that define the data that goes in the vtables


Fun fact: add m32, imm8 is faster than inc m32 on most Intel CPUs (micro-fusion of the load+ALU uops); this is one of the rare cases where the old Pentium 4 advice to avoid inc still applies. gcc always avoids inc, though, even when it would save code size with no downsides :/ (see the Stack Overflow question "INC instruction vs ADD 1: Does it matter?").

void caller(foo *p){
    p->inc_v();
}

    mov     rax, QWORD PTR [rdi]      # p_2(D)->_vptr.foo, p_2(D)->_vptr.foo
    jmp     [QWORD PTR [rax]]         # *_3

(This is an optimized tailcall: jmp replacing call/ret).

The mov loads the vtable address from the object into a register. The jmp is a memory-indirect jump, i.e. loading a new RIP value from memory. The jump-target address is vtable[0], i.e. the first function pointer in the vtable. If there was another virtual function, the mov wouldn't change but the jmp would use jmp [rax + 8].
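The two-step load (object to vtable, vtable to function) can be modeled by hand with a struct of function pointers. Everything here (Obj, VTable, inc_w, the call_* helpers) is hypothetical and simplified: a real compiler-generated vtable also carries other data such as RTTI information, and its exact layout is defined by the C++ ABI.

```cpp
#include <cassert>

struct Obj;                          // forward declaration

struct VTable {                      // hand-written table of function pointers
    void (*inc_v)(Obj *self);        // slot 0: what jmp [rax] would reach
    void (*inc_w)(Obj *self);        // slot 1: what jmp [rax + 8] would reach
};

struct Obj {
    const VTable *vptr;              // first member, like the implicit vtable pointer
    int m_a;
    int m_b;
};

void base_inc_v(Obj *self)    { self->m_b++; }
void derived_inc_v(Obj *self) { self->m_a++; }
void shared_inc_w(Obj *self)  { self->m_a += 2; }

// One table per "class"; the derived table overrides only slot 0.
const VTable base_vtable    = { base_inc_v,    shared_inc_w };
const VTable derived_vtable = { derived_inc_v, shared_inc_w };

// Equivalent of p->inc_v(): load vptr from the object, load the slot, call it.
void call_inc_v(Obj *p) { p->vptr->inc_v(p); }
void call_inc_w(Obj *p) { p->vptr->inc_w(p); }
```

The caller never needs to know which table an object points to, which is why the same two instructions work for every derived type.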

The order of entries in the vtable presumably matches the order of declaration in the class, so reordering the class declaration in one translation unit would result in virtual functions going to the wrong target. Just like reordering the data members would change the class's ABI.

If the compiler had more information, it could devirtualize the call. e.g. if it could prove that the foo * was always pointing to a bar object, it could inline bar::inc_v().

GCC will even speculatively devirtualize when it can figure out what the type probably is at compile time. In the above code, the compiler can't see any classes that inherit from bar, so it's a good bet that bar* is pointing to a bar object, rather than some derived class.

void caller_bar(bar *p){
    p->inc_v();
}

# gcc5.5 -O3
caller_bar(bar*):
    mov     rax, QWORD PTR [rdi]      # load vtable pointer
    mov     rax, QWORD PTR [rax]      # load target function address
    cmp     rax, OFFSET FLAT:bar::inc_v()  # check it
    jne     .L6       #,
    add     DWORD PTR [rdi+8], 1      # inlined version of bar::inc_v()
    ret
.L6:
    jmp     rax               # otherwise tailcall the derived class's function

Remember, a foo * can actually point to a derived bar object, but a bar * is not allowed to point to a pure foo object.
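A short sketch of that rule in action, reusing the foo and bar names from the answer but trimmed to a self-contained form (inc_a and inc_b dropped, members zero-initialized, functions defined in-class). A foo * that actually points to a bar reaches bar::inc_v through the vtable.

```cpp
#include <cassert>

class foo {
 public:
    int m_a = 0;
    int m_b = 0;
    virtual void inc_v() { m_b++; }
};

class bar : public foo {
 public:
    void inc_v() override { m_a++; }   // replaces foo::inc_v in bar's vtable
};

// Only knows about foo; the target is chosen at run time via the vtable.
void caller(foo *p) { p->inc_v(); }
```

Passing the address of a bar to caller increments m_a, while passing a plain foo increments m_b, even though caller was compiled knowing only the base type.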

It is just a bet though; part of the point of virtual functions is that types can be extended without recompiling all the code that operates on the base type. This is why it has to compare the function pointer and fall back to the indirect call (jmp tailcall in this case) if it was wrong. Compiler heuristics decide when to attempt it.

Notice that it's checking the actual function pointer, rather than comparing the vtable pointer. It can still use the inlined bar::inc_v() as long as the derived type didn't override that virtual function. Overriding other virtual functions wouldn't affect this one, but would require a different vtable.

Allowing extension without recompilation is handy for libraries, but also means looser coupling between parts of a big program (i.e. you don't have to include all the headers in every file).

But this imposes some efficiency costs for some uses: C++ virtual dispatch only works through pointers to objects, so you can't have a polymorphic array without hacks or expensive indirection through an array of pointers, which defeats a lot of hardware and software optimizations (see the Stack Overflow question "Fastest implementation of simple, virtual, observer-sort of, pattern in c++?").

If you want some kind of polymorphism / dispatch but only for a closed set of types (i.e. all known at compile time), you can do it manually with a union + enum + switch, or with std::variant<D1,D2> to make a union and std::visit to dispatch, or various other ways. See also the Stack Overflow questions "Contiguous storage of polymorphic types" and "Fastest implementation of simple, virtual, observer-sort of, pattern in c++?".
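A minimal sketch of the std::variant + std::visit approach (requires C++17). D1 and D2 are the placeholder names from the text; their members, the Step visitor, and sum_steps are hypothetical. The elements live contiguously inside the vector, with no per-object vtable pointer and no pointer chasing.

```cpp
#include <cassert>
#include <variant>
#include <vector>

struct D1 { int value; };
struct D2 { int value; };

// A closed set of alternatives stored inline, like a tagged union.
using AnyD = std::variant<D1, D2>;

// Visitor with one overload per alternative; std::visit picks the right one
// based on the tag stored in the variant.
struct Step {
    int operator()(const D1 &d) const { return d.value + 1; }
    int operator()(const D2 &d) const { return d.value * 2; }
};

int sum_steps(const std::vector<AnyD> &items) {
    int total = 0;
    for (const AnyD &item : items) {
        total += std::visit(Step{}, item);
    }
    return total;
}
```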

Using a struct doesn't force the compiler to actually put stuff in memory, any more than a small array or a pointer to a local variable does. For example, an inline function that returns a struct by value can still fully optimize.

The as-if rule applies: even if a struct logically has some memory storage, the compiler can make asm that keeps all the needed members in registers (and do transformations that mean that values in registers don't correspond to any value of a variable or temporary in the C++ abstract machine "running" the source code).

struct pair {
  int m_a;
  int m_b;
};

pair addsub(int a, int b) {
  return {a+b, a-b};
}

int foo(int a, int b) {
  pair ab = addsub(a,b);
  return ab.m_a * ab.m_b;
}

Notice how even returning a struct by value doesn't necessarily put it in memory. The x86-64 SysV ABI passes and returns small structs packed together into registers. Different ABIs make different choices for this.
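One way to see that the struct is a pure value rather than a required piece of memory is a constexpr variant of the same code (C++14 or later). Adding constexpr is an assumption of this sketch, not part of the original answer. With it, the compiler can evaluate the whole computation, including the intermediate pair, at compile time.

```cpp
#include <cassert>

struct pair {
    int m_a;
    int m_b;
};

// Same functions as in the answer, with constexpr added for illustration.
constexpr pair addsub(int a, int b) {
    return {a + b, a - b};
}

constexpr int foo(int a, int b) {
    pair ab = addsub(a, b);
    return ab.m_a * ab.m_b;
}

// (5 + 3) * (5 - 3) == 16, checked without any run-time object existing.
static_assert(foo(5, 3) == 16, "struct handled purely as a value");
```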
