Structs and unions: which is better from a performance point of view? Passing the parameter by value or pointer?

Question

This is probably a silly question, but it nags at me every time I want to "optimize" the passing of heavy arguments (such as structs) to a function that only reads them. I hesitate between passing a pointer:

struct Foo
{
    int x;
    int y;
    int z;
} Foo;

int sum(struct Foo *foo_struct)
{
    return foo_struct->x + foo_struct->y + foo_struct->z;
}

or a constant:

struct Foo
{
    int x;
    int y;
    int z;
} Foo;

int sum(const struct Foo foo_struct)
{
    return foo_struct.x + foo_struct.y + foo_struct.z;
}

A pointer is meant to avoid copying the data: only the address is sent, which costs almost nothing.

For constants, the result probably varies between compilers and optimization levels. I don't know how passing a constant is optimized, but if it is, the compiler probably does a better job than I would.

From a performance point of view only (even if the difference is negligible in my examples), what is the preferred way of doing things?

Recommended answer

Structs, much like arrays, are containers of data. The data of a container is laid out in a contiguous block of memory. The container itself is identified by its starting address, and every time you operate on it, your program has to do low-level pointer arithmetic, through dedicated instructions, to apply an offset that leads from the starting address to the desired field (or element, in the case of arrays). The only things a compiler needs to know to work with a struct are (roughly):

  1. Its starting address in memory.
  2. The offset of each field.
  3. The size of each field.

A compiler can optimize code working on a struct in the same way whether the struct is passed as a pointer or not, and we'll see how in a moment. What differs, though, is how the struct is passed to each function.

First let me make one thing clear: the const qualifier does not help in understanding the difference between passing a structure as a pointer or by value. It merely tells the compiler that, inside the function, the value of the parameter itself will not be modified. The performance difference between passing by value and passing a pointer is, in general, not affected by const. The const keyword only becomes useful for other kinds of optimizations, not this one.

The main difference between these two signatures:

void first(const struct mystruct x);
void second(struct mystruct *x);

is that the first function expects the whole struct to be passed as a parameter, which, for a struct too large to be passed in registers, means copying the whole structure onto the stack right before calling the function. The second function, however, only needs a pointer to the structure, so the argument can be passed as a single value on the stack, or in a register as is usually done on x86-64.

Now, to better understand what happens, let's analyze the following program:

#include <stdio.h>

struct mystruct {
    unsigned a, b, c, d, e, f, g, h, i, j, k;
};

unsigned long __attribute__ ((noinline)) first(const struct mystruct x) {
    unsigned long total = x.a;
    total += x.b;
    total += x.c;
    total += x.d;
    total += x.e;
    total += x.f;
    total += x.g;
    total += x.h;
    total += x.i;
    total += x.j;
    total += x.k;

    return total;
}

unsigned long __attribute__ ((noinline)) second(struct mystruct *x) {
    unsigned long total = x->a;
    total += x->b;
    total += x->c;
    total += x->d;
    total += x->e;
    total += x->f;
    total += x->g;
    total += x->h;
    total += x->i;
    total += x->j;
    total += x->k;

    return total;
}

int main (void) {
    struct mystruct x = {0};
    scanf("%u", &x.a);

    unsigned long v = first(x);
    printf("%lu\n", v);

    v = second(&x);
    printf("%lu\n", v);

    return 0;
}

The __attribute__ ((noinline)) is only there to prevent automatic inlining of the functions, which are kept very simple for testing purposes and would therefore probably get inlined with -O3.

Let's now compile and disassemble the result with the help of objdump.

Let's first compile without optimizations and see what happens:

  1. This is how main() calls first():

  86a:   48 89 e0                mov    rax,rsp
  86d:   48 8b 55 c0             mov    rdx,QWORD PTR [rbp-0x40]
  871:   48 89 10                mov    QWORD PTR [rax],rdx
  874:   48 8b 55 c8             mov    rdx,QWORD PTR [rbp-0x38]
  878:   48 89 50 08             mov    QWORD PTR [rax+0x8],rdx
  87c:   48 8b 55 d0             mov    rdx,QWORD PTR [rbp-0x30]
  880:   48 89 50 10             mov    QWORD PTR [rax+0x10],rdx
  884:   48 8b 55 d8             mov    rdx,QWORD PTR [rbp-0x28]
  888:   48 89 50 18             mov    QWORD PTR [rax+0x18],rdx
  88c:   48 8b 55 e0             mov    rdx,QWORD PTR [rbp-0x20]
  890:   48 89 50 20             mov    QWORD PTR [rax+0x20],rdx
  894:   8b 55 e8                mov    edx,DWORD PTR [rbp-0x18]
  897:   89 50 28                mov    DWORD PTR [rax+0x28],edx
  89a:   e8 81 fe ff ff          call   720 <first>

This is the function itself:

 0000000000000720 <first>:
  720:   55                      push   rbp
  721:   48 89 e5                mov    rbp,rsp
  724:   8b 45 10                mov    eax,DWORD PTR [rbp+0x10]
  727:   89 c0                   mov    eax,eax
  729:   48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
  72d:   8b 45 14                mov    eax,DWORD PTR [rbp+0x14]
  730:   89 c0                   mov    eax,eax
  732:   48 01 45 f8             add    QWORD PTR [rbp-0x8],rax
  736:   8b 45 18                mov    eax,DWORD PTR [rbp+0x18]
  739:   89 c0                   mov    eax,eax
  ... same stuff happening over and over ...
  783:   48 01 45 f8             add    QWORD PTR [rbp-0x8],rax
  787:   48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
  78b:   5d                      pop    rbp
  78c:   c3                      ret

It's quite clear that the whole struct is copied onto the stack before the function is called.

The function then reads each field of the struct directly from the copy on the stack (DWORD PTR [rbp + offset]).

  2. This is how main() calls second():

  8bf:   48 8d 45 c0             lea    rax,[rbp-0x40]
  8c3:   48 89 c7                mov    rdi,rax
  8c6:   e8 c2 fe ff ff          call   78d <second>

This is the function itself:

 000000000000078d <second>:
  78d:   55                      push   rbp
  78e:   48 89 e5                mov    rbp,rsp
  791:   48 89 7d e8             mov    QWORD PTR [rbp-0x18],rdi
  795:   48 8b 45 e8             mov    rax,QWORD PTR [rbp-0x18]
  799:   8b 00                   mov    eax,DWORD PTR [rax]
  79b:   89 c0                   mov    eax,eax
  79d:   48 89 45 f8             mov    QWORD PTR [rbp-0x8],rax
  7a1:   48 8b 45 e8             mov    rax,QWORD PTR [rbp-0x18]
  7a5:   8b 40 04                mov    eax,DWORD PTR [rax+0x4]
  7a8:   89 c0                   mov    eax,eax
  ... same stuff happening over and over ...
  81f:   48 01 45 f8             add    QWORD PTR [rbp-0x8],rax
  823:   48 8b 45 f8             mov    rax,QWORD PTR [rbp-0x8]
  827:   5d                      pop    rbp
  828:   c3                      ret

You can see that the argument is passed as a pointer instead of being copied onto the stack, which takes only two very simple instructions (lea + mov). However, since the function now has to work through a pointer using the -> operator, every time a value in the struct is accessed, memory has to be dereferenced twice instead of once (first to load the pointer to the structure from the stack, then to load the value at the specified offset in the struct).

It may seem that there is no real difference between the two approaches, since the linear number of instructions (linear in the number of struct members) that was needed to copy the struct onto the stack in the first case is needed in the second case to dereference the pointer one more time.

We are talking about optimization though, and it makes no sense not to optimize the code. Let's see what happens if we do.

In reality, when working with a struct, we don't really care where it is in memory (stack, heap, data segment, whatever). As long as we know where it starts, it all boils down to applying the same simple pointer arithmetic to access the fields. This always needs to be done, regardless of where the structure resides or whether it was dynamically allocated.

If we optimize the code above with -O3, we now see the following:

  1. This is how main() calls first():

  61a:   48 83 ec 30             sub    rsp,0x30
  61e:   48 8b 44 24 30          mov    rax,QWORD PTR [rsp+0x30]
  623:   48 89 04 24             mov    QWORD PTR [rsp],rax
  627:   48 8b 44 24 38          mov    rax,QWORD PTR [rsp+0x38]
  62c:   48 89 44 24 08          mov    QWORD PTR [rsp+0x8],rax
  631:   48 8b 44 24 40          mov    rax,QWORD PTR [rsp+0x40]
  636:   48 89 44 24 10          mov    QWORD PTR [rsp+0x10],rax
  63b:   48 8b 44 24 48          mov    rax,QWORD PTR [rsp+0x48]
  640:   48 89 44 24 18          mov    QWORD PTR [rsp+0x18],rax
  645:   48 8b 44 24 50          mov    rax,QWORD PTR [rsp+0x50]
  64a:   48 89 44 24 20          mov    QWORD PTR [rsp+0x20],rax
  64f:   8b 44 24 58             mov    eax,DWORD PTR [rsp+0x58]
  653:   89 44 24 28             mov    DWORD PTR [rsp+0x28],eax
  657:   e8 74 01 00 00          call   7d0 <first>

This is the function itself:

 00000000000007d0 <first>:
  7d0:   8b 44 24 0c             mov    eax,DWORD PTR [rsp+0xc]
  7d4:   8b 54 24 08             mov    edx,DWORD PTR [rsp+0x8]
  7d8:   48 01 c2                add    rdx,rax
  7db:   8b 44 24 10             mov    eax,DWORD PTR [rsp+0x10]
  7df:   48 01 d0                add    rax,rdx
  7e2:   8b 54 24 14             mov    edx,DWORD PTR [rsp+0x14]
  7e6:   48 01 d0                add    rax,rdx
  7e9:   8b 54 24 18             mov    edx,DWORD PTR [rsp+0x18]
  7ed:   48 01 c2                add    rdx,rax
  7f0:   8b 44 24 1c             mov    eax,DWORD PTR [rsp+0x1c]
  7f4:   48 01 c2                add    rdx,rax
  7f7:   8b 44 24 20             mov    eax,DWORD PTR [rsp+0x20]
  7fb:   48 01 d0                add    rax,rdx
  7fe:   8b 54 24 24             mov    edx,DWORD PTR [rsp+0x24]
  802:   48 01 d0                add    rax,rdx
  805:   8b 54 24 28             mov    edx,DWORD PTR [rsp+0x28]
  809:   48 01 c2                add    rdx,rax
  80c:   8b 44 24 2c             mov    eax,DWORD PTR [rsp+0x2c]
  810:   48 01 c2                add    rdx,rax
  813:   8b 44 24 30             mov    eax,DWORD PTR [rsp+0x30]
  817:   48 01 d0                add    rax,rdx
  81a:   c3                      ret

  2. This is how main() calls second():

  671:   48 89 df                mov    rdi,rbx
  674:   e8 a7 01 00 00          call   820 <second>

This is the function itself:

 0000000000000820 <second>:
  820:   8b 47 04                mov    eax,DWORD PTR [rdi+0x4]
  823:   8b 17                   mov    edx,DWORD PTR [rdi]
  825:   48 01 c2                add    rdx,rax
  828:   8b 47 08                mov    eax,DWORD PTR [rdi+0x8]
  82b:   48 01 d0                add    rax,rdx
  82e:   8b 57 0c                mov    edx,DWORD PTR [rdi+0xc]
  831:   48 01 d0                add    rax,rdx
  834:   8b 57 10                mov    edx,DWORD PTR [rdi+0x10]
  837:   48 01 c2                add    rdx,rax
  83a:   8b 47 14                mov    eax,DWORD PTR [rdi+0x14]
  83d:   48 01 c2                add    rdx,rax
  840:   8b 47 18                mov    eax,DWORD PTR [rdi+0x18]
  843:   48 01 d0                add    rax,rdx
  846:   8b 57 1c                mov    edx,DWORD PTR [rdi+0x1c]
  849:   48 01 d0                add    rax,rdx
  84c:   8b 57 20                mov    edx,DWORD PTR [rdi+0x20]
  84f:   48 01 c2                add    rdx,rax
  852:   8b 47 24                mov    eax,DWORD PTR [rdi+0x24]
  855:   48 01 c2                add    rdx,rax
  858:   8b 47 28                mov    eax,DWORD PTR [rdi+0x28]
  85b:   48 01 d0                add    rax,rdx
  85e:   c3                      ret

It should now be clear which code is better. The compiler successfully identified that, in both cases, all it needs to know is where the structure begins; it can then apply the same simple math to determine the position of each field. Whether that address is on the stack or somewhere else does not really matter.

In fact, in the first() case we see all fields being accessed through [rsp + offset], meaning that an address on the stack itself (rsp) is used to calculate the position of the fields, while in the second() case we see [rdi + offset], meaning that the address passed as a parameter (in rdi) is used instead. The spacing between fields is the same in both cases; only the base address differs.

So what is the difference between the two functions now? In terms of the function code itself, basically none. In terms of parameter passing, first() still needs the struct passed by value, so even with optimizations enabled the whole structure still has to be copied onto the stack. Calling first() is therefore far heavier and adds a lot of code in the caller.

As I said before, a compiler can optimize code working on a struct in the same way whether the struct is passed as a pointer or not. However, as we just saw, the way the structure is passed makes a big difference in the caller.

One could argue that the const qualifier on first() could ring a bell for the compiler and make it understand that there is really no need to copy the data onto the stack, so the caller could just pass a pointer. However, the compiler should strictly adhere to the calling convention dictated by the ABI for a given signature, instead of going out of its way to optimize the code. After all, this is not really the compiler's fault in this case, but the programmer's.

So, to answer your question:

"From a performance point of view only (even if it is negligible in my examples), what is the preferred way of doing things?"

The preferred way is definitely to pass a pointer, and not the struct itself.
