From the compiler's perspective, how is a reference to an array handled, and why is passing by value (without decay) not allowed?

Question

As we know, in C++ we can pass a reference to an array as an argument, as in f(int (&)[N]). Yes, this syntax is guaranteed by the ISO standard, but I am curious about how the compiler handles it. I found this thread, but unfortunately it doesn't answer my question: how is this syntax implemented by the compiler?

I then wrote a demo and hoped to see something from the assembly language:

void foo_p(int*arr) {}
void foo_r(int(&arr)[3]) {}
template<int length>
void foo_t(int(&arr)[length]) {}
int main(int argc, char** argv)
{
    int arr[] = {1, 2, 3};
    foo_p(arr);
    foo_r(arr);
    foo_t(arr);
    return 0;
}

Originally, I guessed that the array would still decay to a pointer, with the length passed implicitly via a register and the pointer turned back into an array in the function body. But the assembly code tells me this is not true:

void foo_t<3>(int (&) [3]):
  push rbp #4.31
  mov rbp, rsp #4.31
  sub rsp, 16 #4.31
  mov QWORD PTR [-16+rbp], rdi #4.31
  leave #4.32
  ret #4.32

foo_p(int*):
  push rbp #1.21
  mov rbp, rsp #1.21
  sub rsp, 16 #1.21
  mov QWORD PTR [-16+rbp], rdi #1.21
  leave #1.22
  ret #1.22

foo_r(int (&) [3]):
  push rbp #2.26
  mov rbp, rsp #2.26
  sub rsp, 16 #2.26
  mov QWORD PTR [-16+rbp], rdi #2.26
  leave #2.27
  ret #2.27

main:
  push rbp #6.1
  mov rbp, rsp #6.1
  sub rsp, 32 #6.1
  mov DWORD PTR [-16+rbp], edi #6.1
  mov QWORD PTR [-8+rbp], rsi #6.1
  lea rax, QWORD PTR [-32+rbp] #7.15
  mov DWORD PTR [rax], 1 #7.15
  lea rax, QWORD PTR [-32+rbp] #7.15
  add rax, 4 #7.15
  mov DWORD PTR [rax], 2 #7.15
  lea rax, QWORD PTR [-32+rbp] #7.15
  add rax, 8 #7.15
  mov DWORD PTR [rax], 3 #7.15
  lea rax, QWORD PTR [-32+rbp] #8.5
  mov rdi, rax #8.5
  call foo_p(int*) #8.5
  lea rax, QWORD PTR [-32+rbp] #9.5
  mov rdi, rax #9.5
  call foo_r(int (&) [3]) #9.5
  lea rax, QWORD PTR [-32+rbp] #10.5
  mov rdi, rax #10.5
  call void foo_t<3>(int (&) [3]) #10.5
  mov eax, 0 #11.11
  leave #11.11
  ret #11.11

> live demo

I admit that I am not familiar with assembly language, but clearly the assembly code for the three functions is the same! So something must happen before the assembly code is generated. Anyway, unlike an array, a pointer knows nothing about the length, right?

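  1. How does the compiler work here?
  2. Given that the standard allows passing an array by reference, does that mean it is trivial to implement? If so, why isn't passing by value allowed?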


For Q2, my guess is that the reason is the complexity of existing C and older C++ code. After all, int[] being equivalent to int* in function parameters is a long-standing tradition. Maybe it will be deprecated a hundred years from now?

Recommended answer

A C++ reference to an array is the same as a pointer to the first element, in assembly language.

Even C99 int foo(int arr[static 3]) is still just a pointer in asm. The static syntax guarantees to the compiler that it can safely read all 3 elements even if the C abstract machine doesn't access some elements, so for example it could use a branchless cmov for an if.

The caller doesn't pass a length in a register, because the length is a compile-time constant and so isn't needed at run time.
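To make the compile-time nature of the length concrete, here is a minimal sketch. The function names are hypothetical and not from the question; the point is only that the length is carried by the parameter's type, so nothing extra has to be passed at run time.

```cpp
#include <cassert>
#include <cstddef>

// Pointer parameter: the length is not part of the type.
std::size_t size_via_pointer(int* arr) {
    return sizeof(arr);              // size of the pointer itself
}

// Reference to an array of exactly 3 ints: the length lives in the type.
std::size_t size_via_reference(int (&arr)[3]) {
    return sizeof(arr);              // 3 * sizeof(int), known at compile time
}

// Template form: N is deduced from the argument's type at each call site.
template <std::size_t N>
constexpr std::size_t element_count(int (&)[N]) {
    return N;
}

// Hypothetical helper that exercises the three functions above.
void check_sizes() {
    int arr[] = {1, 2, 3};
    assert(size_via_pointer(arr) == sizeof(int*));
    assert(size_via_reference(arr) == 3 * sizeof(int));
    assert(element_count(arr) == 3);
}
```

All three functions receive the same single pointer-sized argument; only the type information available to the compiler differs.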

You can pass arrays by value, but only if they're inside a struct or union. In that case, different calling conventions have different rules. See the related question "What kind of C11 data type is an array according to the AMD64 ABI".
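As a sketch of the struct-wrapping approach (the type and function names are made up for illustration), both a hand-written wrapper and std::array, which is itself a struct around a built-in array, are copied when passed by value:

```cpp
#include <array>
#include <cassert>

// Hypothetical wrapper: the array is a member, so the struct copies it.
struct Wrapped {
    int data[3];
};

// Takes the whole struct (and the array inside it) by value.
int bump_first(Wrapped w) {
    w.data[0] += 100;                // modifies the callee's copy only
    return w.data[0];
}

// Same idea with the standard library's array wrapper.
int bump_first(std::array<int, 3> a) {
    a[0] += 100;                     // again, only the copy changes
    return a[0];
}

// Hypothetical helper showing that the caller's arrays are untouched.
void check_copies() {
    Wrapped w{{1, 2, 3}};
    assert(bump_first(w) == 101);
    assert(w.data[0] == 1);

    std::array<int, 3> a{{1, 2, 3}};
    assert(bump_first(a) == 101);
    assert(a[0] == 1);
}
```

How the copy is actually passed (in registers or in memory) depends on the calling convention and the size of the struct, as the answer notes.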

You'd almost never want to pass an array by value, so it makes sense that C doesn't have syntax for it, and that C++ never invented any either. Passing by constant reference (i.e. const int *arr) is far more efficient; just a single pointer arg.

I put your code on the Godbolt compiler explorer, compiled with gcc -O3 -fno-inline-functions -fno-inline-functions-called-once -fno-inline-small-functions to stop it from inlining the function calls. That gets rid of all the noise from -O0 debug-build and frame-pointer boilerplate. (I just searched the man page for inline and disabled inlining options until I got what I wanted.)

Instead of -fno-inline-small-functions and so on, you could use GNU C __attribute__((noinline)) on your function definitions to disable inlining for specific functions, even if they're static.
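A minimal sketch of the per-function attribute follows. The NOINLINE macro and the function names are assumptions for illustration; the attribute itself is a GNU extension (also accepted by clang), so the macro expands to nothing on other compilers.

```cpp
#include <cassert>

// Hypothetical convenience macro: only GNU-compatible compilers get the attribute.
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

// static, but still kept as a separate function by the noinline request.
NOINLINE static void store_first(int (&arr)[3], int value) {
    arr[0] = value;
}

int store_and_read() {
    int arr[] = {1, 2, 3};
    store_first(arr, 7);             // a real call should remain in the asm
    return arr[0];
}

// Hypothetical helper: the attribute changes code generation, not results.
void check_store() {
    assert(store_and_read() == 7);
}
```

The attribute only blocks inlining. As the answer goes on to show, other inter-procedural optimizations can still change or remove the call.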

I also added a call to a function without a definition, so the compiler needs to have arr[] with the right values in memory, and added a store to arr[4] in two of the functions. This lets us test whether the compiler warns about going outside the array bounds.

__attribute__((noinline, noclone)) 
void foo_p(int*arr) {(void)arr;}
void foo_r(int(&arr)[3]) {arr[4] = 41;}

template<int length>
void foo_t(int(&arr)[length]) {arr[4] = 42;}

void usearg(int*); // stop main from optimizing away arr[] if foo_... inline

int main()
{
    int arr[] = {1, 2, 3};
    foo_p(arr);
    foo_r(arr);
    foo_t(arr);
    usearg(arr);
    return 0;
}

gcc7.3 -O3 -Wall -Wextra without function inlining, on Godbolt: Since I silenced the unused-args warnings from your code, the only warning we get is from the template, not from foo_r:

<source>: In function 'int main()':
<source>:14:10: warning: array subscript is above array bounds [-Warray-bounds]
     foo_t(arr);
     ~~~~~^~~~~

The asm output is:

void foo_t<3>(int (&) [3]) [clone .isra.0]:
    mov     DWORD PTR [rdi], 42       # *ISRA.3_4(D),
    ret
foo_p(int*):
    rep ret
foo_r(int (&) [3]):
    mov     DWORD PTR [rdi+16], 41    # *arr_2(D),
    ret

main:
    sub     rsp, 24             # reserve space for the array and align the stack for calls
    movabs  rax, 8589934593     # this is 0x200000001: the first 2 elems
    lea     rdi, [rsp+4]
    mov     QWORD PTR [rsp+4], rax    # MEM[(int *)&arr],  first 2 elements
    mov     DWORD PTR [rsp+12], 3     # MEM[(int *)&arr + 8B],  3rd element as an imm32
    call    foo_r(int (&) [3])
    lea     rdi, [rsp+20]
    call    void foo_t<3>(int (&) [3]) [clone .isra.0]    #
    lea     rdi, [rsp+4]      # tmp97,
    call    usearg(int*)     #
    xor     eax, eax  #
    add     rsp, 24   #,
    ret

The call to foo_p() still got optimized away, probably because it doesn't do anything. (I didn't disable inter-procedural optimization, and even the noinline and noclone attributes didn't stop that.) Adding *arr=0; to the function body results in a call to it from main (passing a pointer in rdi just like the other 2).

Notice the clone .isra.0 annotation on the demangled function name: gcc made a definition of the function that takes a pointer to arr[4] rather than to the base element. That's why there's a lea rdi, [rsp+20] to set up the arg, and why the store uses [rdi] to deref the pointer with no displacement. __attribute__((noclone)) would stop that.

This inter-procedural optimization is pretty much trivial and saves 1 byte of code size in this case (just the disp8 in the addressing mode in the clone), but can be useful in other cases. The caller needs to know that it's calling a definition for a modified version of the function, like void foo_clone(int *p) { *p = 42; }, which is why gcc needs to encode that in the mangled symbol name.
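The transformation can be imitated by hand. The sketch below is not gcc's output: the function names are invented, and the array has 8 elements (an assumption) so that index 4 is in bounds and the behavior is well-defined. It only shows that moving the offset computation into the caller gives the same result.

```cpp
#include <cassert>

// As written in the source: takes the whole array by reference.
void store_as_written(int (&arr)[8]) {
    arr[4] = 42;                     // callee applies the +16 byte offset
}

// Hand-written stand-in for the kind of clone described above:
// it receives a pointer straight to the element being stored to.
void store_clone_sketch(int* p) {
    *p = 42;                         // no displacement needed in the callee
}

// Hypothetical helper comparing the two calling styles.
void check_clone_equivalence() {
    int a[8] = {};
    int b[8] = {};
    store_as_written(a);
    store_clone_sketch(&b[4]);       // caller computes the element address
    for (int i = 0; i < 8; ++i) {
        assert(a[i] == b[i]);
    }
    assert(a[4] == 42);
}
```

Because the clone has a different contract from the original function, every caller that uses it must be compiled with knowledge of that contract, which is why the clone gets its own symbol name.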

If you'd instantiated the template in one file and called it from another file that couldn't see the definition, then without link-time optimization gcc would have to just call the regular name and pass a pointer to the array like the function as written.

IDK why gcc does this for the template but not the reference. It might be related to the fact it warns about the template version, but not the reference version. Or maybe it's related to main deducing the template?

BTW, an IPO that would actually make it run slightly faster would be to let main use mov rdi, rsp instead of lea rdi, [rsp+4]. i.e. take &arr[-1] as the function arg, so the clone would use mov dword ptr [rdi+20], 42.

But that's only helpful for callers like main that have allocated an array 4 bytes above rsp, and I think gcc is only looking for IPOs that make the function itself more efficient, not the calling sequence in one specific caller.
