Inlining of vararg functions

Question

While playing about with optimisation settings, I noticed an interesting phenomenon: functions taking a variable number of arguments (...) never seemed to get inlined. (Obviously this behavior is compiler-specific, but I've tested on a couple of different systems.)

For example, compiling the following small program:

#include <stdarg.h>
#include <stdio.h>

static inline void test(const char *format, ...)
{
  va_list ap;
  va_start(ap, format);
  vprintf(format, ap);
  va_end(ap);
}

int main()
{
  test("Hello %s\n", "world");
  return 0;
}

will seemingly always result in a (possibly mangled) test symbol appearing in the resulting executable (tested with Clang and GCC in both C and C++ modes on MacOS and Linux). If one modifies the signature of test() to take a plain string which is passed to printf(), the function is inlined from -O1 upwards by both compilers as you'd expect.

I suspect this is to do with the voodoo magic used to implement varargs, but how exactly this is usually done is a mystery to me. Can anybody enlighten me as to how compilers typically implement vararg functions, and why this seemingly prevents inlining?

Recommended answer

At least on x86-64, passing variable arguments is quite complex, because arguments are passed in registers. Other architectures may be less complex, but it is rarely trivial. In particular, the function may need a stack frame or frame pointer to refer to when fetching each argument. Rules of this sort may well stop the compiler from inlining the function.

On x86-64, the function prologue saves the unnamed integer argument registers and, when the caller signals through %al that vector registers were used, the eight SSE argument registers onto the stack.

This is the function from the original code compiled with Clang:

test:                                   # @test
    subq    $200, %rsp
    testb   %al, %al
    je  .LBB1_2
# BB#1:                                 # %entry
    movaps  %xmm0, 48(%rsp)
    movaps  %xmm1, 64(%rsp)
    movaps  %xmm2, 80(%rsp)
    movaps  %xmm3, 96(%rsp)
    movaps  %xmm4, 112(%rsp)
    movaps  %xmm5, 128(%rsp)
    movaps  %xmm6, 144(%rsp)
    movaps  %xmm7, 160(%rsp)
.LBB1_2:                                # %entry
    movq    %r9, 40(%rsp)
    movq    %r8, 32(%rsp)
    movq    %rcx, 24(%rsp)
    movq    %rdx, 16(%rsp)
    movq    %rsi, 8(%rsp)
    leaq    (%rsp), %rax
    movq    %rax, 192(%rsp)
    leaq    208(%rsp), %rax
    movq    %rax, 184(%rsp)
    movl    $48, 180(%rsp)
    movl    $8, 176(%rsp)
    movq    stdout(%rip), %rdi
    leaq    176(%rsp), %rdx
    movl    $.L.str, %esi
    callq   vfprintf
    addq    $200, %rsp
    retq

And from GCC:

test.constprop.0:
    .cfi_startproc
    subq    $216, %rsp
    .cfi_def_cfa_offset 224
    testb   %al, %al
    movq    %rsi, 40(%rsp)
    movq    %rdx, 48(%rsp)
    movq    %rcx, 56(%rsp)
    movq    %r8, 64(%rsp)
    movq    %r9, 72(%rsp)
    je  .L2
    movaps  %xmm0, 80(%rsp)
    movaps  %xmm1, 96(%rsp)
    movaps  %xmm2, 112(%rsp)
    movaps  %xmm3, 128(%rsp)
    movaps  %xmm4, 144(%rsp)
    movaps  %xmm5, 160(%rsp)
    movaps  %xmm6, 176(%rsp)
    movaps  %xmm7, 192(%rsp)
.L2:
    leaq    224(%rsp), %rax
    leaq    8(%rsp), %rdx
    movl    $.LC0, %esi
    movq    stdout(%rip), %rdi
    movq    %rax, 16(%rsp)
    leaq    32(%rsp), %rax
    movl    $8, 8(%rsp)
    movl    $48, 12(%rsp)
    movq    %rax, 24(%rsp)
    call    vfprintf
    addq    $216, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc

With Clang targeting 32-bit x86, it is much simpler:

test:                                   # @test
    subl    $28, %esp
    leal    36(%esp), %eax
    movl    %eax, 24(%esp)
    movl    stdout, %ecx
    movl    %eax, 8(%esp)
    movl    %ecx, (%esp)
    movl    $.L.str, 4(%esp)
    calll   vfprintf
    addl    $28, %esp
    retl

There's nothing really stopping any of the above code from being inlined as such, so it appears to be simply a policy decision by the compiler writers. Of course, for a call to something like printf, it's pretty pointless to optimise away a call/return pair at the cost of the code expansion. After all, printf is NOT a small, short function.

(A decent part of my work for most of the past year has been implementing printf in an OpenCL environment, so I know far more than most people will ever even look up about format specifiers and various other tricky parts of printf.)

Edit: The OpenCL compiler we use WILL inline calls to var_args functions, so it is possible to implement such a thing. It won't do it for calls to printf, because that bloats the code a great deal, but by default our compiler inlines EVERYTHING, all the time, no matter what it is. And it does work, but we found that having 2-3 copies of printf in the code makes it REALLY huge (with all sorts of other drawbacks, including final code generation taking a lot longer due to some bad choices of algorithms in the compiler backend), so we had to add code to STOP the compiler doing that.
