How to optimize function return values in C and C++ on x86-64?


Question

The x86-64 ABI specifies two return registers: rax and rdx, both 64 bits (8 bytes) in size.

Assuming that x86-64 is the only targeted platform, which of these two functions:

uint64_t f(uint64_t * const secondReturnValue) {
    /* Calculate a and b. */
    *secondReturnValue = b;
    return a;
}

std::pair<uint64_t, uint64_t> g() {
    /* Calculate a and b, same as in f() above. */
    return { a, b };
}

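will yield better performance, considering the current state of C/C++ compilers targeting x86-64? Are there any performance pitfalls in using one version or the other? Are compilers (GCC, Clang) always able to optimize the std::pair to be returned in rax and rdx?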

UPDATE: Generally, returning a pair is faster if the compiler optimizes out the std::pair methods. Examples of binary output:

GCC 5.3.0: https://gcc.godbolt.org/#compilers:!((compiler:g530,options:'-std%3Dc%2B%2B11+-O2',source:'%23include+%3Ccstdint%3E%0A%23include+%3Cutility%3E%0A%0Aconstexpr+uint64_t+a+%3D+1u%3B%0Aconstexpr+uint64_t+b+%3D+2u%3B%0A%0Auint64_t+f(uint64_t+*+const+secondReturnValue)+%7B%0A++++*secondReturnValue+%3D+b%3B%0A++++return+a%3B%0A%7D%0A%0Astd::pair%3Cuint64_t,+uint64_t%3E+g()+%7B+return+%7Ba,+b%7D%3B+%7D')),filterAsm:(commentOnly:!t,directives:!t,labels:!t),version:3
Clang 3.8.0: https://gcc.godbolt.org/#compilers:!((compiler:clang380,options:'-std%3Dc%2B%2B11+-O2',source:'%23include+%3Ccstdint%3E%0A%23include+%3Cutility%3E%0A%0Aconstexpr+uint64_t+a+%3D+1u%3B%0Aconstexpr+uint64_t+b+%3D+2u%3B%0A%0Auint64_t+f(uint64_t+*+const+secondReturnValue)+%7B%0A++++*secondReturnValue+%3D+b%3B%0A++++return+a%3B%0A%7D%0A%0Astd::pair%3Cuint64_t,+uint64_t%3E+g()+%7B+return+%7Ba,+b%7D%3B+%7D')),filterAsm:(commentOnly:!t,directives:!t,labels:!t),version:3

If f() is not inlined, the compiler must generate code to write a value to memory, e.g.:

movq b, (%rdi)
movq a, %rax
retq

But in the case of g(), it suffices for the compiler to do:

movq a, %rax
movq b, %rdx
retq

Because instructions for writing values to memory are generally slower than instructions for writing values to registers, the second version should be faster.
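To check this on your own toolchain, here is a minimal compilable sketch of both variants that can be fed to something like g++ -O2 -S to inspect the generated assembly. The constants a and b are placeholder values standing in for the real calculation, variants_agree() is a hypothetical helper added only as a call site, and the noinline attribute is a GCC/Clang extension assumed here to keep the calls out of line.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Placeholder values standing in for "calculate a and b" (assumption).
constexpr std::uint64_t a = 1u;
constexpr std::uint64_t b = 2u;

// Variant 1: the second result is written through an out-parameter,
// so a non-inlined call has to store it to memory.
__attribute__((noinline))
std::uint64_t f(std::uint64_t* const secondReturnValue) {
    *secondReturnValue = b;
    return a;
}

// Variant 2: both results are returned by value as a pair.
__attribute__((noinline))
std::pair<std::uint64_t, std::uint64_t> g() {
    return {a, b};
}

// Hypothetical call site: both variants must produce the same two values.
bool variants_agree() {
    std::uint64_t second = 0;
    const std::uint64_t first = f(&second);
    const std::pair<std::uint64_t, std::uint64_t> p = g();
    return first == p.first && second == p.second;
}
```

With inlining allowed, the difference between the two variants may disappear entirely, so the attribute matters for what you see in the disassembly.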

Recommended answer

Since the ABI specifies that in some particular cases two registers have to be used for the 2-word result, any conforming compiler has to obey that rule.
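As a rough illustration of those "particular cases", the sketch below checks some properties that, to my understanding, a type needs in order to be returned in rax and rdx under the System V x86-64 ABI used on Linux and macOS: it fits in 16 bytes and has a trivial copy constructor and destructor. This is an assumption-laden check of necessary conditions, not a restatement of the ABI's classification algorithm, and the Microsoft x64 calling convention follows different rules. The names Pair and check_pair_layout are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

using Pair = std::pair<std::uint64_t, std::uint64_t>;

// The whole object must fit in two 8-byte slots.
static_assert(sizeof(Pair) == 16, "pair does not fit in two 8-byte slots");

// A non-trivial copy constructor or destructor would force the object
// to be returned through memory instead of registers.
static_assert(std::is_trivially_copy_constructible<Pair>::value,
              "pair copy constructor is not trivial");
static_assert(std::is_trivially_destructible<Pair>::value,
              "pair destructor is not trivial");

// Runtime check that each member occupies its own 8-byte slot.
void check_pair_layout() {
    Pair p(1, 2);
    const char* base = reinterpret_cast<const char*>(&p);
    const char* first = reinterpret_cast<const char*>(&p.first);
    const char* second = reinterpret_cast<const char*>(&p.second);
    assert(first - base == 0);
    assert(second - base == 8);
}
```

If any of the static assertions fails on a given standard library, the pair would likely be returned through a hidden pointer, which removes the advantage discussed here.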

However, for such tiny functions I guess that most of the performance will come from inlining.

You may want to compile and link with g++ -flto -O2 to use link-time optimization.

I guess that the second function (returning a pair through two registers) might be slightly faster, and that in some situations GCC could perhaps inline and optimize the first into the second.

But you really should benchmark if you care that much.
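A very rough benchmarking sketch along those lines might look like the following. It is an assumption-heavy illustration rather than a rigorous harness: f and g are given an input parameter (unlike the versions in the question) so the results depend on the loop counter, noinline is a GCC/Clang extension, and time_ns_per_call, call_f and call_g are hypothetical helper names. A dedicated benchmarking library would give more trustworthy numbers.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Out-parameter variant, kept out of line so the store to memory remains.
__attribute__((noinline))
std::uint64_t f(std::uint64_t x, std::uint64_t* const secondReturnValue) {
    *secondReturnValue = x + 1;
    return x;
}

// Pair-returning variant, kept out of line for a fair comparison.
__attribute__((noinline))
std::pair<std::uint64_t, std::uint64_t> g(std::uint64_t x) {
    return {x, x + 1};
}

// Wrappers that consume both results so neither call can be dropped.
std::uint64_t call_f(std::uint64_t i) {
    std::uint64_t second = 0;
    const std::uint64_t first = f(i, &second);
    return first + second;
}

std::uint64_t call_g(std::uint64_t i) {
    const std::pair<std::uint64_t, std::uint64_t> p = g(i);
    return p.first + p.second;
}

// Times `iterations` calls of fn and returns the average nanoseconds per call.
// The running checksum keeps the compiler from discarding the loop.
template <typename Fn>
double time_ns_per_call(Fn&& fn, std::uint64_t iterations,
                        std::uint64_t& checksum) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        checksum += fn(i);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Call time_ns_per_call(call_f, n, checksum) and time_ns_per_call(call_g, n, checksum) with a large n, repeat the runs several times, and compare the averages. Expect the difference to be small and sensitive to surrounding code.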
