返回一个2元组的效率比std :: pair低吗？ [英] Is returning a 2-tuple less efficient than std::pair?

查看：231 发布时间：2018/4/20 16:37:35 c++ gcc clang calling-convention stdtuple

本文介绍了返回一个2元组的效率比std :: pair低吗？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

请考虑以下代码：

  #include< utility> 
 #include< tuple> 
 
 std :: pair< int，int> f1（）
 {
 return std :: make_pair（0x111，0x222）; 
} 
 
 std :: tuple< int，int> f2（）
 {
 return std :: make_tuple（0x111，0x222）; 
}

Clang 3和4在x86-64上生成类似的代码： p>

  f1（）：
 movabs rax，0x22200000111 
 ret 
 f2（）：
 movabs rax，0x11100000222;相反的包装顺序，不重要
 ret

但是Clang 5为<$ c生成不同的代码
$ b

  f2（）：
 movabs rax，0x11100000222 
 mov QWORD PTR [rdi]，rax 
 mov rax，rdi 
 ret

GCC 4至GCC 7也一样：

  f2（）：
 movabs rdx，0x11100000222 
 mov rax，rdi 
 mov QWORD PTR [rdi]，rdx; GCC 4-6使用2个DWORD存储
 ret

为什么生成的代码在返回时变得更糟一个 std :: tuple ，它适合单个寄存器，vs std :: pair ？它看起来特别奇怪，因为叮铛3和4看起来是最优的，但5不是。

在这里试试： https://godbolt.org/g/T2Yqrj
解决方案
简短的回答是因为 gcc 和 clang 使用的 libstc ++ code>在Linux上使用非平凡的移动构造函数实现 std :: tuple （特别是 _Tuple_impl 基类有一个非平凡的移动构造函数）。另一方面， std :: pair 的复制和移动构造函数都是默认的。
Gory细节您在Linux上运行测试，该测试遵守SysV x86-64 ABI。该ABI具有将函数传递或返回到类或结构的特定规则，您可以阅读有关此处的更多信息。我们感兴趣的具体情况是这些结构中的两个 int 字段是否会得到 INTEGER 类或者 MEMORY 类。 A 最近的版本的ABI规范有这样的说法： $ b
聚合的分类（结构和数组）和联合类型的工作方式如下：
如果一个对象的大小大于八个八字节，或者它包含如果一个C ++对象有一个非平凡的拷贝构造函数或者一个非平凡的析构函数13，它就会被不可见的引用（对象在参数列表中由具有类 INTEGER）14的指针替换。如果聚集的大小超过单个八字节，则每个对象都被分类分别。每个八字节被初始化为class NO_CLASS。
递归地分类对象的每个字段，以便始终考虑两个字段。 $ b 这里适用的条件（2）。请注意，它只提到拷贝构造函数，而不是移动构造函数 - 但显然它只是可能只是规范中的一个缺陷，因为引入了一般需要包含在任何分类中的移动构造函数算法，其中包含复制构造函数。特别是， gcc 的IA-64 cxx-abi记录在如果参数类型对于调用而言并非微不足道，调用者必须为临时分配空间并通过引用传递临时值。特别是：空间由调用者以通常的方式分配给临时的，通常在堆栈上。，然后定义非平凡：如果出现以下情况，则为调用而考虑非平凡：
它具有非平凡的拷贝构造函数，移动构造函数或析构函数， / li> 因此，从ABI角度来看，元组不被认为是可以复制的，它会得到 MEMORY 处理，这意味着你的函数必须填充 rdi 中被调用传入的堆栈分配对象。 std :: par 函数可以返回 rax 中的整个结构，因为它适合一个 EIGHTBYTE 并且拥有类 INTEGER 。这有什么关系吗？是的，严格来说，像你编译的那样的独立函数对于元组来说效率会更低，因为这个ABI不同点是烘焙进去的。通常情况下，编译器将能够看到函数的主体并将其内联或执行跨程序分析，即使没有内联。在这两种情况下，ABI不再重要，至少在体面优化器的情况下，两种方法都可能同样有效。例如让我们调用您的 f1（）和<$ c $ int add_pair（）{ auto p = f1（）; return p.first + p.second; } int add_tuple（）{ auto t = f2（）; return std :: get< 0>（t）+ std :: get< 1>（t）;原则上， add_tuple 方法从一个缺点开始，因为它必须调用效率较低的 f2（），它还必须在栈上创建一个临时元组对象，以便它可以传递它至 f2 作为隐藏参数。无论如何，这两个函数都被完全优化，直接返回正确的值： pre $ add_pair（）： mov eax，819 ret add_tuple（）： mov eax，819 ret
总的来说，你可以说这个ABI问题对元组的影响会相对较弱：它为函数增加了一个小的固定开销，遵守ABI，但这对于非常小的函数来说只会在相对意义上很重要 - 但是这些函数可能会在可以内联的地方声明（或者如果不是，您将离开性能表）。
libcstc ++ vs libc +++
如上所述，这是一个ABI问题，而不是优化问题，本身。 clang和gcc都已经在ABI的约束下尽可能地优化库代码 - 如果它们为生成代码如 f1（） std :: tuple 它们会破坏ABI兼容的调用者。如果切换到使用 libc ++ 而不是Linux默认的 libstdc ++ - 这个实现没有显式的移动构造函数（正如Marc Glisse在注释中提到的，它们是为了向后兼容而坚持使用该实现）。现在 clang （并且可能gcc，尽管我没有尝试过），生成相同的最优代码：f1（）：＃@ f1（） movabs rax，2345052143889 ret f2（）：＃@ f2（） movabs rax，2345052143889 ret pre> 早期版本的Clang 为什么版本 clang 以不同的方式编译它？这只是 clang中的错误或规范中的错误，具体取决于如何你看看它。该规范没有明确包含在需要传递临时指针的情况下移动构建。不符合IA-64 C ++ ABI。例如，编译clang用于执行的方式与 gcc 或较新版本的 clang 不兼容。规范是最终更新和clang 在版本5.0中改变了行为。更新： Marc Glisse ，最初有关非重要移动构造函数与C ++ ABI的交互以及 clang 中的混淆 / code>在某些时候改变了他们的行为，这可能解释了开关：移动的构造函数不清楚，并且在澄清时，clang将更改为遵循ABI。这可能是其中的一种情况。 Consider this code: #include <utility> #include <tuple> std::pair<int, int> f1() { return std::make_pair(0x111, 0x222); } std::tuple<int, int> f2() { return std::make_tuple(0x111, 0x222); } Clang 3 and 4 generate similar code for both on x86-64: f1(): movabs rax,0x22200000111 ret f2(): movabs rax,0x11100000222 ; opposite packing order, not important ret But Clang 5 generates different code for f2(): f2(): movabs rax,0x11100000222 mov QWORD PTR [rdi],rax mov rax,rdi ret As do GCC 4 through GCC 7: f2(): movabs rdx,0x11100000222 mov rax,rdi mov QWORD PTR [rdi],rdx ; GCC 4-6 use 2 DWORD stores ret Why is the generated code worse when returning a std::tuple that fits in a single register, vs std::pair? It seems especially strange since Clang 3 and 4 seemed to be optimal, yet 5 is not. Try it here: https://godbolt.org/g/T2Yqrj 解决方案 The short answer is because the libstc++ standard library implementation used by gcc and clang on Linux implements std::tuple with a non-trivial move constructor (in particular, the _Tuple_impl base class has a non-trivial move constructor). On the other hand, the copy and move constructors for std::pair are all defaulted. The Gory Details You ran your tests on Linux, which adheres to the SysV x86-64 ABI. This ABI has specific rules for passing or returning classes or structures to functions, which you can read more about here. The specific case we are interested in with whether the two int fields in these structures will get the INTEGER class or the MEMORY class. A recent version of the ABI specification has this to say: The classification of aggregate (structures and arrays) and union types works as follows: If the size of an object is larger than eight eightbytes, or it contains un- aligned fields, it has class MEMORY 12 . If a C++ object has either a non-trivial copy constructor or a non-trivial destructor 13 , it is passed by invisible reference (the object is replaced in the parameter list by a pointer that has class INTEGER) 14 . If the size of the aggregate exceeds a single eightbyte, each is classified separately. Each eightbyte gets initialized to class NO_CLASS. Each field of an object is classified recursively so that always two fields are considered. The resulting class is calculated according to the classes of the fields in the eightbyte It is condition (2) that applies here. Note that it mentions only copy constructors, and not move constructors - but it is fairly apparently that just is probably just a defect in the specification given the introduction of move constructors which generally need to be included in any classification algorithm where copy constructors were included before. In particular, IA-64 cxx-abi, which gcc is documented to follow does include move constructors: If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference. Specifically: Space is allocated by the caller in the usual manner for a temporary, typically on the stack. and then the definition of non-trivial: A type is considered non-trivial for the purposes of calls if: it has a non-trivial copy constructor, move constructor, or destructor, or all of its copy and move constructors are deleted. So because tuple is not considered to be trivially copyable from an ABI perspective, it gets MEMORY treatment, which means that your function must populate the stack allocated object passed in by the called in rdi. The std::par function can just pass back the entire structure in rax since it fits in one EIGHTBYTE and has class INTEGER. Does it matter? Yeah, strictly speaking, a standalone function like the one you have compiled will be less efficient for tuple since this ABI different is "baked in". Often however, the compiler will be able to see the body of the function and inline it or perform inter-procedural analysis even if not inlined. In both cases, the ABI is no longer important and it is likely both approaches would be equally efficient, at least with a decent optimizer. For example let's call your f1() and f2() functions and do some math on the result: int add_pair() { auto p = f1(); return p.first + p.second; } int add_tuple() { auto t = f2(); return std::get<0>(t) + std::get<1>(t); } In principle the add_tuple method starts from a disadvantage, since it has to call f2() which is less efficient and it also has to create a temporary tuple object on the stack so it can pass it to f2 as the hidden parameter. Well no matter, both functions are fully optimized to just return the right value directly: add_pair(): mov eax, 819 ret add_tuple(): mov eax, 819 ret So overall you can say that the effect of this ABI issue with tuple will be relatively muted: it adds a small fixed overhead to functions that must comply with the ABI, but this will only really matter in a relative sense for very small functions - but such functions are likely to be declared in a place where they can be inlined (or if not, you are leaving performance on the table). libcstc++ vs libc+++ As explained above, this is an ABI issue, not an optimization issue, per se. Both clang and gcc are already optimizing the library code to maximum extent possible under the constraints of the ABI - if they generated code like f1() for the std::tuple case they would break ABI compliant callers. You can see this clearly if you switch to using libc++ rather than the Linux default of libstdc++ - this implementation doesn't have the explicit move constructor (as Marc Glisse mentions in the comments, they are stuck with this implementation for backwards compatibility). Now clang (and presumably gcc although I didn't try it), generates the same optimal code in both cases: f1(): # @f1() movabs rax, 2345052143889 ret f2(): # @f2() movabs rax, 2345052143889 ret Earlier Versions of Clang Why do versions of clang compile it differently? It was simply a bug in clang or a bug in the spec depending on how you look at it. The spec didn't explicitly include move construction in the cases where a hidden pointer to a temporary needed to be passed. wasn't conforming to the IA-64 C++ ABI. For example compiled the way clang used to do it was not compatible with gcc or newer versions of clang. The spec was eventually updated and the clang behavior changed in version 5.0. Update: Marc Glisse mentions in the comments that there was initially confusion about the interaction of non-trivial move constructors and the C++ ABI, and clang changed their behavior at some point, which probably explains the switch: The ABI specification for some argument passing cases involving move constructors were unclear, and when they were clarified, clang changed to follow the ABI. This is probably one of those cases. 这篇关于返回一个2元组的效率比std :: pair低吗？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

`libcstc ++ vs libc +++`

早期版本的Clang

The Gory Details

libcstc++ vs libc+++

Earlier Versions of Clang

返回一个2元组的效率比std :: pair低吗？ [英] Is returning a 2-tuple less efficient than std::pair?

问题描述

Gory细节

相关文章

C/C++开发最新文章

热门教程

热门工具

登录关闭

返回一个2元组的效率比std :: pair低吗？ [英] Is returning a 2-tuple less efficient than std::pair?

问题描述

Gory细节

libcs​​tc ++ vs libc +++

早期版本的Clang

The Gory Details

libcstc++ vs libc+++

Earlier Versions of Clang

相关文章

C/C++开发最新文章

热门教程

热门工具

登录 关闭

`libcstc ++ vs libc +++`

登录关闭