Why is this C++ wrapper class not being inlined away?

Question

EDIT - Something is wrong with my build system. I'm still figuring out exactly what, but gcc was producing weird results (even though the source is a .cpp file); once I switched to g++, it worked as expected.

This is a heavily reduced test case for a problem I've been having, where using a numerical wrapper class (which I thought would be inlined away) made my program 10x slower.

This is independent of optimisation level (tried with -O0 and -O3).

Am I missing some detail in my wrapper class?

I have the following program, in which I define a class that wraps a double and provides the + operator:

#include <cstdio>
#include <cstdlib>

#define INLINE __attribute__((always_inline)) inline

struct alignas(8) WrappedDouble {
    double value;

    INLINE friend const WrappedDouble operator+(const WrappedDouble& left, const WrappedDouble& right) {
        return {left.value + right.value};
    };
};

#define doubleType WrappedDouble // either "double" or "WrappedDouble"

int main() {
    int N = 100000000;
    doubleType* arr = (doubleType*)malloc(sizeof(doubleType)*N);
    for (int i = 1; i < N; i++) {
        arr[i] = arr[i - 1] + arr[i];
    }

    free(arr);
    printf("done\n");

    return 0;
}

I thought that this would compile to the same thing: it's doing the same calculations, and everything is inlined.

However, it doesn't. The WrappedDouble build is larger and slower, regardless of optimisation level.

(This particular result is not significantly slower, but my actual use case includes more arithmetic.)
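
As a way of checking the "same calculations" claim independently of the generated assembly, here is a minimal sketch that runs the same loop over both element types and compares the results. It is not from the original post: `runningSum` and `sameResults` are hypothetical helper names, and `std::vector` is used instead of `malloc` so that every element is initialised before it is read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same wrapper as in the question, without the forced-inline macro.
struct WrappedDouble {
    double value;

    friend WrappedDouble operator+(const WrappedDouble& left, const WrappedDouble& right) {
        return {left.value + right.value};
    }
};

// Hypothetical helper: in-place running sum with the same loop shape
// as the question's main().
template <typename T>
void runningSum(std::vector<T>& arr) {
    for (std::size_t i = 1; i < arr.size(); i++) {
        arr[i] = arr[i - 1] + arr[i];
    }
}

// Hypothetical helper: fills both arrays with the same values, runs the
// loop on each, and reports whether every element matches exactly.
bool sameResults(std::size_t n) {
    std::vector<double> plain(n);
    std::vector<WrappedDouble> wrapped(n);
    assert(plain.size() == wrapped.size());

    for (std::size_t i = 0; i < n; i++) {
        plain[i] = 0.5 * static_cast<double>(i);
        wrapped[i] = WrappedDouble{plain[i]};
    }

    runningSum(plain);
    runningSum(wrapped);

    for (std::size_t i = 0; i < n; i++) {
        if (plain[i] != wrapped[i].value) {
            return false;
        }
    }
    return true;
}
```

Calling `sameResults(1000)` from a test or from `main` only confirms that the arithmetic agrees. It says nothing about speed, which still has to be judged from an optimised build.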

EDIT - I am aware that this isn't constructing my array elements. I thought this might produce less ASM so I could understand it better, but I can change it if it's a problem.

EDIT - I am also aware that I should be using new[]/delete[]. Unfortunately gcc refused to compile that, even though it was in a .cpp file. This was a symptom of my build system being screwed up, which is probably my actual problem.

EDIT - If I use g++ instead of gcc, it produces identical output.

EDIT - I posted the wrong version of the ASM (-O0 instead of -O3), so this section isn't helpful.

I'm using Xcode's gcc on my Mac, on a 64-bit system. The two builds produce the same output apart from the body of the for loop.

Here's what it produces for the body of the loop if doubleType is double:

movq    -16(%rbp), %rax
movl    -20(%rbp), %ecx
subl    $1, %ecx
movslq  %ecx, %rdx
movsd   (%rax,%rdx,8), %xmm0    ## xmm0 = mem[0],zero
movq    -16(%rbp), %rax
movslq  -20(%rbp), %rdx
addsd   (%rax,%rdx,8), %xmm0
movq    -16(%rbp), %rax
movslq  -20(%rbp), %rdx
movsd   %xmm0, (%rax,%rdx,8)

The WrappedDouble version is longer:

movq    -40(%rbp), %rax
movl    -44(%rbp), %ecx
subl    $1, %ecx
movslq  %ecx, %rdx
shlq    $3, %rdx
addq    %rdx, %rax
movq    -40(%rbp), %rdx
movslq  -44(%rbp), %rsi
shlq    $3, %rsi
addq    %rsi, %rdx
movq    %rax, -16(%rbp)
movq    %rdx, -24(%rbp)
movq    -16(%rbp), %rax
movsd   (%rax), %xmm0           ## xmm0 = mem[0],zero
movq    -24(%rbp), %rax
addsd   (%rax), %xmm0
movsd   %xmm0, -8(%rbp)
movsd   -8(%rbp), %xmm0         ## xmm0 = mem[0],zero
movsd   %xmm0, -56(%rbp)
movq    -40(%rbp), %rax
movslq  -44(%rbp), %rdx
movq    -56(%rbp), %rsi
movq    %rsi, (%rax,%rdx,8)

Recommended answer

Both versions result in identical assembly code with g++ and clang++ when you turn on optimizations with -O3.
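
The properties that let an optimiser treat the wrapper like a bare double can be checked at compile time. The sketch below is an illustration, not part of the original answer. It marks the operator `constexpr` (an assumption added here so the arithmetic can be evaluated in a `static_assert`), and `addViaWrapper` is a hypothetical helper name. Passing these checks does not prove that the assembly is identical. It only shows that the wrapper adds no storage and no non-trivial copy semantics.

```cpp
#include <cassert>
#include <type_traits>

// Same layout as the wrapper in the question. constexpr is an addition
// made for this sketch so the operator can run at compile time.
struct alignas(8) WrappedDouble {
    double value;

    friend constexpr WrappedDouble operator+(const WrappedDouble& left,
                                             const WrappedDouble& right) {
        return {left.value + right.value};
    }
};

// The wrapper should occupy exactly the storage of one double.
static_assert(sizeof(WrappedDouble) == sizeof(double),
              "wrapper adds no storage");

// Trivially copyable and standard-layout: copies are plain byte copies.
static_assert(std::is_trivially_copyable<WrappedDouble>::value,
              "wrapper is trivially copyable");
static_assert(std::is_standard_layout<WrappedDouble>::value,
              "wrapper is standard-layout");

// The operator itself reduces to a single double addition.
static_assert((WrappedDouble{1.5} + WrappedDouble{2.25}).value == 3.75,
              "operator+ adds the wrapped values");

// Hypothetical helper: routes a plain double addition through the wrapper.
inline double addViaWrapper(double a, double b) {
    return (WrappedDouble{a} + WrappedDouble{b}).value;
}
```

If the file compiles, all four checks hold. Comparing the `-O3` assembly for `addViaWrapper` against a plain `a + b` is then a small way to reproduce the answer's observation.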
