Does the C++ standard force capture-by-reference of local variables to be inefficient?

Question

I recently needed a lambda that captured multiple local variables by reference, so I made a test snippet to investigate its efficiency, and compiled it with -O3 using clang 3.6:

void do_something_with(void*);

void test()
{
    int a = 0, b = 0, c = 0;

    auto func = [&] () {
        a++;
        b++;
        c++;
    };

    do_something_with((void*)&func);
}


movl   $0x0,0x24(%rsp)
movl   $0x0,0x20(%rsp)
movl   $0x0,0x1c(%rsp)

lea    0x24(%rsp),%rax
mov    %rax,(%rsp)
lea    0x20(%rsp),%rax
mov    %rax,0x8(%rsp)
lea    0x1c(%rsp),%rax
mov    %rax,0x10(%rsp)

lea    (%rsp),%rdi
callq  ...

Clearly the lambda only needs the address of one of the variables, from which all the others could be obtained by relative addressing.

Instead, the compiler created a struct on the stack containing a pointer to each local variable, and then passed the address of the struct to the lambda. It is much the same as if I had written:

int a = 0, b = 0, c = 0;

struct X
{
    int *pa, *pb, *pc;
};

X x = {&a, &b, &c};

auto func = [p = &x] () {
    (*p->pa)++;
    (*p->pb)++;
    (*p->pc)++;
};

This is inefficient for various reasons, but most worryingly because it could lead to heap-allocation if too many variables are captured.
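To make the growth concrete, here is a minimal sketch that reports the closure size for one, two and three by-reference captures. The names `ClosureSizes` and `measure_closure_sizes` are hypothetical, introduced only for this illustration, and the sizes it reports are implementation-specific rather than anything the language guarantees.

```cpp
#include <cstddef>

// Hypothetical helper type: closure sizes for 1, 2 and 3 by-reference
// captures, plus the sum of the locals after the closures have run.
struct ClosureSizes {
    std::size_t one, two, three;
    int sum;
};

ClosureSizes measure_closure_sizes()
{
    int a = 0, b = 0, c = 0;

    auto f1 = [&] { a++; };
    auto f2 = [&] { a++; b++; };
    auto f3 = [&] { a++; b++; c++; };

    // Invoke each closure once so every capture is genuinely used.
    f1();
    f2();
    f3();

    // a was incremented three times, b twice, c once.
    return { sizeof f1, sizeof f2, sizeof f3, a + b + c };
}
```

With the one-pointer-per-variable layout shown in the assembly above, I would expect the three sizes to grow by one pointer per captured variable on a 64-bit target, but that is an observation about particular compilers, not a guarantee.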

My questions:

  1. The fact that both clang and gcc do this at -O3 makes me suspect that something in the standard actually forces closures to be implemented inefficiently. Is this the case?

If so, what is the reasoning? It cannot be for binary compatibility of lambdas between compilers, because any code that knows about the type of the lambda is guaranteed to lie in the same translation unit.

If not, then why is this optimisation missing from two major compilers?



Here is an example of the more efficient code that I would like to have seen from the compiler. This code uses less stack space, the lambda now performs only one pointer indirection instead of two, and the lambda's size does not grow with the number of captured variables:

struct X
{
    int a = 0, b = 0, c = 0;
} x;

auto func = [&x] () {
    x.a++;
    x.b++;
    x.c++;
};


movl   $0x0,0x8(%rsp)
movl   $0x0,0xc(%rsp)
movl   $0x0,0x10(%rsp)

lea    0x8(%rsp),%rax
mov    %rax,(%rsp)

lea    (%rsp),%rdi
callq  ...
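For anyone who wants to try the manual grouping today, the workaround can be sketched as a self-contained function. `Locals`, `GroupedResult` and `run_grouped` are hypothetical names chosen for this example; the reported closure size is whatever the compiler picks, since the layout of a by-reference capture is not pinned down.

```cpp
#include <cstddef>

// Hypothetical grouping of the three locals into a single object.
struct Locals {
    int a = 0, b = 0, c = 0;
};

// Hypothetical result type: closure size plus the final values of the locals.
struct GroupedResult {
    std::size_t closure_size;
    int a, b, c;
};

GroupedResult run_grouped()
{
    Locals x;

    // Only one entity (x) is captured, so the closure needs at most
    // one reference no matter how many fields Locals has.
    auto func = [&x] {
        x.a++;
        x.b++;
        x.c++;
    };

    func();

    return { sizeof func, x.a, x.b, x.c };
}
```

The trade-off is readability: every use of the locals in the enclosing function now has to go through `x.`, which is why it would be nicer for the compiler to do this on its own.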

Recommended answer

This looks like unspecified behavior. The following paragraph from the C++14 draft standard N3936, section 5.1.2 Lambda expressions [expr.prim.lambda], makes me think so:

An entity is captured by reference if it is implicitly or explicitly captured but not captured by copy. It is unspecified whether additional unnamed non-static data members are declared in the closure type for entities captured by reference. [...]

This differs from the wording for entities captured by copy:

Every id-expression within the compound-statement of a lambda-expression that is an odr-use (3.2) of an entity captured by copy is transformed into an access to the corresponding unnamed data member of the closure type.
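The by-copy rule can be observed directly: the closure carries its own copy, so later changes to the original are not visible through it. The sketch below is only an illustration; `Big`, `CopyResult` and `run_by_copy` are hypothetical names, and the closure size is reported rather than asserted because an implementation is allowed some freedom in the closure type's size and alignment.

```cpp
#include <cstddef>

// Hypothetical payload, large enough that its storage is easy to spot.
struct Big {
    double values[8];
};

// Hypothetical result type for the demonstration.
struct CopyResult {
    std::size_t closure_size;  // implementation-specific
    double seen_by_closure;    // value read through the closure's copy
    double original;           // value of the original after modification
};

CopyResult run_by_copy()
{
    Big big{};
    big.values[0] = 1.0;

    // Captured by copy: uses of `big` in the body refer to the
    // closure's own unnamed data member, not to the local variable.
    auto by_copy = [big] { return big.values[0]; };

    big.values[0] = 2.0;  // modifies the original only

    return { sizeof by_copy, by_copy(), big.values[0] };
}
```

In practice I would expect `closure_size` to be at least `sizeof(Big)` here, since the copy has to live inside the closure object, whereas for a by-reference capture nothing comparable is promised.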

Thanks to dyp for pointing out some relevant documents which I somehow missed. It looks like defect report 750: Implementation constraints on reference-only closure objects provides the rationale for the current wording, and it says:

Consider an example like:

void f(vector<double> vec) {
  double x, y, z;
  fancy_algorithm(vec, [&]() { /* use x, y, and z in various ways */ });
}

5.1.2 [expr.prim.lambda] paragraph 8 requires that the closure class for this lambda will have three reference members, and paragraph 12 requires that it be derived from std::reference_closure, implying two additional pointer members. Although 8.3.2 [dcl.ref] paragraph 4 allows a reference to be implemented without allocation of storage, current ABIs require that references be implemented as pointers. The practical effect of these requirements is that the closure object for this lambda expression will contain five pointers. If not for these requirements, however, it would be possible to implement the closure object as a single pointer to the stack frame, generating data accesses in the function-call operator as offsets relative to the frame pointer. The current specification is too tightly constrained.

This echoes your exact points about allowing potential optimizations. The resolution was implemented as part of N2927, which includes the following:

The new wording no longer specifies any rewrite or closure members for "by reference" capture. Uses of entities captured "by reference" affect the original entities, and the mechanism to achieve this is left entirely to the implementation.
