Lambda lifetime explanation for C++20 coroutines


Problem description

Folly has a useable library for C++20 style coroutines.

In the Readme it claims:

IMPORTANT: You need to be very careful about the lifetimes of temporary lambda objects. Invoking a lambda coroutine returns a folly::coro::Task that captures a reference to the lambda and so if the returned Task is not immediately co_awaited then the task will be left with a dangling reference when the temporary lambda goes out of scope.

I tried to make an MCVE for the example they provided, and was confused by the results. Assume the following boilerplate for all of the examples below:

#include <fmt/core.h>  // for fmt::print
#include <folly/experimental/coro/Task.h>
#include <folly/experimental/coro/BlockingWait.h>
#include <folly/futures/Future.h>
using namespace folly;
using namespace folly::coro;

// foo() from each example below is defined here.

int main() {
    fmt::print("Result: {}\n", blockingWait(foo()));
}

I compiled the following with address sanitizer to see if there would be any dangling references.

EDIT: clarified question

Question: Why does the second example not trigger an ASAN warning?

According to cppreference:

When a coroutine reaches the co_return statement, it performs the following:

...

  • or calls promise.return_value(expr) for co_return expr where expr has non-void type
  • destroys all variables with automatic storage duration in reverse order they were created.
  • calls promise.final_suspend() and co_await's the result.

Thus perhaps the temporary lambda's state is not actually destroyed until the result is returned, because foo itself is a coroutine?


Example 1 (ASAN ERROR): I assume 'i' doesn't exist when the coroutine is waited on

auto foo() -> Task<int> {
    auto task = [i=1]() -> folly::coro::Task<int> {
        co_return i;
    }(); // lambda is destroyed after this semicolon
    return task;
}


Example 2 (NO ERROR): why?

auto foo() -> Task<int> {
  auto task = [i=1]() -> folly::coro::Task<int> {
      co_return i;
  }();
  co_return co_await std::move(task);
}


Example 3 (ASAN ERROR): Same problem as the first example?

auto foo() -> folly::SemiFuture<int> {
    auto task = [i=1]() -> folly::coro::Task<int> {
        co_return i;
    }();
    return std::move(task).semi();
}


Example 4 (NO ERROR): for good measure, just returning a constant (no lambda state captured) works fine. Compare to the first example:

auto foo() -> Task<int> {
    auto task = []() -> folly::coro::Task<int> {
        co_return 1;
    }();
    return task;
}

Solution

This problem is not unique or specific to lambdas; it could affect any callable object that simultaneously stores internal state and happens to be a coroutine. But this problem is easiest to encounter when making a lambda, so we'll look at it from that perspective.

First, some terminology.

In C++, a "lambda" is an object, not a function. A lambda object has an overload for the function call operator operator(), which invokes the code written into the lambda body. That is all a lambda is, so when I subsequently refer to "lambda", I am talking about a C++ object and not a function.

In C++, being a "coroutine" is a property of a function, not an object. A coroutine is a function that appears identical to a normal function from the outside, but which is implemented internally in such a way that its execution can be suspended. When a coroutine is suspended, execution returns to the function that directly invoked/resumed the coroutine.

The execution of the coroutine can later be resumed (the mechanism for doing so is not something I'm going to discuss much here). When a coroutine is suspended, all of the stack variables within that coroutine function up to the point of the coroutine's suspension are preserved. This fact is what allows resumption of the coroutine to work; it's what makes coroutine code seem like normal C++ even though execution can happen in a very disjoint fashion.

A coroutine is not an object, and a lambda is not a function. So, when I use the seemingly contradictory term "coroutine lambda", what I really mean is an object whose operator() overload happens to be a coroutine.

Are we clear? OK.

Important Fact #1:

When the compiler evaluates a lambda expression, it creates a prvalue of the lambda type. This prvalue will (eventually) initialize an object, usually as a temporary within the scope of the function that evaluated the lambda expression in question. But it could be a stack variable. Which it is doesn't really matter; what matters is that, when you evaluate a lambda expression, there is an object which is in every way like a regular C++ object of any user-defined type. That means it has a lifetime.

Values "captured" by the lambda expression are essentially member variables of the lambda object. They could be references or values; it doesn't really matter. When you use a capture name in the lambda body, you are really accessing the named member variable of the lambda object. And the rules about member variables in a lambda object are no different from the rules about member variables in any user-defined object.
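To make the "captures are member variables" point concrete, here is a minimal sketch comparing a lambda with a hand-written class that does roughly the same job. The names HandWrittenClosure, call_lambda, and call_handwritten are made up for illustration; the class the compiler actually generates is unnamed and its layout is not specified.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical hand-written counterpart of the closure type of
// [i = 1]() { return i; }: the capture becomes a data member and the
// lambda body becomes operator().
struct HandWrittenClosure {
    int i;
    int operator()() const { return i; }
};

int call_lambda() {
    auto lambda = [i = 1]() { return i; };
    // The lambda is an object of an (unnamed) class type, not a function.
    static_assert(std::is_class_v<decltype(lambda)>,
                  "a closure is an object of class type");
    return lambda();  // calls the closure's operator(), which reads its member i
}

int call_handwritten() {
    HandWrittenClosure closure{1};
    return closure();  // same shape: operator() reads this->i
}

// Both spellings behave the same way.
void self_check() {
    assert(call_lambda() == 1);
    assert(call_handwritten() == 1);
}
```

In both cases the value of i lives inside an object, and it stops being usable when that object's lifetime ends.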

Important Fact #2:

A coroutine is a function which can be suspended in such a way that its "stack values" can be preserved, so that it can resume its execution later. For our purposes, "stack values" include all function parameters, any temporary objects generated up to the point of suspension, and any function local variables declared in the function up to that point.

And that is all that gets preserved.

A member function can be a coroutine, but the coroutine suspension mechanism does not care about member variables. Suspension only applies to the execution of that function, not to the object around that function.

Important Fact #3:

The main point of having coroutines at all is to be able to suspend a function's execution and have that function's execution resumed by some other code. This likely will be in some disparate part of the program, and usually in a thread distinct from the place where the coroutine was initially invoked. That is, if you create a coroutine, you expect that the caller of that coroutine will continue its execution in parallel with your coroutine function's execution. If the caller does wait for your execution to complete, the caller does so at its choosing, not yours.

That's why you made it a coroutine to begin with.

The point of the folly::coro::Task object is essentially to keep track of the coroutine's post-suspension execution, as well as to marshal any return value(s) generated by it. It may also permit one to schedule the resumption of some other code after the execution of the coroutine it represents. So a Task could represent a long series of coroutine executions, with each feeding data to the next.

The important fact here is that the coroutine starts in one place like a normal function, but it can end at some other point in time outside of the callstack that invoked it initially.

So, let's put these facts together.

If you're a function that creates a lambda, then you (at least for some period of time) have a prvalue of that lambda, right? You will either store it yourself (as a temporary or stack variable) or you will pass it to someone else. Either yourself or that someone else will at some point invoke the operator() of that lambda. At that point, the lambda object must be a live, functional object, or you've got a much bigger problem on your hands.

So the immediate caller of a lambda has a lambda object, and the lambda's function starts executing. If it is a coroutine lambda, then this coroutine will likely at some point suspend its execution. This transfers program control back to the immediate caller, the code which holds the lambda object.

And that's where we encounter the consequences of IF#3. See, the lambda object's lifetime is controlled by the code which initially invoked the lambda. But the execution of the coroutine within that lambda is controlled by some arbitrary, external code. The system which governs this execution is the Task object returned to the immediate caller by the initial execution of the coroutine lambda.

So there's the Task which represents the coroutine function's execution. But there's also the lambda object. These are both objects, but they are separate objects, with distinct lifetimes.

IF#1 tells us that lambda captures are member variables, and the rules of C++ tell us that the lifetime of a member is governed by the lifetime of the object it is a member of. IF#2 tells us that these member variables are not preserved by the coroutine suspension mechanism. And IF#3 tells us that the coroutine execution is governed by the Task, whose execution can be (very) unrelated to the initial code.

If you put this all together, what we find is that, if you have a coroutine lambda which captures variables, then the lambda object which was invoked must continue to exist until the Task (or whatever governs continued coroutine execution) has completed the coroutine lambda's execution. If it doesn't, then the coroutine lambda's execution may attempt to access member variables of an object whose lifetime has ended.

How exactly you do that is up to you.


Now, let's look at your examples.

Example 1 fails for obvious reasons. The code invoking the coroutine creates a temporary object representing the lambda. But that temporary goes out of scope immediately. No effort is made to ensure that the lambda remains in existence while the Task is executing. This means that it is possible for the coroutine to be resumed after the lambda object it lives within has been destroyed.

That's bad.

Example 2 is actually just as bad. The lambda temporary is destroyed immediately after the creation of task, so merely co_awaiting it afterwards doesn't help. However, ASAN may simply not have caught it because it now happens inside of a coroutine. If your code had instead been:

Task<int> foo() {
  auto func = [i=1]() -> folly::coro::Task<int> {
      co_return i;
  };

  auto task = func();

  co_return co_await std::move(task);
}

Then the code would be fine. The reason is that co_awaiting a Task causes the current coroutine to suspend its execution until the last thing in the Task is done, and that "last thing" is func. And since stack objects are preserved by coroutine suspension, func will continue to exist for as long as this coroutine does.

Example 3 is bad for the same reasons as Example 1. It doesn't matter how you use the return value of the coroutine function; if you destroy the lambda before the coroutine finishes execution, your code is broken.

Example 4 is technically just as bad as all the rest. However, because the lambda is captureless, it never needs to access any members of the lambda object. It never actually accesses any object whose lifetime has ended, so ASAN never notices that the object around the coroutine is dead. It's UB, but it's UB that's unlikely to hurt you. If you had explicitly extracted a function pointer from the lambda, even that UB wouldn't happen:

Task<int> foo() {
    auto func = +[]() -> folly::coro::Task<int> { //The + extracts a function pointer from a captureless lambda for complex, convoluted reasons.
        co_return 1;
    };
    auto task = func();
    return task;
}
