TPL Dataflow: Why does EnsureOrdered = false destroy parallelism for this TransformManyBlock?


Problem description


I'm working on a TPL Dataflow pipeline and noticed some strange behaviour related to ordering/parallelism in TransformManyBlocks (might apply to other blocks as well).


Here is my code to reproduce (.NET 4.7.2, TPL Dataflow 4.9.0):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks.Dataflow;

class Program
{
    static void Main(string[] args)
    {
        // Expands each input i into i tuples, running up to 4 inputs at a time,
        // without enforcing output order across different inputs.
        var sourceBlock = new TransformManyBlock<int, Tuple<int, int>>(i => Source(i),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, EnsureOrdered = false });

        // Prints every tuple as it arrives.
        var targetBlock = new ActionBlock<Tuple<int, int>>(tpl =>
        {
            Console.WriteLine($"Received ({tpl.Item1}, {tpl.Item2})");
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, EnsureOrdered = true });

        sourceBlock.LinkTo(targetBlock, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < 10; i++)
        {
            sourceBlock.Post(i);
        }

        sourceBlock.Complete();
        targetBlock.Completion.Wait();
        Console.WriteLine("Finished");
        Console.Read();
    }

    // Lazily yields (i, 0) .. (i, i - 1), sleeping 100 to 1500 ms before each item.
    // The fixed seed gives every call the same sequence of delays.
    static IEnumerable<Tuple<int, int>> Source(int i)
    {
        var rand = new Random(543543254);
        for (int j = 0; j < i; j++)
        {
            Thread.Sleep(rand.Next(100, 1500));
            Console.WriteLine($"Returning ({i}, {j})");
            yield return Tuple.Create(i, j);
        }
    }
}
```

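The behaviour I want is as follows:

  • The source block should return tuples in parallel; the only requirement is that they are ordered by the secondary property j.
  • The target block should process messages in the order in which they are received.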
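The ordering requirement in the first bullet (j ascending within each i, with no constraint between different values of i) can be expressed as a small check over the order in which tuples arrive. The sketch below is a hypothetical helper, `IsOrderedWithinEachI`, written as C# top-level statements with value tuples instead of `Tuple<int, int>`; the sample arrival order is copied from the `Received` lines of the `EnsureOrdered = false` run shown further down in this question.

```csharp
using System;
using System.Collections.Generic;

// Arrival order copied from the "Received" lines of the EnsureOrdered = false output.
var receivedWithoutOrdering = new[] { (2, 0), (2, 1), (4, 0), (4, 1), (4, 2), (4, 3), (1, 0), (3, 0), (3, 1), (3, 2) };

Console.WriteLine(IsOrderedWithinEachI(receivedWithoutOrdering)); // True
Console.WriteLine(IsOrderedWithinEachI(new[] { (2, 1), (2, 0) }));  // False: j went backwards for i = 2

// Hypothetical helper: true when, for every i, the j values arrive in strictly
// increasing order. Interleaving between different i values is allowed.
static bool IsOrderedWithinEachI(IEnumerable<(int I, int J)> arrivals)
{
    var lastJ = new Dictionary<int, int>();
    foreach (var (i, j) in arrivals)
    {
        if (lastJ.TryGetValue(i, out int previous) && j <= previous)
        {
            return false; // j repeated or went backwards for this i
        }
        lastJ[i] = j;
    }
    return true;
}
```

By this check, the `EnsureOrdered = false` run does satisfy the ordering requirement; what it loses is parallelism, not order.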


From what I understand, the secondary ordering condition is satisfied by the nature of yield return, so EnsureOrdered can be set to false. If it is set to true, the source block withholds messages for an unacceptable amount of time, because it waits for the whole enumeration (every yield return) to complete before passing the messages along. In the real application many GB of data are processed, so we want to propagate data through the pipeline as quickly as possible in order to release RAM. This is sample output when EnsureOrdered of the source block is set to true:

```
Returning (1, 0)
Returning (2, 0)
Returning (4, 0)
Returning (3, 0)
Returning (2, 1)
Returning (4, 1)
Returning (3, 1)
Received (1, 0)
Received (2, 0)
Received (2, 1)
Returning (4, 2)
Returning (3, 2)
Received (3, 0)
Received (3, 1)
Received (3, 2)
Returning (5, 0)
Returning (6, 0)
```


We can see that the source block works in parallel, but waits to propagate messages until all the messages for the next i in line have been generated (as expected).


However, when EnsureOrdered for the source block is false (as in the code sample), I get the following output:

```
Returning (2, 0)
Received (2, 0)
Returning (2, 1)
Received (2, 1)
Returning (4, 0)
Received (4, 0)
Returning (4, 1)
Received (4, 1)
Returning (4, 2)
Received (4, 2)
Returning (4, 3)
Received (4, 3)
Returning (1, 0)
Received (1, 0)
Returning (3, 0)
Received (3, 0)
Returning (3, 1)
Received (3, 1)
Returning (3, 2)
Received (3, 2)
```


The source block successfully propagates messages as soon as they are available; however, parallelism seems to be lost, since it only works on one i at a time.


Why is this? How can I force it to process in parallel?

Answer

A fix is in progress here: https://github.com/dotnet/corefx/pull/31059

Thanks for the report!
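Until a build containing that fix is available, one possible workaround (our own assumption, not part of the answer above) is to avoid TransformManyBlock for the fan-out step: an `ActionBlock<int>` with `MaxDegreeOfParallelism = 4` enumerates each `Source(i)` itself and forwards every yielded item to the target with `SendAsync` as soon as it is produced. This sketch assumes a modern .NET SDK (C# top-level statements) with System.Threading.Tasks.Dataflow available, uses value tuples instead of `Tuple<int, int>`, and shortens the sleeps; the target keeps the default `MaxDegreeOfParallelism` of 1 so it handles items strictly in arrival order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var received = new ConcurrentQueue<(int I, int J)>();

// Target: a single consumer (default MaxDegreeOfParallelism = 1), so items are
// processed in the order they arrive.
var targetBlock = new ActionBlock<(int I, int J)>(item =>
{
    received.Enqueue(item);
    Console.WriteLine($"Received ({item.I}, {item.J})");
});

// Producer (workaround, not a TransformManyBlock): up to 4 inputs are enumerated
// concurrently, and each yielded item is sent downstream immediately.
var producerBlock = new ActionBlock<int>(async i =>
{
    foreach (var item in Source(i))
    {
        // Awaiting each send keeps the j order within one i.
        await targetBlock.SendAsync(item);
    }
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

for (int i = 0; i < 10; i++)
{
    producerBlock.Post(i);
}

producerBlock.Complete();
await producerBlock.Completion; // every Source(i) enumeration has finished
targetBlock.Complete();         // no more items will be sent
await targetBlock.Completion;
Console.WriteLine("Finished");

// Same shape as the question's Source, with shorter sleeps (assumption, to keep the demo fast).
static IEnumerable<(int I, int J)> Source(int i)
{
    var rand = new Random(543543254);
    for (int j = 0; j < i; j++)
    {
        Thread.Sleep(rand.Next(10, 50));
        Console.WriteLine($"Returning ({i}, {j})");
        yield return (i, j);
    }
}
```

Because the two blocks are not linked, completion is chained by hand: the target is completed only after the producer's `Completion` task finishes. The interleaving of `Returning`/`Received` lines varies from run to run.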

