TPL Dataflow block consumes all available memory


Question

I have a TransformManyBlock with the following design:

  • Input: a file path
  • Output: an IEnumerable of the file's contents, one line at a time

I am running this block on a huge file (61 GB), which is too large to fit into RAM. To avoid unbounded memory growth, I have set BoundedCapacity to a very low value (e.g. 1) for this block and for all downstream blocks. Nonetheless, the block apparently iterates the IEnumerable greedily, which consumes all available memory on the computer and grinds every process to a halt. The block's OutputCount continues to rise without bound until I kill the process.

What can I do to prevent the block from consuming the IEnumerable in this way?

Here's an example program that illustrates the problem:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program
{
    static IEnumerable<string> GetSequence(char c)
    {
        for (var i = 0; i < 1024 * 1024; ++i)
            yield return new string(c, 1024 * 1024);
    }

    static void Main(string[] args)
    {
        var options = new ExecutionDataflowBlockOptions() { BoundedCapacity = 1 };
        var firstBlock = new TransformManyBlock<char, string>(c => GetSequence(c), options);
        var secondBlock = new ActionBlock<string>(str =>
            {
                Console.WriteLine(str.Substring(0, 10));
                Thread.Sleep(1000);
            }, options);

        firstBlock.LinkTo(secondBlock);
        firstBlock.Completion.ContinueWith(task =>
            {
                if (task.IsFaulted) ((IDataflowBlock) secondBlock).Fault(task.Exception);
                else secondBlock.Complete();
            });

        firstBlock.Post('A');
        firstBlock.Complete();
        for (; ; )
        {
            Console.WriteLine("OutputCount: {0}", firstBlock.OutputCount);
            Thread.Sleep(3000);
        }
    }
}

If you're on a 64-bit box, make sure to clear the "Prefer 32-bit" option in Visual Studio. I have 16 GB of RAM on my computer, and this program immediately consumes every available byte.

Recommended answer

You seem to misunderstand how TPL Dataflow works.

BoundedCapacity limits the number of items you can post into a block. In your case, that means a single char into the TransformManyBlock and a single string into the ActionBlock.

So you post a single item to the TransformManyBlock, which then returns 1024*1024 strings and tries to pass them on to the ActionBlock, which will only accept one at a time. The rest of the strings just sit in the TransformManyBlock's output queue.

What you probably want to do is create a single block and post items into it in a streaming fashion, waiting (synchronously or asynchronously) whenever its capacity is reached:

private static void Main()
{
    MainAsync().Wait();
}

private static async Task MainAsync()
{
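    // The block holds at most one item at a time (BoundedCapacity = 1).
    var block = new ActionBlock<string>(async item =>
    {
        Console.WriteLine(item.Substring(0, 10));
        await Task.Delay(1000);
    }, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

    // GetSequence is the iterator from the question. It is enumerated lazily:
    // the next string is only created after SendAsync has handed off the previous one.
    foreach (var item in GetSequence('A'))
    {
        await block.SendAsync(item);
    }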

    block.Complete();
    await block.Completion;
}
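
For the file scenario in the question, the same pattern could look roughly like the sketch below. It is only an illustration under assumptions: the names LineStreamer and StreamLinesAsync, the temporary file, and the three sample lines are hypothetical and not part of the original question or answer. File.ReadLines enumerates the file lazily, so each line is read only after the bounded block has accepted the previous one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class LineStreamer
{
    // Hypothetical helper: pushes lines into a bounded ActionBlock one at a
    // time and returns how many lines the block processed.
    public static async Task<int> StreamLinesAsync(IEnumerable<string> lines, Action<string> handle)
    {
        var processed = 0;
        var block = new ActionBlock<string>(line =>
        {
            handle(line);
            processed++; // safe here: the block runs one item at a time by default
        }, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        foreach (var line in lines)
        {
            // SendAsync completes once the block has accepted the item, so the
            // enumerator only advances when there is room in the block.
            if (!await block.SendAsync(line))
                break; // the block declined the item (completed or faulted)
        }

        block.Complete();
        await block.Completion;
        return processed;
    }

    static async Task Main()
    {
        // Assumption for the demo: a small temporary file stands in for the 61 GB file.
        var path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "first", "second", "third" });
        try
        {
            var count = await StreamLinesAsync(File.ReadLines(path), Console.WriteLine);
            Console.WriteLine("Lines processed: {0}", count);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With this shape, memory use is governed by the single line in flight plus whatever the block is holding, rather than by the size of the file.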
