TPL Dataflow block consumes all available memory
Question
I have a TransformManyBlock with the following design:
- Input: the path to a file
- Output: an IEnumerable of the file's contents, one line at a time
I am running this block on a huge file (61GB), which is too large to fit into RAM. To avoid unbounded memory growth, I have set BoundedCapacity to a very low value (e.g. 1) for this block and for all downstream blocks. Nonetheless, the block apparently iterates the IEnumerable greedily, consuming all available memory on the computer and grinding every process to a halt. The block's OutputCount keeps rising without bound until I kill the process.
What can I do to prevent the block from consuming the IEnumerable in this way?
Here's an example program that illustrates the problem:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program
{
    static IEnumerable<string> GetSequence(char c)
    {
        for (var i = 0; i < 1024 * 1024; ++i)
            yield return new string(c, 1024 * 1024);
    }

    static void Main(string[] args)
    {
        var options = new ExecutionDataflowBlockOptions() { BoundedCapacity = 1 };
        var firstBlock = new TransformManyBlock<char, string>(c => GetSequence(c), options);
        var secondBlock = new ActionBlock<string>(str =>
        {
            Console.WriteLine(str.Substring(0, 10));
            Thread.Sleep(1000);
        }, options);

        firstBlock.LinkTo(secondBlock);
        firstBlock.Completion.ContinueWith(task =>
        {
            if (task.IsFaulted) ((IDataflowBlock) secondBlock).Fault(task.Exception);
            else secondBlock.Complete();
        });

        firstBlock.Post('A');
        firstBlock.Complete();

        for (; ; )
        {
            Console.WriteLine("OutputCount: {0}", firstBlock.OutputCount);
            Thread.Sleep(3000);
        }
    }
}
If you're on a 64-bit box, make sure to clear the "Prefer 32-bit" option in Visual Studio. I have 16GB of RAM on my computer, and this program immediately consumes every available byte.
Answer
You seem to misunderstand how TPL Dataflow works.
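BoundedCapacity limits how many items a block will accept at once; when the block is full, Post returns false and SendAsync waits until there is room. In your case that means the TransformManyBlock accepts a single char at a time and the ActionBlock a single string at a time.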
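A quick way to see what BoundedCapacity limits is to keep a bounded ActionBlock busy and then try to post a second item. The sketch below assumes a .NET project referencing System.Threading.Tasks.Dataflow; the class name, the PostTwiceAsync helper and the gate are illustrative only. While the first string is still being processed, the block's single slot is taken, so Post declines the second string and returns false instead of buffering it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class BoundedCapacityDemo
{
    // Hypothetical helper: posts two items to a block with BoundedCapacity = 1
    // and reports whether each Post was accepted.
    public static async Task<(bool First, bool Second)> PostTwiceAsync()
    {
        // The gate keeps the first item "in processing" until we release it.
        var gate = new ManualResetEventSlim(false);

        var block = new ActionBlock<string>(item =>
        {
            gate.Wait();
            Console.WriteLine($"processed {item}");
        }, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        bool first = block.Post("first");   // accepted: the block was empty
        bool second = block.Post("second"); // declined: the single slot is still occupied

        gate.Set();                         // let the first item finish
        block.Complete();
        await block.Completion;
        return (first, second);
    }

    static async Task Main()
    {
        var (first, second) = await PostTwiceAsync();
        Console.WriteLine($"first accepted: {first}");   // True
        Console.WriteLine($"second accepted: {second}"); // False
    }
}
```

SendAsync would not drop the second item like this; its returned task would simply stay pending until the first item finished.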
So you post a single item to the TransformManyBlock, which then returns 1024*1024 strings and tries to pass them on to the ActionBlock, which accepts only one at a time. The rest of the strings just sit in the TransformManyBlock's output queue, and since BoundedCapacity does not stop the block from enumerating the whole sequence, that queue (and OutputCount) keeps growing.
What you probably want to do is create a single block and post items into it in a streaming fashion, waiting (synchronously or asynchronously) whenever its capacity is reached:
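private static void Main()
{
    MainAsync().Wait();
}

private static async Task MainAsync()
{
    // A single bounded block: it holds at most one string at a time.
    var block = new ActionBlock<string>(async item =>
    {
        Console.WriteLine(item.Substring(0, 10));
        await Task.Delay(1000);
    }, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

    // GetSequence is the lazy generator from the question. SendAsync waits while
    // the block is full, so the generator is only advanced as fast as items are consumed.
    foreach (var item in GetSequence('A'))
    {
        await block.SendAsync(item);
    }

    block.Complete();
    await block.Completion;
}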
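If you want to keep a two-stage pipeline (for instance, file path in and lines out, as in the question), one common workaround is to drop the TransformManyBlock and let an ActionBlock enumerate the sequence itself, awaiting SendAsync on the bounded downstream block for every item. This is a sketch under assumptions: the BackpressurePipeline class, the RunAsync method and the Lines helper are hypothetical, and Lines is a small stand-in for the question's GetSequence so the example finishes quickly. For a real file, File.ReadLines(path) could fill that role, since it also yields lines lazily.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class BackpressurePipeline
{
    // Hypothetical stand-in for GetSequence: a lazily produced sequence.
    static IEnumerable<string> Lines(char c, int count)
    {
        for (var i = 0; i < count; ++i)
            yield return new string(c, 16);
    }

    public static async Task<int> RunAsync(int count)
    {
        var processed = 0;
        var options = new ExecutionDataflowBlockOptions { BoundedCapacity = 1 };

        // Consumer: handles one line at a time (simulated work via Task.Delay).
        var consumer = new ActionBlock<string>(async line =>
        {
            Interlocked.Increment(ref processed);
            await Task.Delay(1);
        }, options);

        // Producer: takes the place of the TransformManyBlock. It pulls one item
        // from the lazy sequence, then waits until the consumer has room.
        var producer = new ActionBlock<char>(async c =>
        {
            foreach (var line in Lines(c, count))
            {
                if (!await consumer.SendAsync(line))
                    break; // consumer faulted or completed; stop producing
            }
        }, options);

        // Forward completion (or a fault) from producer to consumer.
        _ = producer.Completion.ContinueWith(t =>
        {
            if (t.IsFaulted) ((IDataflowBlock)consumer).Fault(t.Exception);
            else consumer.Complete();
        }, TaskScheduler.Default);

        await producer.SendAsync('A');
        producer.Complete();
        await consumer.Completion;
        return processed;
    }

    static async Task Main()
    {
        var processed = await RunAsync(100);
        Console.WriteLine($"processed {processed} lines");
    }
}
```

Because the producer only pulls the next line after SendAsync completes, only a handful of strings are alive at any moment, however long the sequence is. Completion is forwarded by hand, mirroring the question's code; no LinkTo is needed because the producer sends to the consumer directly.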