Parallel.For System.OutOfMemoryException


Problem Description


We have a fairly simple program that's used for creating backups. I'm attempting to parallelize it but am getting an OutOfMemoryException within an AggregateException. Some of the source folders are quite large, and the program doesn't crash for about 40 minutes after it starts. I don't know where to start looking, so the code below is a near-exact dump of all the code, minus the directory structure and the exception-logging code. Any advice as to where to start looking?

using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

namespace SelfBackup
{
class Program
{

static readonly string[] saSrc = { 
    "\\src\\dir1\\",
    //...
    "\\src\\dirN\\", //this folder is over 6 GB
};
static readonly string[] saDest = { 
    "\\dest\\dir1\\",
    //...
    "\\dest\\dirN\\",
};

static void Main(string[] args)
{
Parallel.For(0, saDest.Length, i =>
{
    string sDest = saDest[i];

    try
    {
        if (Directory.Exists(sDest))
        {
            //Delete directory first so old stuff gets cleaned up
            Directory.Delete(sDest, true);
        }

        //recursive function
        clsCopyDirectory.copyDirectory(saSrc[i], sDest);
    }
    }
    catch (Exception e)
    {
        //standard error logging
        CL.EmailError();
    }
});
}
}
}

///////////////////////////////////////
using System.IO;
using System.Threading.Tasks;

namespace SelfBackup
{
static class clsCopyDirectory
{
    static public void copyDirectory(string Src, string Dst)
    {
        Directory.CreateDirectory(Dst);

        /* Copy all the files in the folder.
           If and when .NET 4.0 is installed, change
           Directory.GetFiles to Directory.EnumerateFiles for
           slightly better performance. */
        Parallel.ForEach<string>(Directory.GetFiles(Src), file =>
        {
            /* An exception thrown here may be arbitrarily deep into
               this recursive function. There's also a good chance that
               if one copy fails here, so too will other files in the
               same directory, so we don't want to spam out hundreds of
               error e-mails, but we don't want to abort altogether.
               Instead, the best solution is probably to throw back up
               to the original caller of copyDirectory and move on to
               the next Src/Dst pair by not catching any possible
               exception here. */
            File.Copy(file, //src
                      Path.Combine(Dst, Path.GetFileName(file)), //dest
                      true);//bool overwrite
        });

        //Call this function again for every directory in the folder.
        Parallel.ForEach(Directory.GetDirectories(Src), dir =>
        {
            copyDirectory(dir, Path.Combine(Dst, Path.GetFileName(dir)));
        });
    }
}
}


The Threads debug window shows 417 Worker threads at the time of the exception.


The copying is from one server to another. I'm now trying to run the code with the last Parallel.ForEach changed to a regular foreach.
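For reference, that change amounts to replacing the directory-recursion loop in copyDirectory with a sequential one (a sketch of the modification, using the names from the code above):

```csharp
// In clsCopyDirectory.copyDirectory: the last Parallel.ForEach,
// rewritten as a plain sequential foreach over the subdirectories.
foreach (string dir in Directory.GetDirectories(Src))
{
    copyDirectory(dir, Path.Combine(Dst, Path.GetFileName(dir)));
}
```

With this change, only the per-file copies within a single directory run in parallel; the recursion itself no longer multiplies the number of concurrent work items.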

Answer


Making a few guesses here, as I haven't yet had feedback from the comments on your question.


I am guessing that the large number of worker threads is appearing here because actions (an action being the unit of work carried out by the parallel foreach) are taking longer than a specified amount of time, so the underlying ThreadPool is growing the number of threads. The ThreadPool follows an algorithm of growing the pool so that new tasks are not blocked by existing long-running tasks, e.g. if all my current threads have been busy for half a second, I'll start adding more threads to the pool. However, you are going to get into trouble if all tasks are long-running and the new tasks you add make existing tasks run even longer. This is why you are probably seeing a large number of worker threads - possibly because of disk thrashing or slow network IO (if networked drives are involved).
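This thread-injection behaviour is easy to observe in isolation. Here is a minimal standalone sketch (not part of the original program) in which every action blocks, so the ThreadPool keeps adding workers; the thread count printed includes non-pool threads, but the upward trend is what matters:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThreadGrowthDemo
{
    static void Main()
    {
        // Each action blocks for a second; since no threads complete
        // promptly, the ThreadPool injects additional worker threads
        // over time rather than waiting for the blocked ones.
        Parallel.ForEach(Enumerable.Range(0, 64), i =>
        {
            Console.WriteLine("item {0}, threads in process: {1}",
                i, Process.GetCurrentProcess().Threads.Count);
            Thread.Sleep(1000); // simulates a slow disk / network copy
        });
    }
}
```

Run long enough, the printed thread count climbs steadily, which is the same mechanism that can produce hundreds of workers (and their stack memory) in the backup program.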


I am also guessing that files are being copied from one disk to another, or they are being copied from one location to another on the same disk. In this case, adding threads to the problem is not going to help out much. The source and destination disks only have one set of heads, so trying to make them do multiple things at once is likely to actually slow things down:

  • The disk heads will be thrashing all over the place.
  • Your disk/OS cache may be frequently invalidated.


This may not be a great problem for parallelization.

Update


In answer to your comment, if you are getting a speed-up using multiple threads on smaller datasets, then you could experiment with lowering the maximum number of threads used in your parallel foreach, e.g.

ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

Parallel.ForEach(Directory.GetFiles(Src), options, file =>
{
    //Do stuff
});
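The same ParallelOptions overload also exists for Parallel.For, so the outer loop over source/destination pairs in Main can be capped in the same way (a sketch against the code from the question):

```csharp
// Cap the outer loop in Main as well, so at most two
// src/dest pairs are being backed up concurrently.
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

Parallel.For(0, saDest.Length, options, i =>
{
    // ...delete and copy saSrc[i] -> saDest[i], as in the original code
});
```

Note that limits on the outer and inner loops multiply: two outer pairs times two inner file copies still means up to four copies in flight at once.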


But please do bear in mind that disk thrashing may negate any benefits from parallelization in the general case. Play about with it and measure your results.

