Parallel.ForEach spawning way too many threads

Question

Although I wrote the code discussed here in F#, it is based on the .NET 4 framework and does not depend on anything specific to F# (at least it seems so!).

I have some pieces of data on disk that I need to update from the network, saving the latest version back to disk:

type MyData =
    { field1 : int;
      field2 : float }

type MyDataGroup =
    { Data : MyData[];
      Id : int }

// load : int -> MyDataGroup
let load dataId =
    let data = ... // reads from disk
    { Data = data;
      Id = dataId }

// update : MyDataGroup -> MyDataGroup
let update dg =
    let newData = ... // reads from the network and processes the result
                      // newData : MyData[]

    { dg with Data = dg.Data
                     |> Seq.ofArray
                     |> Seq.append newData
                     |> processDataSomehow
                     |> Seq.toArray }

// save : MyDataGroup -> unit
let save dg = ... // writes to the disk

let loadAndUpdateAndSave = load >> update >> save

The problem is that to loadAndUpdateAndSave all my data, I would have to execute the function many times:

{1 .. 5000} |> Seq.iter loadAndUpdateAndSave

Each step would do:

  • some disk IO,
  • some data crunching,
  • some network IO (possibly with a lot of latency),
  • more data crunching,
  • and some disk IO.

Wouldn't it be nice to have this done in parallel, to some degree? Unfortunately, none of my reading and parsing functions are "async-workflows-ready".

The first thing I did was to set up a Task[] and start them all:

let createTask id = new Task(fun _ -> loadAndUpdateAndSave id)
let tasks = {1 .. 5000}
            |> Seq.map createTask
            |> Seq.toArray

tasks |> Array.iter (fun x -> x.Start())
Task.WaitAll(tasks)

Then I hit CTRL+SHIFT+ESC to open Task Manager, just to see how many threads it was using: 15, 17, ..., 35, ..., 170, ... until I killed the application! Something was going wrong.

I did almost the same thing using Parallel.ForEach(...), and the results were the same: lots and lots and lots of threads.

Then I decided to start only n tasks, call Task.WaitAll on them, then start another n, and so on until there were no more tasks left.

This works, but the problem is that when it has finished processing, say, n-1 tasks, it will wait, wait, wait for the damn last task that insists on blocking due to lots of network latency. This is not good!
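For illustration, the batching described above might look roughly like the following. This is only a sketch and not the original code: `work`, `processed`, and `runInBatches` are hypothetical names, `work` stands in for the real load/update/save step, and `Seq.chunkBySize` assumes F# 4.0 or later.

```fsharp
open System
open System.Collections.Concurrent
open System.Threading.Tasks

// Hypothetical stand-in for the real per-item work (load, update, save).
let processed = ConcurrentBag<int>()
let work (id : int) = processed.Add id

// Start at most n tasks, wait for the whole batch, then start the next batch.
let runInBatches (n : int) (ids : seq<int>) =
    for batch in Seq.chunkBySize n ids do
        let tasks : Task[] =
            batch
            |> Array.map (fun id -> Task.Factory.StartNew(Action(fun () -> work id)))
        // Blocks until the slowest task in this batch has finished.
        Task.WaitAll(tasks)

runInBatches 8 (seq { 1 .. 100 })
```

The `Task.WaitAll` call is where the stall comes from: no new work starts until every task in the current batch, including the slowest one, has completed.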

So, how would you attack this problem? I'd appreciate seeing different solutions, involving either Async Workflows (and in that case how to adapt my non-async functions), Parallel Extensions, weird parallel patterns, etc.

Thanks.

Recommended answer

Are you sure that your individual tasks are completing in a timely manner? I believe that both Parallel.ForEach and the Task class already use the .NET thread pool. Tasks should generally be short-lived work items, in which case the thread pool will only spawn a small number of actual threads. But if your tasks are not making progress and other tasks are queued, the number of threads in use will steadily increase up to the maximum (250 per processor by default in .NET 2.0 SP1, though it differs between framework versions). It's also worth noting that (at least in .NET 2.0 SP1) new thread creation is throttled to 2 new threads per second, so reaching the thread counts you're seeing indicates that the tasks are not completing quickly (and it may not be entirely accurate to pin the blame on Parallel.ForEach).
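To check what the limits actually are on a given machine and framework version, the thread pool can be queried directly. A minimal sketch, assuming nothing beyond the standard System.Threading.ThreadPool API (the variable names are mine):

```fsharp
open System.Threading

// GetMinThreads and GetMaxThreads report their values through out parameters,
// which F# lets you receive as a tuple.
let minWorkers, minIo = ThreadPool.GetMinThreads()
let maxWorkers, maxIo = ThreadPool.GetMaxThreads()

printfn "min worker threads: %d, min IO completion threads: %d" minWorkers minIo
printfn "max worker threads: %d, max IO completion threads: %d" maxWorkers maxIo
```

The numbers printed depend on the runtime version and the number of processors, so they may not match the .NET 2.0 SP1 figure quoted above.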

I think that Brian's suggestion to use async workflows is a good one, particularly if the source of the long-lived tasks is IO, since async will return your threads to the thread pool until the IO completes. Another option is to simply accept that your tasks aren't completing quickly and allow many threads to be spawned (which can be controlled to some extent with System.Threading.ThreadPool.SetMaxThreads). Depending on your situation, it may not be a big deal that you're using a lot of threads.
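As a rough sketch of the async-workflow route, the pipeline could be restructured so that only the network step is asynchronous. Everything here is a hypothetical stand-in: `load`, `fetchAsync`, and `save` are placeholders for the real functions, and `Async.Sleep` merely simulates network latency. The benefit only appears if the real network call is itself asynchronous; wrapping a blocking call in `async { ... }` would still tie up a thread.

```fsharp
// Hypothetical stand-ins for the real disk and network functions.
let load (id : int) = id
let save (_ : int) = ()

// Simulated network call: while the workflow sleeps, no thread is blocked.
let fetchAsync (value : int) =
    async {
        do! Async.Sleep 50
        return value * 2
    }

// The per-item pipeline, with the network step awaited via let!.
let loadAndUpdateAndSaveAsync (id : int) =
    async {
        let loaded = load id
        let! fresh = fetchAsync loaded
        save fresh
        return fresh
    }

// Run all items; Async.Parallel returns results in input order.
let results =
    [ 1 .. 200 ]
    |> List.map loadAndUpdateAndSaveAsync
    |> Async.Parallel
    |> Async.RunSynchronously
```

Because the waiting happens inside `let!`, many items can be in flight at once without needing a thread for each of them.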
