Parallelizing an IO-Bound (Network) ForEach Loop
Question
I have a few different ways of uploading entire directories to Amazon S3 within my application, depending on which options are selected. Currently, one of the options uploads multiple directories in parallel. I'm not sure this is a good idea: in some cases it sped up the upload, and in others it slowed it down. The speed-up appears when there are many small directories, but the upload slows down when the batch contains large directories. I'm using the parallel ForEach loop below, together with the AWS API's TransferUtility.UploadDirectoryAsync() method:
Parallel.ForEach(dirs, myParallelOptions,
    async dir => { await MyUploadMethodAsync(dir); });
Here, TransferUtility.UploadDirectoryAsync() is called inside MyUploadMethodAsync(). TransferUtility's upload methods already upload the parts of a single file in parallel (if the file is large enough), so uploading several directories in parallel on top of that may be overkill. We are still limited by the available bandwidth, so this might be wasted effort, and I should perhaps just use a regular foreach loop that awaits UploadDirectoryAsync() for each directory. Can anyone provide some insight into whether this is a bad case for parallelization?
Answer
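Did you actually test this? The way you're using it, Parallel.ForEach may return well before any of the MyUploadMethodAsync calls has completed, because of the async lambda. Parallel.ForEach takes an Action<T> body, so the async lambda becomes an async void delegate that Parallel.ForEach has no way to await: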
Parallel.ForEach(dirs, myParallelOptions,
    async dir => { await MyUploadMethodAsync(dir); });
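The effect can be observed with a minimal, self-contained sketch (a top-level C# program; .NET 6 or later is assumed). The FakeUploadAsync method is hypothetical: it stands in for MyUploadMethodAsync and simulates network I/O with Task.Delay. The program counts how many "uploads" have finished at the moment Parallel.ForEach returns.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

string[] dirs = { "dirA", "dirB", "dirC", "dirD" };
int completed = 0;

// Hypothetical stand-in for MyUploadMethodAsync: Task.Delay simulates a slow network upload.
async Task FakeUploadAsync(string dir)
{
    await Task.Delay(1000);
    Interlocked.Increment(ref completed);
}

// The async lambda is converted to Action<string>, i.e. an "async void" delegate.
// Parallel.ForEach only waits until each delegate reaches its first incomplete await,
// so it returns while the simulated uploads are still running.
Parallel.ForEach(dirs, async dir => { await FakeUploadAsync(dir); });
int completedAfterForEach = Volatile.Read(ref completed);
Console.WriteLine($"Completed when Parallel.ForEach returned: {completedAfterForEach}");

// Nothing tracks the fire-and-forget uploads, so the only way to "wait" here is to guess a delay.
await Task.Delay(3000);
Console.WriteLine($"Completed after waiting: {Volatile.Read(ref completed)}");
```

On a typical run the first line reports 0 completed uploads and the second reports 4: the loop "finishes" almost immediately, and any exception thrown inside an async void body could not be observed by the caller either.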
Parallel.ForEach is suited for CPU-bound tasks. For IO-bound tasks, you are probably looking for something like this:
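// Start every upload; ToArray() materializes the tasks once (and gives Task.WaitAll the array it expects)
var tasks = dirs.Select(dir => MyUploadMethodAsync(dir)).ToArray();
await Task.WhenAll(tasks);
// or Task.WaitAll(tasks) if you need a blocking wait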
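If the concern is that several large directories compete for the same bandwidth, one option (not part of the original answer) is to keep the Task.WhenAll approach but cap how many uploads run at once with SemaphoreSlim. In this sketch, the UploadAllAsync helper, the FakeUploadAsync stand-in for MyUploadMethodAsync, and the limit of 3 are all hypothetical, illustrative choices, not values tuned for S3.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: starts one task per directory, but a SemaphoreSlim lets
// at most maxConcurrency of them run uploadAsync at the same time.
static async Task UploadAllAsync(IEnumerable<string> directories, Func<string, Task> uploadAsync, int maxConcurrency)
{
    using var throttle = new SemaphoreSlim(maxConcurrency);
    var tasks = directories.Select(async dir =>
    {
        await throttle.WaitAsync();
        try
        {
            await uploadAsync(dir);
        }
        finally
        {
            throttle.Release();
        }
    }).ToArray();
    await Task.WhenAll(tasks);
}

object gate = new();
int inFlight = 0, peak = 0, completed = 0;

// Hypothetical stand-in for MyUploadMethodAsync that records how many uploads overlap.
async Task FakeUploadAsync(string dir)
{
    lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
    await Task.Delay(100); // simulated network transfer
    lock (gate) { inFlight--; completed++; }
}

var dirs = Enumerable.Range(1, 10).Select(i => $"dir{i}").ToList();
await UploadAllAsync(dirs, FakeUploadAsync, maxConcurrency: 3);
Console.WriteLine($"Completed {completed} uploads, peak concurrency {peak}");
```

A suitable limit depends on the available bandwidth and on how many concurrent part requests TransferUtility already issues per file, so it is worth measuring a few values. On .NET 6 and later, Parallel.ForEachAsync with ParallelOptions.MaxDegreeOfParallelism provides similar throttling for async loop bodies.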