Parallel.For loop answer is different than serial For-Next


Question


I have a program that manipulates strings taken from an input file, then writes them to an output file. The files are large, so it takes a long time to run, and it's calculation-bound, not IO-bound. The heart of the process is a For-Next loop that's ripe to be rewritten as a Parallel.For in order to take advantage of the idle processor power.

I'd never written a parallel program, so I read what I could find and experimented, with success. I've written a version of the program that can be run either as a serial For-Next or a Parallel.For (with Lambda expression) with only a couple of comment changes needed to switch from one to the other.

Run serially, it works perfectly. Run in parallel, it misses some of the output and does so inconsistently (meaning the output file sizes are not right and vary from run to run for the same input). This sounds like a thread safety issue to my naive brain, though I could be humming the wrong tune.

The only shared data that is written is an output array of strings. Each loop only writes to one position in it, defined by the counter in the For, so I had thought that would be thread-safe. All other variables that are written are defined inside the Lambda and should be thread-safe, I think.

I've done a lot of experimenting with it, including (but not limited to):

* using a SyncLock when writing to the output array.
* inlining the code from two small function calls.
* making thread-local copies of all data that is read, not just written.
* using a SyncLock when making the thread-local data.

I've obviously blundered somewhere. The code isn't difficult, but it's too long to post, so if I can get some general recommendations on what to try or read, or where to start looking, I'd be thankful.

Recommended answer

The question does not make much sense without a code sample.

However, you should understand that parallel processing gives results fully equivalent to those of sequential processing only under certain conditions. For parallel loops, roughly speaking, this means that the result of each iteration must not affect the results of other iterations. One simple example is filling a previously created array (not a collection!) with values, when none of the values depends on the values in other positions of the array.
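The array-filling case described above can be sketched as follows. This is only an illustration under assumptions: `Transform`, `RunParallel` and `RunSerial` are hypothetical names standing in for the real per-string work, which the question does not show.

```vbnet
Imports System
Imports System.Linq
Imports System.Threading.Tasks

Module IndependentIterations
    ' Hypothetical per-item work: the result depends only on its own input.
    Function Transform(s As String) As String
        Return s.ToUpperInvariant()
    End Function

    ' Parallel version: iteration i reads input(i) and writes only results(i).
    Function RunParallel(input As String()) As String()
        Dim results(input.Length - 1) As String
        Parallel.For(0, input.Length,
                     Sub(i As Integer)
                         ' Declared inside the lambda, so it is not shared
                         ' between iterations.
                         Dim temp As String = Transform(input(i))
                         results(i) = temp
                     End Sub)
        Return results
    End Function

    ' Serial version, kept for comparison.
    Function RunSerial(input As String()) As String()
        Dim results(input.Length - 1) As String
        For i As Integer = 0 To input.Length - 1
            results(i) = Transform(input(i))
        Next
        Return results
    End Function

    Sub Main()
        Dim input As String() =
            Enumerable.Range(0, 10000).Select(Function(n) "item" & n.ToString()).ToArray()
        ' Compares the two result arrays element by element.
        Console.WriteLine(RunSerial(input).SequenceEqual(RunParallel(input)))
    End Sub
End Module
```

The array is sized before the loop starts and each iteration touches a different element. Replacing it with a shared `List(Of String)` and calling `Add` from inside the loop body would not be equivalent, because `List(Of T)` is not safe for concurrent writes.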

The related problems are known under the umbrella term "race condition", which I tend to express more precisely as "incorrect dependency on the order of execution". This aspect is unrelated to the locking of a shared resource; that is, you can have locking and still have a race condition.
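A small sketch of "locking and yet a race condition" may help here. It is not taken from the asker's program; `BuildWithLock` and `BuildBySlots` are hypothetical names. In the first function every append is protected by SyncLock, so nothing is corrupted, but the text still depends on which iteration reaches the lock first.

```vbnet
Imports System
Imports System.Text
Imports System.Threading.Tasks

Module OrderDependence
    ' Every Append happens under a lock, yet the final text depends on
    ' the order in which iterations happen to run.
    Function BuildWithLock(count As Integer) As String
        Dim sb As New StringBuilder()
        Dim gate As New Object()
        Parallel.For(0, count,
                     Sub(i As Integer)
                         SyncLock gate
                             sb.Append(i).Append(","c)
                         End SyncLock
                     End Sub)
        Return sb.ToString()
    End Function

    ' Order-independent alternative: each iteration fills its own slot,
    ' and the pieces are joined only after the loop has finished.
    Function BuildBySlots(count As Integer) As String
        Dim parts(count - 1) As String
        Parallel.For(0, count,
                     Sub(i As Integer)
                         parts(i) = i.ToString() & ","
                     End Sub)
        Return String.Join("", parts)
    End Function

    Sub Main()
        Console.WriteLine(BuildWithLock(20))   ' element order may differ between runs
        Console.WriteLine(BuildBySlots(20))    ' same text a serial loop would produce
    End Sub
End Module
```

The lock guarantees that no append is lost or interleaved; it does not guarantee that appends happen in index order. The second function removes the dependency on execution order instead of trying to lock around it.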

(Sorry, editing of links in CodeProject is screwed up somehow. Please see the Wikipedia article "http://en.wikipedia.org/wiki/Race_condition".)

—SA



OK, so I figured out what was wrong, but I don't know why. The For-Next loop in the middle of this part was not creating bitstr correctly, so non-unique entries were being made, and those were stripped in a subsequent operation in another, outer loop. outG() was correct, but this concatenation method to build the string did not work.

      ' Convert the output vector back to a character string.
      cc_str = Nothing
      ' Start with the bit string.
      bitstr = bobinbitsl
      For jl = 0 To en0
        bitstr &= CStr(outG(jl, enl))
      Next
      bitstr &= eobinbitsl
      bitcount = bitstr.Length
      If bitcount Mod 6 <> 0 Then Stop ' got a problem





My solution to the problem was to streamline the creation of bitstr and make it at the same time it's creating outG (used for some analysis I didn't include).

I'm still mystified about why that loop fails. All of the variables are thread-local, and the code methods are clumsy but not unconventional. I even tried a SyncLock around the loop, and that didn't help, either. I'll go through this process with a fine-tooth comb and clean it up. I just hope I don't create another weird bug in the process!
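The post does not show where bitstr, jl or outG are declared, so the cause cannot be confirmed from the text. One thing worth checking in code of this shape is that every variable written in the loop body, including the inner loop counter, is declared inside the lambda: in VB.NET a variable declared outside a lambda and used inside it is captured and shared by all iterations. The sketch below shows that pattern under assumptions; `BitsFor` and `Encode` are hypothetical stand-ins, not the asker's code.

```vbnet
Imports System
Imports System.Linq
Imports System.Text
Imports System.Threading.Tasks

Module BitStringSketch
    ' Hypothetical stand-in for the per-record bit values; the real outG
    ' is not shown in the post. Bit j of the record goes to position j.
    Function BitsFor(record As Integer, width As Integer) As Integer()
        Dim bits(width - 1) As Integer
        For j As Integer = 0 To width - 1
            bits(j) = (record >> j) And 1
        Next
        Return bits
    End Function

    Function Encode(recordCount As Integer, width As Integer) As String()
        Dim output(recordCount - 1) As String
        Parallel.For(0, recordCount,
                     Sub(i As Integer)
                         ' Everything written here is declared inside the
                         ' lambda, including the inner loop counter j.
                         Dim bits As Integer() = BitsFor(i, width)
                         Dim sb As New StringBuilder(width)
                         For j As Integer = 0 To width - 1
                             sb.Append(bits(j))
                         Next
                         ' The only shared write: this iteration's own slot.
                         output(i) = sb.ToString()
                     End Sub)
        Return output
    End Function

    Sub Main()
        Dim encoded As String() = Encode(64, 6)
        ' With 6 bits and 64 records, every string should be unique.
        Console.WriteLine(encoded.Distinct().Count())
    End Sub
End Module
```

Using a StringBuilder local to the iteration also avoids the repeated string reallocation of `&=` in a loop, which matters for large files whether or not it is related to the bug.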

