Simulation gives different result with normal for loop vs Parallel.For


Question

I am a bit surprised to get different results from a simple simulation sample when I run it with a normal for loop (which gives the correct result) versus Parallel.For. Please help me find the reason. I also observed that the parallel version runs much faster than the normal one.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace Simulation
{
    class Program
    {
        static void Main(string[] args)
        {
            ParalelSimulation(); // result is .757056
            NormalSimulation();  // result is .508021 which is correct
            Console.ReadLine();
        }

        static void ParalelSimulation()
        {
            DateTime startTime = DateTime.Now;

            int trails = 1000000;
            int numberofpeople = 23;
            Random rnd = new Random();
            int matches = 0;

            Parallel.For(0, trails, i =>
            {
                var taken = new List<int>();
                for (int k = 0; k < numberofpeople; k++)
                {
                    var day = rnd.Next(1, 365);
                    if (taken.Contains(day))
                    {
                        matches += 1;
                        break;
                    }
                    taken.Add(day);
                }
            });
            Console.WriteLine((Convert.ToDouble(matches) / trails).ToString());
            TimeSpan ts = DateTime.Now.Subtract(startTime);
            Console.WriteLine("Paralel Time Elapsed: {0} Seconds:MilliSeconds", ts.Seconds + ":" + ts.Milliseconds);
        }

        static void NormalSimulation()
        {
            DateTime startTime = DateTime.Now;

            int trails = 1000000;
            int numberofpeople = 23;
            Random rnd = new Random();
            int matches = 0;

            for (int j = 0; j < trails; j++)
            {
                var taken = new List<int>();
                for (int i = 0; i < numberofpeople; i++)
                {
                    var day = rnd.Next(1, 365);
                    if (taken.Contains(day))
                    {
                        matches += 1;
                        break;
                    }
                    taken.Add(day);
                }
            }
            Console.WriteLine((Convert.ToDouble(matches) / trails).ToString());
            TimeSpan ts = DateTime.Now.Subtract(startTime);
            Console.WriteLine(" Time Elapsed: {0} Seconds:MilliSeconds", ts.Seconds + ":" + ts.Milliseconds);
        }
    }
}

Thanks in advance.

Recommended answer

The code contains a data race on the update of matches. If two threads update it simultaneously, both can read the same value (say, 10), both increment it (to 11), and both write the new value back. As a result, fewer matches are registered (in my example, 11 instead of 12). The solution is to use System.Threading.Interlocked for this variable.
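As a minimal sketch of the Interlocked fix described above, the following counts parallel iterations with an atomic increment. The class and method names (InterlockedDemo, CountAtomically) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class InterlockedDemo
{
    // Hypothetical helper: runs n parallel iterations and counts them atomically.
    public static int CountAtomically(int n)
    {
        int counter = 0;
        Parallel.For(0, n, i =>
        {
            // Atomic read-modify-write, so no increment can be lost.
            // A plain "counter += 1" here would be the same data race as "matches += 1".
            Interlocked.Increment(ref counter);
        });
        return counter;
    }

    static void Main()
    {
        Console.WriteLine(CountAtomically(1000000)); // prints 1000000
    }
}
```

In the simulation, the equivalent change would be replacing matches += 1 with Interlocked.Increment(ref matches).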

Other issues I see:
- your serial loop includes an iteration for j equal to trails while the parallel loop does not (the end index is exclusive in Parallel.For);
- class Random is not guaranteed to be thread safe, so sharing a single instance across threads may corrupt its state.
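To check the exclusive end index for yourself, a small sketch like the one below records which indices Parallel.For actually visits. The names BoundsDemo and VisitedIndices are hypothetical; each iteration writes to its own array slot, so no synchronization is needed.

```csharp
using System;
using System.Threading.Tasks;

static class BoundsDemo
{
    // Hypothetical helper: marks every index that Parallel.For(0, n, ...) passes to the body.
    // The array has n + 1 slots so that index n can be inspected afterwards.
    public static bool[] VisitedIndices(int n)
    {
        var visited = new bool[n + 1];
        Parallel.For(0, n, i =>
        {
            visited[i] = true; // each iteration touches a distinct slot
        });
        return visited;
    }

    static void Main()
    {
        bool[] visited = VisitedIndices(10);
        Console.WriteLine(visited[9]);  // prints True: last index visited is n - 1
        Console.WriteLine(visited[10]); // prints False: index n is never visited
    }
}
```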

Update: I think you do not get the result you want with Drew Marsh's code because it does not provide enough randomization. Each of the 1M experiments starts with exactly the same random number, because all local instances of Random are initialized with the default seed. Essentially, you repeat the same experiment 1M times, so the result is still skewed. To fix that, you need to seed each randomizer with a new value each time. Correction: I was not totally correct here, as the default initialization uses the system clock for the seed; however, MSDN warns that

because the clock has finite resolution, using the parameterless constructor to create different Random objects in close succession creates random number generators that produce identical sequences of random numbers.

So this might still be the cause of insufficient randomization, and explicit seeds may give better results. For example, seeding with the index of the outer loop iteration gave a good answer for me:

Parallel.For(0, trails + 1, j =>
{
    Random rnd = new Random(j); // initialized with different seed each time
    /* ... */          
});
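Why the seed matters can be seen in isolation: two Random instances created with the same seed produce the same sequence, which is what happens when many instances end up with an identical clock-based seed. The sketch below uses explicit seeds to make this reproducible. SeedDemo and FirstDraws are illustrative names, and the 1..365 range is an assumption.

```csharp
using System;
using System.Linq;

static class SeedDemo
{
    // Hypothetical helper: returns the first `count` draws from a Random created with `seed`.
    public static int[] FirstDraws(int seed, int count)
    {
        var rnd = new Random(seed);
        var draws = new int[count];
        for (int i = 0; i < count; i++)
        {
            draws[i] = rnd.Next(1, 366); // upper bound is exclusive, so values are 1..365
        }
        return draws;
    }

    static void Main()
    {
        int[] a = FirstDraws(42, 5);
        int[] b = FirstDraws(42, 5);
        // Same seed, same sequence: the two "experiments" are not independent.
        Console.WriteLine(a.SequenceEqual(b)); // prints True
    }
}
```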

However, I noticed that after the initialization of Random was moved into the loop, all the speedup was lost (on my Intel Core i5 laptop). Since I am not a C# expert, I do not know why; I suppose class Random might share some data among all instances and synchronize access to it.

Update 2: Using ThreadLocal to keep one instance of Random per thread, I got both good accuracy and a reasonable speedup:

ThreadLocal<Random> ThreadRnd = new ThreadLocal<Random>(() =>
{
    return new Random(Thread.CurrentThread.GetHashCode());
});
Parallel.For(0, trails + 1, j =>
{
    Random rnd = ThreadRnd.Value;
    /* ... */          
});

Notice how each per-thread randomizer is seeded with the hash code of the currently running Thread instance.
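Putting the pieces from this answer together, a complete version might look like the sketch below. It is not the original poster's final code. It makes three assumptions of its own: the name BirthdaySimulation.Estimate is invented, a HashSet replaces the List lookup, and days are drawn with Next(1, 366) so that all 365 days are possible (the question's Next(1, 365) only yields 1..364).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class BirthdaySimulation
{
    // One Random per thread, seeded from the thread's hash code as in the answer.
    static readonly ThreadLocal<Random> ThreadRnd =
        new ThreadLocal<Random>(() => new Random(Thread.CurrentThread.GetHashCode()));

    // Hypothetical helper: estimates the probability that at least two of
    // `people` share a birthday, using `trials` independent experiments.
    public static double Estimate(int trials, int people)
    {
        int matches = 0;
        Parallel.For(0, trials, i =>
        {
            Random rnd = ThreadRnd.Value;   // never shared between threads
            var taken = new HashSet<int>();
            for (int k = 0; k < people; k++)
            {
                int day = rnd.Next(1, 366); // assumption: 365 possible days
                if (!taken.Add(day))        // Add returns false if the day was already taken
                {
                    Interlocked.Increment(ref matches); // atomic update, no lost counts
                    break;
                }
            }
        });
        return (double)matches / trials;
    }

    static void Main()
    {
        Console.WriteLine(Estimate(1000000, 23));
    }
}
```

With 23 people and 365 days the estimate should land near the analytic value of roughly 0.507. The exact figure varies from run to run.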
