Migrate a single-threaded app to multi-threaded, parallel execution: Monte Carlo simulation


Question

I've been tasked with taking an existing single-threaded Monte Carlo simulation and optimising it. This is a C# console app with no DB access. It loads data once from a CSV file and writes results out at the end, so it's pretty much purely CPU-bound. It also only uses about 50 MB of memory.

I've run it through the JetBrains dotTrace profiler. About 30% of total execution time is spent generating uniform random numbers, and 24% is spent translating uniform random numbers into normally distributed random numbers.

The basic algorithm is a whole lot of nested for loops with random number calls and matrix multiplication at the centre. Each iteration returns a double, which is added to a results list. At checkpoints every 5% of the total iteration count, this list is sorted and tested against some convergence criteria. If the criteria are met, the program breaks out of the loops and writes the results; otherwise it proceeds to the end.
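The following is a minimal sketch of the control flow described above, assuming System.Random as the generator. SimulateOne, Percentile, HasConverged, the 99th-percentile estimate and the 0.1% tolerance are hypothetical placeholders, because the question does not show the per-iteration model or its convergence criteria. The sketch mainly shows where the checkpoints fall. Each checkpoint sorts the whole results list, which makes it a synchronization point that any parallel version has to respect.

```csharp
using System;
using System.Collections.Generic;

public static class CheckpointedSimulation
{
    // Runs up to totalIterations iterations, checking convergence every 5% of the total.
    public static List<double> Run(int totalIterations, Random rnd)
    {
        List<double> results = new List<double>(totalIterations);
        int checkpointSize = Math.Max(1, totalIterations / 20); // 5% of the total
        double previousEstimate = double.NaN;

        for (int i = 0; i < totalIterations; i++)
        {
            results.Add(SimulateOne(rnd));

            if ((i + 1) % checkpointSize == 0)
            {
                results.Sort(); // the periodic sort described in the question
                double estimate = Percentile(results, 0.99);
                if (HasConverged(previousEstimate, estimate)) break; // stop early
                previousEstimate = estimate;
            }
        }
        return results;
    }

    // Hypothetical stand-in for one iteration (random draws plus matrix multiplication).
    static double SimulateOne(Random rnd) { return rnd.NextDouble(); }

    // Reads the p-th percentile from an already sorted list.
    static double Percentile(List<double> sorted, double p)
    {
        int index = (int)Math.Min(sorted.Count - 1, Math.Floor(p * sorted.Count));
        return sorted[index];
    }

    // Hypothetical criterion: relative change between two checkpoints of at most 0.1%.
    static bool HasConverged(double previous, double current)
    {
        if (double.IsNaN(previous)) return false; // no earlier checkpoint to compare with
        return Math.Abs(current - previous) <= 1e-3 * Math.Abs(previous);
    }
}
```

An early exit can only happen right after a checkpoint, so a run always stops on a multiple of the checkpoint size.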

I'd like developers to weigh in on:

  • Should I use new Thread vs. ThreadPool?
  • Should I look at the Microsoft Parallel Extensions library?
  • Should I look at AForge.NET's Parallel.For (http://code.google.com/p/aforge/)? Are there any other libraries?

Links to tutorials on any of the above would be very welcome, because I've never written any parallel or multi-threaded code.

  • Best strategies for generating normally distributed random numbers en masse, and then consuming them. The app never uses uniform random numbers directly. They are always translated to normally distributed numbers and then consumed.
  • Good, fast libraries (parallel?) for random number generation.
  • Memory considerations as I take this parallel: how much extra memory will I need?

The current app takes 2 hours for 500,000 iterations. The business needs this to scale to 3,000,000 iterations and to be run multiple times a day, so it needs some heavy optimisation.

In particular, I would like to hear from people who have used Microsoft Parallel Extensions or AForge.NET Parallel.

This needs to be productionised fairly quickly, so the .NET 4 beta is out. I know it has concurrency libraries built in, and we can look at migrating to .NET 4 later down the track once it's released. For now the server has .NET 2. I've submitted an upgrade to .NET 3.5 SP1, which my dev box already has, for review.

Thanks

Update

I've just tried the Parallel.For implementation, but it produces some weird results. Single-threaded:

IRandomGenerator rnd = new MersenneTwister();
IDistribution dist = new DiscreteNormalDistribution(discreteNormalDistributionSize);
List<double> results = new List<double>();

for (int i = 0; i < CHECKPOINTS; i++)
{
    results.AddRange(Oblist.Simulate(rnd, dist, n));
}

To:

Parallel.For(0, CHECKPOINTS, i =>
{
    results.AddRange(Oblist.Simulate(rnd, dist, n));
});

Inside Simulate there are many calls to rnd.nextUniform(), and I think I'm getting many values that are the same. Is this likely to happen now that the loop runs in parallel?

Could there also be a problem because the List AddRange call is not thread-safe? I see that System.Threading.Collections.BlockingCollection might be worth using. However, it only has an Add method and no AddRange, so I'd have to loop over the results and add them in a thread-safe manner. I'd appreciate any insight from someone who has used Parallel.For. I temporarily switched to System.Random for my calls because I was getting an exception when calling nextUniform with my Mersenne Twister implementation. Perhaps it isn't thread-safe, since a certain array was getting an index out of bounds....
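The following is a minimal sketch of a thread-safe version of the Parallel.For loop above. It is written against the Parallel class as it later shipped in .NET 4 (System.Threading.Tasks). The pre-release CTP used different namespaces, as the System.Threading.Collections name above shows. System.Random stands in for the question's IRandomGenerator, and Simulate is a hypothetical placeholder for Oblist.Simulate.

The sketch makes two changes:

  • Each iteration gets its own generator, seeded from the loop index. System.Random is not thread-safe, and the index-out-of-range exception suggests the Mersenne Twister implementation isn't either.
  • The merge into the shared list happens under a lock, because List<T>.AddRange is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelCheckpoints
{
    // Hypothetical stand-in for Oblist.Simulate: n draws from the supplied generator.
    static List<double> Simulate(Random rnd, int n)
    {
        List<double> values = new List<double>(n);
        for (int k = 0; k < n; k++) values.Add(rnd.NextDouble());
        return values;
    }

    public static List<double> Run(int checkpoints, int n, int baseSeed)
    {
        List<double> results = new List<double>();
        object resultsLock = new object();

        Parallel.For(0, checkpoints, i =>
        {
            // One generator per iteration, so no generator is ever shared between threads.
            Random rnd = new Random(baseSeed + i);
            List<double> local = Simulate(rnd, n);

            // List<T>.AddRange is not thread-safe, so merges are serialized.
            lock (resultsLock)
            {
                results.AddRange(local);
            }
        });
        return results;
    }
}
```

Seeding per iteration rather than per thread keeps the set of results reproducible for a fixed baseSeed, however the scheduler assigns iterations to threads. Only the order of the merged list varies, which is harmless if the list is sorted before each convergence check. Consecutive seeds are a simplification; for production work, a generator designed for independent parallel streams would be a safer choice.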

Solution

First, you need to understand why you think that using multiple threads is an optimization, when in fact it is not. Using multiple threads will make your workload complete faster only if you have multiple processors. Even then, it can be at most as many times faster as the number of CPUs you have available (this is called the speed-up). The work is not "optimized" in the traditional sense of the word: the amount of work isn't reduced. In fact, with multithreading the total amount of work typically grows because of the threading overhead.
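The speed-up can be written as a ratio of run times. The worked numbers below use the question's figures (2 hours for 500,000 iterations). They assume that the cost per iteration stays constant and that the convergence check never ends a run early. The goal of keeping a run at today's 2 hours is illustrative only.

```latex
\begin{align*}
S_p &= \frac{T_1}{T_p} \le p \quad \text{(at best, on } p \text{ CPUs)} \\
T_1(3\,000\,000) &\approx 2\,\text{h} \times \frac{3\,000\,000}{500\,000} = 12\,\text{h} \\
S_p^{\text{needed}} &= \frac{12\,\text{h}}{2\,\text{h}} = 6 \;\Rightarrow\; p \ge 6
\end{align*}
```

Under these assumptions, even perfect scaling would need at least 6 CPUs to keep a 3,000,000-iteration run at 2 hours.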

So when designing your application, you have to find pieces of work that can be done in parallel or in an overlapping fashion. It may be possible to generate random numbers in parallel by running multiple RNGs on different CPUs. However, that would also change the results, because you would get different random numbers. Another option is to generate the random numbers on one CPU and do everything else on other CPUs. This can give you a maximum speed-up of about 3, because the RNG still runs sequentially and still takes 30% of the load.
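The bound for this option follows from the RNG share measured in the question (30%). If everything else is spread over enough CPUs to keep up, the sequential RNG thread alone sets the run time. This assumes perfect overlap and ignores queueing overhead.

```latex
\begin{align*}
T_p &\ge 0.30\,T_1 \\
S_p &= \frac{T_1}{T_p} \le \frac{1}{0.30} \approx 3.3
\end{align*}
```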

So if you go for this parallelization, you end up with 3 threads. Thread 1 runs the RNG, thread 2 converts its output to normally distributed numbers, and thread 3 does the rest of the simulation.
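With exactly three threads, each stage on its own CPU, throughput is limited by the slowest stage. This estimate assumes two things:

  • The rest of the simulation is the remaining share of the profile.
  • The stages overlap perfectly.

```latex
\begin{align*}
f_{\text{rest}} &= 1 - 0.30 - 0.24 = 0.46 \\
T_3 &\approx \max(0.30,\ 0.24,\ 0.46)\,T_1 = 0.46\,T_1 \\
S_3 &= \frac{T_1}{T_3} \approx \frac{1}{0.46} \approx 2.2
\end{align*}
```

Under the same assumptions, getting closer to the RNG-bound limit would require splitting the third stage across at least two threads (0.46 / 2 = 0.23 < 0.30).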

A producer-consumer architecture is the most appropriate fit for this design. Each thread reads its input from a queue and writes its output to another queue. Each queue should be blocking, so that if the RNG thread falls behind, the normalization thread automatically blocks until new random numbers are available. For efficiency, I would pass the random numbers across threads in arrays of, say, 100 (or more), to avoid synchronizing on every single random number.

You don't need any advanced threading for this approach. Just use the regular Thread class: no pool, no library. The only thing you need that is (unfortunately) not in the standard library is a blocking queue class. The Queue class in System.Collections is no good for this. CodeProject provides a reasonable-looking implementation of one, and there are probably others.
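The following is a minimal sketch of the three-thread pipeline described above. It uses only types available since .NET 2.0 (Thread, Monitor, Queue<T>) plus C# 3 lambdas. BlockingQueue<T>, Pipeline, the queue capacity of 16 batches and the null end-of-stream marker are illustrative choices, not the CodeProject implementation mentioned in the answer. The Box-Muller transform stands in for the question's DiscreteNormalDistribution, and averaging the normals stands in for the rest of the simulation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Bounded blocking queue built on Monitor: Enqueue blocks while full, Dequeue blocks while empty.
public sealed class BlockingQueue<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly int capacity;

    public BlockingQueue(int capacity) { this.capacity = capacity; }

    public void Enqueue(T item)
    {
        lock (items)
        {
            while (items.Count >= capacity) Monitor.Wait(items);
            items.Enqueue(item);
            Monitor.PulseAll(items); // wake any consumer waiting for data
        }
    }

    public T Dequeue()
    {
        lock (items)
        {
            while (items.Count == 0) Monitor.Wait(items);
            T item = items.Dequeue();
            Monitor.PulseAll(items); // wake any producer waiting for space
            return item;
        }
    }
}

public static class Pipeline
{
    const int BatchSize = 100; // numbers travel in arrays to avoid locking per number

    // Thread 1 makes uniforms, thread 2 turns them into normals, thread 3 consumes them.
    // A null batch marks the end of the stream. Returns the mean of all normals consumed.
    public static double Run(int batches, int seed, out int count)
    {
        BlockingQueue<double[]> uniforms = new BlockingQueue<double[]>(16);
        BlockingQueue<double[]> normals = new BlockingQueue<double[]>(16);
        double sum = 0.0;
        int seen = 0;

        Thread rngThread = new Thread(() =>
        {
            Random rnd = new Random(seed);
            for (int b = 0; b < batches; b++)
            {
                double[] u = new double[BatchSize];
                for (int k = 0; k < BatchSize; k++) u[k] = rnd.NextDouble();
                uniforms.Enqueue(u);
            }
            uniforms.Enqueue(null);
        });

        Thread normalThread = new Thread(() =>
        {
            double[] u;
            while ((u = uniforms.Dequeue()) != null)
            {
                double[] z = new double[u.Length];
                for (int k = 0; k + 1 < u.Length; k += 2)
                {
                    // Box-Muller; 1 - u maps [0, 1) onto (0, 1] so Log never receives zero.
                    double radius = Math.Sqrt(-2.0 * Math.Log(1.0 - u[k]));
                    double angle = 2.0 * Math.PI * u[k + 1];
                    z[k] = radius * Math.Cos(angle);
                    z[k + 1] = radius * Math.Sin(angle);
                }
                normals.Enqueue(z);
            }
            normals.Enqueue(null);
        });

        Thread simThread = new Thread(() =>
        {
            double[] z;
            while ((z = normals.Dequeue()) != null)
                foreach (double x in z) { sum += x; seen++; }
        });

        rngThread.Start(); normalThread.Start(); simThread.Start();
        rngThread.Join(); normalThread.Join(); simThread.Join();

        count = seen;
        return seen == 0 ? 0.0 : sum / seen;
    }
}
```

PulseAll is used instead of Pulse because producers and consumers wait on the same monitor, and a single Pulse could wake a waiter of the wrong kind. The bounded capacity caps memory: 16 batches of 100 doubles is roughly 13 KB of payload per queue. .NET 4 later added BlockingCollection<T> in System.Collections.Concurrent, which can fill the same role.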
