Parallelisation in Armadillo


Problem Description



The Armadillo C++ linear algebra library documentation states that one of the reasons for developing the library in C++ is the "ease of parallelisation via OpenMP present in modern C++ compilers", but the Armadillo code does not use OpenMP. How can I gain the benefits of parallelisation with Armadillo? Is this achieved by using one of the high-speed LAPACK and BLAS replacements? My platform is Linux with an Intel processor, but I suspect there is a generic answer to this question.

Solution

Okay, so it appears that parallelisation is indeed achieved by using the high-speed LAPACK and BLAS replacements. On Ubuntu 12.04 I installed OpenBLAS using the package manager and built the Armadillo library from source. The examples in the examples folder built and ran, and I can control the number of cores using the OPENBLAS_NUM_THREADS environment variable.

I created a small project openblas-benchmark which measures the performance increase of Armadillo when computing a matrix product C=AxB for matrices of various sizes, though so far I have only been able to test it on a 2-core machine.

The performance plot shows a nearly 50% reduction in execution time for matrices larger than 512x512. Note that both axes are logarithmic; each grid line on the y-axis represents a doubling of execution time.
