Performing many small matrix operations in parallel in OpenCL

Question

I have a problem that requires me to do eigendecomposition and matrix multiplication of many (~4k) small (~3x3) square Hermitian matrices. In particular, I need each work item to perform eigendecomposition of one such matrix, and then perform two matrix multiplications. Thus, the work that each thread has to do is rather minimal, and the full job should be highly parallelizable.

Unfortunately, it seems all the available OpenCL LAPACKs are for delegating operations on large matrices to the GPU rather than for doing smaller linear algebra operations inside an OpenCL kernel. As I'd rather not implement matrix multiplication and eigendecomposition for arbitrarily sized matrices in OpenCL myself, I was hoping someone here might know of a suitable library for the job?

I'm aware that OpenCL might be getting built-in matrix operations at some point since the matrix type is reserved, but that is not really of much use right now. There is a similar question here from 2011, but it pretty much just says to roll your own, so I'm hoping the situation has improved since then.

Recommended answer

In general, my experience with libraries like LAPACK, fftw, cuFFT, etc. is that when you want to do many really small problems like this, you are better off writing your own for performance. Those libraries are usually written for generality, so you can often beat their performance for specific small problems, especially if you can use unique properties of your particular problem.

I realize you don't want to hear "roll your own" but for this type of problem it is really the best thing to do IMO. You might find a library to do this, but considering the code that you really want (for performance) will not generalize, I doubt it exists. You'll be looking specifically for code to find the eigenvalues of 3x3 matrices. That's less of a library and more of a random code snippet with a suitable license that you can manipulate to take advantage of your specific problem.

In this specific case, you can find the eigenvalues of a 3x3 matrix with the textbook method using the characteristic polynomial. Remember that there is a relatively simple closed-form solution for cubic equations, and that a Hermitian matrix has only real eigenvalues, so all three roots of the cubic are real: http://en.wikipedia.org/wiki/Cubic_function#General_formula_for_roots

While I think it is very likely that this approach would be much faster than iterative methods, it would be wise to benchmark both if performance is an issue.
