Large Matrix Inversion

This article discusses approaches to inverting large matrices; it may be a useful reference for anyone facing the same problem.

Problem Description

I am looking at taking the inverse of a large matrix, commonly of size 1000 x 1000, but sometimes exceeding 100000 x 100000 (which currently fails due to time and memory). I know that the usual advice is "don't take the inverse, find some other way to do it", but that is not possible at the moment. The reason is that we use existing software that expects to receive the matrix inverse. (Note: I am looking into ways of changing this, but that will take a long time.)



At the moment we are using an LU decomposition method from Numerical Recipes, and I am currently testing the Eigen library. Eigen seems to be more stable and a bit faster, but I am still in the accuracy-testing phase. I have taken a quick look at other libraries such as ATLAS and LAPACK but have not done any substantial testing with them yet.



It seems the Eigen library does not use concurrent methods to compute the inverse (though it does for the LU factorization part of the inversion), and as far as I can tell ATLAS and LAPACK share this limitation. (I am currently testing the speed difference for Eigen with and without OpenMP.) First question: can anyone explain how matrix inversion could be optimized by parallelization?



I found an article here that talks about parallel algorithms for matrix inversion, but I did not understand it. It seems that article discusses another method? I am also not sure whether scaLAPACK or PETSc would be useful.



Second question: I read this article about using GPUs to increase performance, but I have never coded for GPUs and so have no idea what it is trying to convey, though the charts at the bottom looked rather alarming. How is this even possible, and if it is true, where do I start in implementing something like it?



I also found this article, and have not yet had time to read through it, but it seems promising, as memory is a current issue with our software.



Any information about these articles or the problems in general would be of great help. And again, I apologize if this question seems vague; I will try to expand on it if necessary.

Solution

First question is can anyone explain how it would be possible to optimize matrix inversion by parallelization.



I'd hazard a guess that this, and related topics in linear algebra, is one of the most studied topics in parallel computing. If you're stuck looking for somewhere to start reading, good old Golub and Van Loan have a chapter on the topic. As to whether ScaLAPACK and PETSc are likely to be useful: certainly the former, probably the latter. Of course, they both depend on MPI, but that's kind of taken for granted in this field.



Second question ...

Use GPUs if you have them and can afford to translate your code into the programming model they support. If you have never coded for GPUs and have access to a cluster of commodity-type CPUs, you will get up to speed quicker by using the cluster than by wrestling with a novel technology.


As for the last article you refer to, it is now 10 years old in a field that changes very quickly (try finding a 10-year-old research paper on using GPUs for matrix inversion). I can't comment on its excellence or other attributes, but the problem sizes you mention seem to me to be well within the capabilities of modern clusters for in-core (to use an old term) computation. If your matrices are very big, are they also sparse?



Finally, I strongly support your apparent intention to use existing off-the-shelf code rather than trying to develop your own.



This concludes this article on large matrix inversion; I hope the answer above is helpful.
