Advice about inversion of large sparse matrices


Question


Just got a Windows box set up with two 64-bit Intel Xeon X5680 3.33 GHz processors (6 cores each) and 12 GB of RAM. I've been using SAS on some large data sets, but it's just too slow, so I want to set up R to do parallel processing. I want to be able to carry out matrix operations, e.g., multiplication and inversion. Most of my data are not huge, in the 3-4 GB range, but one file is around 50 GB. It's been a while since I used R, so I looked around on the web, including the CRAN HPC task view, to see what was available. I think a foreach loop and the bigmemory package will be applicable. I came across this post: "Is there a package for parallel matrix inversion in R", which had some interesting suggestions. I was wondering if anyone has experience with the HiPLAR packages. It looks like HiPLARM adds functionality to the Matrix package, while HiPLARb adds new functions altogether. Which of these would be recommended for my application? Furthermore, there is a reference to the PLASMA library. Is this of any help? My matrices have a lot of zeros, so I think they could be considered sparse. I didn't see any examples of how to pass data from R to PLASMA, and looking at the PLASMA docs, it says it does not support sparse matrices, so I'm thinking that I don't need this library. Am I on the right track here? Any suggestions on other approaches?
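Since the matrices are mostly zeros, a sparse representation via the Matrix package (one of R's recommended packages) is a natural first step regardless of which parallel backend is chosen. A minimal sketch, with illustrative dimensions rather than anything from the post:

```r
# Sketch: store a mostly-zero matrix sparsely and solve a linear
# system with the Matrix package. Dimensions here are illustrative.
library(Matrix)

set.seed(1)
n <- 1000
# A random sparse matrix plus a strong diagonal, so it is invertible
A <- rsparsematrix(n, n, density = 0.01) + Diagonal(n, x = 10)
b <- rnorm(n)

# Prefer solving A x = b over forming solve(A) explicitly: the inverse
# of a sparse matrix is generally dense, while a sparse factorization
# stays cheap in memory.
x <- solve(A, b)
max(abs(A %*% x - b))  # residual; should be near machine precision
```

Note the design choice: `solve(A, b)` on a sparse matrix uses a sparse LU factorization and never materializes the (typically dense) inverse, which matters when n is large enough that a dense n-by-n result would not fit in 12 GB.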


It looks like HiPLAR and the pbdR package will not be helpful. I'm leaning more toward bigmemory, although it looks like I/O may be a problem: http://files.meetup.com/1781511/bigmemoryRandLinearAlgebra_BryanLewis.pdf. This article talks about a package vam for virtual associative matrices, but it must be proprietary. Would the ff package be of any help here? My R skills are just not current enough to know what direction to pursue. I'm pretty sure I can read this data using bigmemory, but not sure the processing will be very fast.
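For reference, the bigmemory route mentioned above looks roughly like this; a hedged sketch where the file name and dimensions are placeholders, not anything from the post:

```r
# Sketch: a file-backed big.matrix, which is memory-mapped from disk
# and so is not limited by the machine's 12 GB of RAM.
library(bigmemory)

X <- filebacked.big.matrix(nrow = 1e6, ncol = 50, type = "double",
                           backingfile = "data.bin",
                           descriptorfile = "data.desc")
X[1, ] <- rnorm(50)   # chunks can be written/read without loading it all

# read.big.matrix() can also parse a delimited file straight into a
# backing file, e.g. (hypothetical file name):
# X <- read.big.matrix("bigfile.csv", header = TRUE, type = "double",
#                      backingfile = "data.bin",
#                      descriptorfile = "data.desc")
```

The trade-off, as the linked slides discuss, is that every access goes through the memory map, so I/O rather than CPU tends to become the bottleneck.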

Answer


If you want to use HiPLAR (MAGMA and PLASMA libraries in R), it is only available for Linux at the moment. For this and many other things, I suggest switching your OS to the penguin.


That being said, Intel MKL optimization can do wonders for these sorts of operations. For most practical uses, it is the way to go. For example, Python built with MKL optimization can process large matrices about 20x faster than IDL, which was designed specifically for image processing. R has similarly shown vast improvements when built with MKL optimization. You can also install R Open from Revolution Analytics, which includes MKL optimization, but I am not sure that it has quite the same effect as building it yourself using Intel tools: https://software.intel.com/en-us/articles/build-r-301-with-intel-c-compiler-and-intel-mkl-on-linux
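If you do go the MKL route, you can check which BLAS/LAPACK your R build is actually linked against; a small diagnostic sketch (`La_library()` needs R >= 3.4):

```r
# With an MKL build, the library paths point at libmkl_rt (or similar)
# rather than the reference libRblas/libRlapack shipped with R.
sessionInfo()   # the "BLAS:" and "LAPACK:" lines show the linked libraries
La_library()    # path of the LAPACK shared object (R >= 3.4)

# A crude sanity benchmark: dense factorizations are where an
# optimized BLAS shows its large speedups.
m <- matrix(rnorm(2000 * 2000), 2000)
system.time(chol(crossprod(m)))
```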


I would definitely consider the type of operations one is looking to perform. GPU processes are those that lend themselves well to high parallelism (many of the same little computations running at once, as with matrix algebra), but they are limited by bus speeds. Intel MKL optimization is similar in that it can help use all of your CPU cores, but it is really optimized for Intel CPU architectures, so it should provide basic memory optimization too. I think that is the simplest route. HiPLAR is certainly the future, as it is CPU-GPU by design, especially with highly parallel heterogeneous architectures making their way into consumer systems. Though I think most consumer systems today cannot fully utilize this.
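To tie this back to the foreach idea in the question: independent tasks (e.g., many separate solves) can be spread across the 12 physical cores with doParallel. A sketch with an illustrative stand-in workload:

```r
# Sketch: the foreach/doParallel route from the question. The package
# names are real; the per-task workload here is a trivial stand-in.
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(12)     # two X5680s = 12 physical cores
registerDoParallel(cl)

results <- foreach(i = 1:24, .combine = c) %dopar% {
  A <- diag(100) * i      # stand-in for one independent matrix task
  sum(solve(A))           # solve(diag(100) * i) sums to 100 / i
}

stopCluster(cl)
length(results)           # one result per task
```

This pattern only pays off when the tasks are independent; a single huge factorization is better served by a multithreaded BLAS (MKL) than by foreach.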

Cheers,

Adam

