Solve small symmetric positive definite Ax = b on GPU only

Question

I'm attempting to optimise a real-time 3D modelling application. The compute part of the application runs almost entirely on the GPU in CUDA. The application needs to solve a small (6x6) double-precision symmetric positive definite linear system Ax = b 500+ times per second. Currently this is done with an efficient CPU-based linear algebra library using Cholesky, but that means copying data from the GPU to the CPU and back to the GPU hundreds of times per second, plus the overhead of kernel launches each time.

How can I solve the linear system entirely on the GPU, without moving the data to the CPU at all? I've read a little about the MAGMA library, but it seems to use hybrid CPU/GPU algorithms rather than GPU-only algorithms.

I'm prepared for the fact that solving an individual linear system on the GPU is going to be a lot slower than with the existing CPU-based library, but I want to see whether that can be made up for by removing the host-device data transfers and the kernel launch overhead incurred hundreds of times per second. If there is no GPU-only LAPACK-like alternative out there, how would I go about implementing something to solve this particular 6x6 case on the GPU only? Could it be done without a huge time investment, for example with GPU BLAS libraries?

Recommended answer

NVIDIA posted code for a batched Ax=b solver to the registered developer website last fall. This code works for general matrices and should work well enough for your needs, provided you can expand the symmetric matrices to full matrices (that should not be an issue for a 6x6?). Because the code performs pivoting, which is unnecessary for positive definite matrices, it is not optimal for your case, but you may be able to modify it for your purposes, as the code is under a BSD license.
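To illustrate why pivoting can be dropped for positive definite matrices, here is a minimal sketch of an unpivoted Cholesky solve for a single 6x6 system. It is not NVIDIA's batched solver code; the names `solve_spd6`, `solve_spd6_batched`, and the `HD` macro are hypothetical, and the sketch assumes each matrix is stored row-major as a full 6x6 array of which only the lower triangle is read. The macro lets the same function compile with a plain C++ compiler for host-side checking.

```cpp
#include <math.h>

// Build with nvcc for device code, or with a plain C++ compiler for host testing.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

constexpr int N = 6;  // system size from the question

// Solve A x = b for a symmetric positive definite 6x6 matrix A
// (row-major, only the lower triangle is read). Returns false if a
// non-positive pivot shows A is not numerically positive definite.
HD inline bool solve_spd6(const double* A, const double* b, double* x)
{
    double L[N][N];  // only the lower triangle is written and read

    // Cholesky factorization A = L * L^T, without pivoting.
    for (int j = 0; j < N; ++j) {
        double d = A[j * N + j];
        for (int k = 0; k < j; ++k) d -= L[j][k] * L[j][k];
        if (!(d > 0.0)) return false;
        L[j][j] = sqrt(d);
        for (int i = j + 1; i < N; ++i) {
            double s = A[i * N + j];
            for (int k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }

    // Forward substitution: L y = b (y is stored in x).
    for (int i = 0; i < N; ++i) {
        double s = b[i];
        for (int k = 0; k < i; ++k) s -= L[i][k] * x[k];
        x[i] = s / L[i][i];
    }

    // Back substitution: L^T x = y.
    for (int i = N - 1; i >= 0; --i) {
        double s = x[i];
        for (int k = i + 1; k < N; ++k) s -= L[k][i] * x[k];
        x[i] = s / L[i][i];
    }
    return true;
}

#ifdef __CUDACC__
// Hypothetical batched wrapper: one thread per independent 6x6 system.
__global__ void solve_spd6_batched(const double* A, const double* b,
                                   double* x, int* ok, int count)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s < count)
        ok[s] = solve_spd6(A + s * N * N, b + s * N, x + s * N);
}
#endif
```

Since `solve_spd6` is an ordinary device function, it could presumably also be called from inside a kernel that already holds A and b on the GPU, which would avoid both the host round trip and a separate kernel launch. Whether a single-threaded solve like this is fast enough for a given application would need to be measured.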

NVIDIA's standard developer website is experiencing some issues at the moment. Here is how you can download the batched solver code at this time:

(1) Go to http://www.nvidia.com/content/cuda/cuda-toolkit.html

(2) If you have an existing NVdeveloper account (e.g. via partners.nvidia.com) click on the green "Login to nvdeveloper" link on the right half of the screen. Otherwise click on "Join nvdeveloper" to apply for a new account; requests for new accounts are typically approved within one business day.

(3) Log in at the prompt with your email address and password

(4) There is a section on the right hand side titled "Newest Downloads". The fifth item from the top is "Batched Solver". Click on that and it will bring you to the download page for the code.

(5) Click on the "download" link, then click "Accept" to accept the license terms. Your download should start.
