Cholesky decomposition with CUDA
Question
I am trying to implement Cholesky decomposition using the cuSOLVER library. I am a beginner CUDA programmer and have always specified block sizes and grid sizes myself, but I cannot find out how the programmer can set them explicitly when calling cuSOLVER functions.
The documentation is here: http://docs.nvidia.com/cuda/cusolver/index.html#introduction
The QR decomposition is implemented using the cuSOLVER library (see the example here: http://docs.nvidia.com/cuda/cusolver/index.html#ormqr-example1), and even there these two parameters are not set.
To summarize, I have the following question:
- Can the block size and grid size be set with the cuSOLVER library?
Recommended answer
Robert Crovella has already answered this question. Here, I'm just providing a full example showing how Cholesky decomposition can be easily performed using the potrf function provided by the cuSOLVER library.
The Utilities.cu and Utilities.cuh files are maintained at this page and omitted here. The example implements the CPU as well as the GPU approach.
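The listing in this answer shows only the GPU path. For comparison, a CPU factorization can be sketched as follows. This is a minimal sketch, not the code from the Utilities files: the function name choleskyCpu and the row-major std::vector storage are assumptions made for illustration. It uses the standard row-by-row (Cholesky-Banachiewicz) recurrence and reports failure when the matrix is not positive definite.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Utilities.cuh): factors a symmetric
// positive-definite n x n matrix A, stored row-major in `a`, as A = L * L^T.
// On success, returns true and writes L (row-major, zeros above the diagonal)
// to `l`. Returns false if a non-positive pivot is encountered.
bool choleskyCpu(const std::vector<double>& a, std::size_t n, std::vector<double>& l) {
    assert(a.size() == n * n);
    l.assign(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            // Subtract the contribution of the columns already computed.
            double sum = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k)
                sum -= l[i * n + k] * l[j * n + k];
            if (i == j) {
                if (sum <= 0.0) return false;  // A is not positive definite
                l[i * n + i] = std::sqrt(sum);
            } else {
                l[i * n + j] = sum / l[j * n + j];
            }
        }
    }
    return true;
}
```

For the 5 x 5 matrix used in this answer, the resulting L has 1 on the diagonal and -1 everywhere below it, which gives a simple reference to compare the GPU output against.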
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <iostream>
#include <iomanip>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <cusolverDn.h>
#include <cublas_v2.h>
#include <cuda_runtime_api.h>
#include "Utilities.cuh"
/********/
/* MAIN */
/********/
int main() {
    const int Nrows = 5;
    const int Ncols = 5;

    // --- Setting the host, Nrows x Ncols matrix (symmetric positive definite)
    double h_A[Nrows][Ncols] = {
        { 1., -1., -1., -1., -1. },
        {-1.,  2.,  0.,  0.,  0. },
        {-1.,  0.,  3.,  1.,  1. },
        {-1.,  0.,  1.,  4.,  2. },
        {-1.,  0.,  1.,  2.,  5. }
    };

    printf("Original matrix\n");
    for (int i = 0; i < Nrows; i++)
        for (int j = 0; j < Ncols; j++)
            printf("A[%i, %i] = %f\n", i, j, h_A[i][j]);

    // --- Setting the device matrix and moving the host matrix to the device
    double *d_A; gpuErrchk(cudaMalloc(&d_A, Nrows * Ncols * sizeof(double)));
    gpuErrchk(cudaMemcpy(d_A, h_A, Nrows * Ncols * sizeof(double), cudaMemcpyHostToDevice));

    // --- cuSOLVER input/output parameters/arrays
    int work_size = 0;
    int *devInfo; gpuErrchk(cudaMalloc(&devInfo, sizeof(int)));

    // --- CUDA solver initialization
    cusolverDnHandle_t solver_handle;
    cusolveSafeCall(cusolverDnCreate(&solver_handle));

    // --- CUDA CHOLESKY initialization: query the required workspace size
    cusolveSafeCall(cusolverDnDpotrf_bufferSize(solver_handle, CUBLAS_FILL_MODE_LOWER, Nrows, d_A, Nrows, &work_size));

    // --- CUDA POTRF execution
    double *work; gpuErrchk(cudaMalloc(&work, work_size * sizeof(double)));
    cusolveSafeCall(cusolverDnDpotrf(solver_handle, CUBLAS_FILL_MODE_LOWER, Nrows, d_A, Nrows, work, work_size, devInfo));
    int devInfo_h = 0; gpuErrchk(cudaMemcpy(&devInfo_h, devInfo, sizeof(int), cudaMemcpyDeviceToHost));
    if (devInfo_h != 0) std::cout << "Unsuccessful potrf execution\n\n";

    // --- cuSOLVER assumes column-major storage, while h_A is row-major. Since A is
    //     symmetric, the lower triangle written by potrf shows up as the upper
    //     triangle of h_A: for i <= j, h_A[i][j] holds L(j, i).
    printf("\nFactorized matrix\n");
    gpuErrchk(cudaMemcpy(h_A, d_A, Nrows * Ncols * sizeof(double), cudaMemcpyDeviceToHost));
    for (int i = 0; i < Nrows; i++)
        for (int j = 0; j < Ncols; j++)
            if (i <= j) printf("L[%i, %i] = %f\n", j, i, h_A[i][j]);

    // --- Releasing device memory and the solver handle
    gpuErrchk(cudaFree(work));
    gpuErrchk(cudaFree(devInfo));
    gpuErrchk(cudaFree(d_A));
    cusolveSafeCall(cusolverDnDestroy(solver_handle));
    return 0;
}
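Because cuSOLVER works on column-major data, the position of L in the buffer copied back from the device is easy to misread. The following host-side sketch shows one way to unpack the factor and check it. The helper names extractLower and maxResidual are hypothetical, and the sketch assumes the buffer was produced with CUBLAS_FILL_MODE_LOWER and a leading dimension equal to n, as in the listing above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies the lower triangle of a column-major n x n buffer
// (as left by potrf with CUBLAS_FILL_MODE_LOWER, leading dimension n) into a
// row-major matrix L with zeros above the diagonal. Entries of the other
// triangle of the buffer are ignored.
std::vector<double> extractLower(const std::vector<double>& colMajor, std::size_t n) {
    assert(colMajor.size() == n * n);
    std::vector<double> l(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = j; i < n; ++i)
            l[i * n + j] = colMajor[j * n + i];  // element (i, j) sits at offset j * n + i
    return l;
}

// Hypothetical helper: returns max |(L * L^T)(i, j) - A(i, j)| for row-major A and L,
// where L is lower triangular.
double maxResidual(const std::vector<double>& a, const std::vector<double>& l, std::size_t n) {
    assert(a.size() == n * n && l.size() == n * n);
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            // L is lower triangular, so only k <= min(i, j) contributes.
            for (std::size_t k = 0; k <= std::min(i, j); ++k)
                sum += l[i * n + k] * l[j * n + k];
            worst = std::max(worst, std::fabs(sum - a[i * n + j]));
        }
    }
    return worst;
}
```

A residual close to machine precision, together with devInfo equal to 0, is a reasonable sign that the factorization and the layout interpretation are both correct.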