Cholesky decomposition with CUDA


Question

I am trying to implement Cholesky decomposition using the cuSOLVER library. I am a beginner CUDA programmer and have always specified block sizes and grid sizes myself, but I cannot find out how the programmer can set them explicitly when calling cuSOLVER functions.

Here is the documentation: http://docs.nvidia.com/cuda/cusolver/index.html#introduction

The QR decomposition is also implemented using the cuSOLVER library (see the example here: http://docs.nvidia.com/cuda/cusolver/index.html#ormqr-example1), and even there the two parameters above are not set.

To summarize, I have the following question:


  • Can the block size and grid size be set with the cuSOLVER library?


Recommended answer

Robert Crovella has already answered this question. Here, I am just providing a full example showing how Cholesky decomposition can be easily performed using the potrf function provided by the cuSOLVER library.

The Utilities.cu and Utilities.cuh files are maintained at this page and omitted here. The example implements the CPU as well as the GPU approach.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include<iostream>
#include<iomanip>
#include<stdlib.h>
#include<stdio.h>
#include<assert.h>

#include <cusolverDn.h>
#include <cublas_v2.h>
#include <cuda_runtime_api.h>

#include "Utilities.cuh"

/********/
/* MAIN */
/********/
int main(){

    const int Nrows = 5;
    const int Ncols = 5;

    // --- Setting the host, Nrows x Ncols matrix
    double h_A[Nrows][Ncols] = { 
        { 1.,    -1.,    -1.,    -1.,    -1.,},  
        {-1.,     2.,     0.,     0.,     0.,}, 
        {-1.,     0.,     3.,     1.,     1.,}, 
        {-1.,     0.,     1.,     4.,     2.,}, 
        {-1.,     0.,     1.,     2.,     5.,}
    };

    printf("Original matrix\n");
    for(int i = 0; i < Nrows; i++)
        for(int j = 0; j < Ncols; j++)
            printf("A[%i, %i] = %f\n", i, j, h_A[i][j]);

    // --- Setting the device matrix and moving the host matrix to the device
    double *d_A;            gpuErrchk(cudaMalloc(&d_A,      Nrows * Ncols * sizeof(double)));
    gpuErrchk(cudaMemcpy(d_A, h_A, Nrows * Ncols * sizeof(double), cudaMemcpyHostToDevice));

    // --- cuSOLVER input/output parameters/arrays
    int work_size = 0;
    int *devInfo;           gpuErrchk(cudaMalloc(&devInfo,          sizeof(int)));

    // --- CUDA solver initialization
    cusolverDnHandle_t solver_handle;
    cusolveSafeCall(cusolverDnCreate(&solver_handle));

    // --- CUDA CHOLESKY initialization
    cusolveSafeCall(cusolverDnDpotrf_bufferSize(solver_handle, CUBLAS_FILL_MODE_LOWER, Nrows, d_A, Nrows, &work_size));

    // --- CUDA POTRF execution
    double *work;   gpuErrchk(cudaMalloc(&work, work_size * sizeof(double)));
    cusolveSafeCall(cusolverDnDpotrf(solver_handle, CUBLAS_FILL_MODE_LOWER, Nrows, d_A, Nrows, work, work_size, devInfo));
    int devInfo_h = 0;  gpuErrchk(cudaMemcpy(&devInfo_h, devInfo, sizeof(int), cudaMemcpyDeviceToHost));
    if (devInfo_h != 0) std::cout   << "Unsuccessful potrf execution\n\n";

    // --- cuSOLVER uses column-major storage, so the lower-triangular factor L lands in the upper
    //     triangle of the row-major host array: h_A[i][j] holds L(j, i) for i <= j. Showing this.
    printf("\nFactorized matrix\n");
    gpuErrchk(cudaMemcpy(h_A, d_A, Nrows * Ncols * sizeof(double), cudaMemcpyDeviceToHost));
    for(int i = 0; i < Nrows; i++)
        for(int j = 0; j < Ncols; j++)
            if (i <= j) printf("L[%i, %i] = %f\n", j, i, h_A[i][j]);

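    // --- Releasing the solver handle and the device buffers
    cusolverDnDestroy(solver_handle);
    gpuErrchk(cudaFree(work));
    gpuErrchk(cudaFree(devInfo));
    gpuErrchk(cudaFree(d_A));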

    return 0;

}
