Cholesky decomposition with CUDA

Views: 341 Published: 2017/3/4 16:10:07 Tags: cuda gpu nvidia gpu-programming cusolver

Question

I am trying to implement Cholesky decomposition using the cuSOLVER library. I am a beginner CUDA programmer and I have always specified block-sizes and grid-sizes, but I am not able to find out how this can be set explicitly by the programmer with cuSOLVER functions.

Here is the documentation: http://docs.nvidia.com/cuda/cusolver/index.html#introduction

The QR decomposition is implemented using the cuSOLVER library (see the example here: http://docs.nvidia.com/cuda/cusolver/index.html#ormqr-example1) and even there the above two parameters are not set.

To summarize, I have the following question:

Can the block size and grid size be set with the cuSOLVER library?

Recommended answer

Robert Crovella has already answered this question. Here, I'm just providing a full example showing how Cholesky decomposition can be easily performed using the potrf function provided by the cuSOLVER library.

The Utilities.cu and Utilities.cuh files are maintained at this page and omitted here; the gpuErrchk and cusolveSafeCall error-checking helpers used below come from them. The example implements the CPU as well as the GPU approach.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include<iostream>
#include<iomanip>
#include<stdlib.h>
#include<stdio.h>
#include<assert.h>

#include <cusolverDn.h>
#include <cublas_v2.h>
#include <cuda_runtime_api.h>

#include "Utilities.cuh"

/********/
/* MAIN */
/********/
int main(){

    const int Nrows = 5;
    const int Ncols = 5;

    // --- Setting the host, Nrows x Ncols matrix
    double h_A[Nrows][Ncols] = { 
        { 1.,    -1.,    -1.,    -1.,    -1.,},  
        {-1.,     2.,     0.,     0.,     0.,}, 
        {-1.,     0.,     3.,     1.,     1.,}, 
        {-1.,     0.,     1.,     4.,     2.,}, 
        {-1.,     0.,     1.,     2.,     5.,}
    };

    printf("Original matrix\n");
    for(int i = 0; i < Nrows; i++)
        for(int j = 0; j < Ncols; j++)
            printf("A[%i, %i] = %f\n", i, j, h_A[i][j]);

    // --- Setting the device matrix and moving the host matrix to the device
    double *d_A;            gpuErrchk(cudaMalloc(&d_A,      Nrows * Ncols * sizeof(double)));
    gpuErrchk(cudaMemcpy(d_A, h_A, Nrows * Ncols * sizeof(double), cudaMemcpyHostToDevice));

    // --- cuSOLVER input/output parameters/arrays
    int work_size = 0;
    int *devInfo;           gpuErrchk(cudaMalloc(&devInfo,          sizeof(int)));

    // --- CUDA solver initialization
    cusolverDnHandle_t solver_handle;
    cusolveSafeCall(cusolverDnCreate(&solver_handle));

    // --- CUDA CHOLESKY initialization
    cusolveSafeCall(cusolverDnDpotrf_bufferSize(solver_handle, CUBLAS_FILL_MODE_LOWER, Nrows, d_A, Nrows, &work_size));

    // --- CUDA POTRF execution
    double *work;   gpuErrchk(cudaMalloc(&work, work_size * sizeof(double)));
    cusolveSafeCall(cusolverDnDpotrf(solver_handle, CUBLAS_FILL_MODE_LOWER, Nrows, d_A, Nrows, work, work_size, devInfo));
    int devInfo_h = 0;  gpuErrchk(cudaMemcpy(&devInfo_h, devInfo, sizeof(int), cudaMemcpyDeviceToHost));
    if (devInfo_h != 0) std::cout   << "Unsuccessful potrf execution\n\n";

    // --- cuSOLVER uses column-major storage, so the lower triangle it has written shows up
    //     as the upper triangle (i <= j) of the row-major host array: h_A[i][j] holds L[j][i].
    printf("\nFactorized matrix\n");
    gpuErrchk(cudaMemcpy(h_A, d_A, Nrows * Ncols * sizeof(double), cudaMemcpyDeviceToHost));
    for(int i = 0; i < Nrows; i++)
        for(int j = 0; j < Ncols; j++)
            if (i <= j) printf("L[%i, %i] = %f\n", j, i, h_A[i][j]);

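    // --- Releasing device memory and the solver handle
    gpuErrchk(cudaFree(work));
    gpuErrchk(cudaFree(devInfo));
    gpuErrchk(cudaFree(d_A));
    cusolveSafeCall(cusolverDnDestroy(solver_handle));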

    return 0;

}
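
The answer mentions a CPU approach, but the listing above shows only the GPU path. As a rough cross-check, here is a minimal host-side sketch of the textbook Cholesky recurrence in plain C++. The names choleskyLowerCPU and exampleMatrix are hypothetical helpers introduced for illustration; they are not part of the original answer, of Utilities.cuh, or of cuSOLVER. The sketch assumes a symmetric, row-major input.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original answer or from cuSOLVER).
// Computes the lower-triangular factor L of a symmetric positive definite
// n x n matrix A stored row-major, so that A = L * L^T.
// Returns false if a non-positive pivot is met, i.e. A is not positive definite.
bool choleskyLowerCPU(const std::vector<double>& a, std::size_t n,
                      std::vector<double>& l) {
    assert(a.size() == n * n);
    l.assign(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        // Diagonal entry: L[j][j] = sqrt(A[j][j] - sum_k L[j][k]^2)
        double d = a[j * n + j];
        for (std::size_t k = 0; k < j; ++k) d -= l[j * n + k] * l[j * n + k];
        if (d <= 0.0) return false;
        l[j * n + j] = std::sqrt(d);
        // Entries below the diagonal in column j:
        // L[i][j] = (A[i][j] - sum_k L[i][k] * L[j][k]) / L[j][j]
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k) s -= l[i * n + k] * l[j * n + k];
            l[i * n + j] = s / l[j * n + j];
        }
    }
    return true;
}

// The 5 x 5 matrix from the listing above, flattened row-major.
std::vector<double> exampleMatrix() {
    return {
         1., -1., -1., -1., -1.,
        -1.,  2.,  0.,  0.,  0.,
        -1.,  0.,  3.,  1.,  1.,
        -1.,  0.,  1.,  4.,  2.,
        -1.,  0.,  1.,  2.,  5.
    };
}
```

For the example matrix, working the recurrence by hand gives a factor L with 1 on the diagonal and -1 everywhere below it, which is a convenient value to compare the GPU output against. Keep in mind that the host array in the listing is row-major while cuSOLVER is column-major, so entry L[j][i] of the GPU result is found at h_A[i][j] with i <= j.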
