Varying results from cuBLAS


Problem description

I have implemented the following CUDA code, but I am a little bit confused about its behavior.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>
#include "cublas_v2.h"
#include <ctime>
#include <chrono>
#include <string>

#define IDX2F(i,j,ld) ((((j)-1)*(ld))+((i)-1)) 

void PrintMatrix(float* a, int n)
{
    int j, i;
    for (j = 1; j <= n; j++)
    {
        for (i = 1; i <= n; i++)
        {
            printf("%7.0f", a[IDX2F(i, j, n)]);
        }
        printf("\n");
    }
}

float* CreateMatrix(int n)
{
    float* matrix = static_cast<float *>(malloc(n * n * sizeof(float)));
    if (!matrix)
    {
        printf("host memory allocation failed");
        return nullptr;
    }

    for (int j = 1; j <= n; j++)
    {
        for (int i = 1; i <= n; i++)
        {
            matrix[IDX2F(i, j, n)] = 2;
        }
    }

    return matrix;
}

long CudaMatrixMultiply(float* matrix, int n)
{
    cudaError_t cudaStat;
    cublasStatus_t status;
    cublasHandle_t handle;
    float* deviceMatrix;

    cudaStat = cudaMalloc(reinterpret_cast<void**>(&deviceMatrix), n * n * sizeof(float));
    if (cudaStat != cudaSuccess)
    {
        printf("device memory allocation failed");
        return EXIT_FAILURE;
    }

    status = cublasCreate(&handle);
    if (status != CUBLAS_STATUS_SUCCESS)
    {
        printf("CUBLAS initialization failed\n");
        return EXIT_FAILURE;
    }

    status = cublasSetMatrix(n, n, sizeof(float), matrix, n, deviceMatrix, n);
    if (status != CUBLAS_STATUS_SUCCESS)
    {
        printf("data download failed");
        cudaFree(deviceMatrix);
        cublasDestroy(handle);
        return EXIT_FAILURE;
    }

    float alpha = 1;
    float beta = 0;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, deviceMatrix, n, deviceMatrix, n, &beta, deviceMatrix, n);

    status = cublasGetMatrix(n, n, sizeof(float), deviceMatrix, n, matrix, n);
    if (status != CUBLAS_STATUS_SUCCESS)
    {
        printf("data upload failed");
        cudaFree(deviceMatrix);
        cublasDestroy(handle);
        return EXIT_FAILURE;
    }

    cudaFree(deviceMatrix);
    cublasDestroy(handle);
    return EXIT_SUCCESS;
}

float* CpuMatrixMultiply(float* matrix, int size)
{
    // calloc so the zero-initialized buffer can later be released with free()
    float* result = static_cast<float*>(calloc(size * size, sizeof(float)));

    // Copied from https://msdn.microsoft.com/en-us/library/hh873134.aspx
    for (int row = 1; row <= size; row++) 
    {
        for (int col = 1; col <= size; col++) 
        {
            // Multiply the row of A by the column of B to get the row, column of product.
            for (int inner = 1; inner <= size; inner++) 
            {
                // result[row][col] += matrix[row][inner] * matrix[inner][col];
                result[IDX2F(col, row, size)] += matrix[IDX2F(inner, row, size)] * matrix[IDX2F(col, inner, size)];
            }
        }
    }

    free(matrix);
    return result;
}

int main(void)
{
    // printf("Matrix * Matrix Test\n");
    int size = 1000;
    int runs = 10;

    for (int run = 0; run != runs; run++)
    {
        printf("=== Test %d (Matrix * Matrix, Size = %d) ===\n\n", run + 1, size);
        printf("RAM usage is: %f GB\n", size * size * sizeof(float) / 1000000000.0);

        float* cpuMatrix = CreateMatrix(size);
        cpuMatrix = CpuMatrixMultiply(cpuMatrix, size);

        PrintMatrix(cpuMatrix, 5);

        float* gpuMatrix = CreateMatrix(size);
        CudaMatrixMultiply(gpuMatrix, size);
        PrintMatrix(gpuMatrix, 5);

        free(cpuMatrix);
        free(gpuMatrix);
    }
    getchar();
    return EXIT_SUCCESS;
}

The output of the CPU version of the MatrixMultiplication is the following, as expected:

4000 4000 4000 4000 4000
4000 4000 4000 4000 4000
4000 4000 4000 4000 4000
4000 4000 4000 4000 4000
4000 4000 4000 4000 4000

but the result computed on the GPU is sometimes the right one (see above) and sometimes a wrong, random(?) one. The first time the loop is executed, the result is always the right one.

I am not able to find the mistake in my code, and it would be great if you could help me.

Additionally, if I set size (in the main method) to e.g. 16000, then my driver crashes and I get an error message. I have written a bug report to NVIDIA about this, because my PC crashed twice. But maybe it is a programming fault of mine?

Driver: 364.72 (newest one)
SDK: CUDA Toolkit 7.5
Graphics Card: NVIDIA GeForce GTX 960 (4 GB)
Windows 10, 64-bit

Driver Error


Display driver NVIDIA Windows Kernel Mode Driver, Version 362.72 stopped responding and has successfully recovered.

Edit: With the help of the community, I found out that this is a problem with the watchdog timer.

Recommended answer

Regarding the second part of the question, following njuffa's remark, you may change the driver-behavior settings to avoid the error when increasing size. Open NSIGHT Monitor and, under Options > General > Microsoft Display Driver, set the WDDM TDR enabled field to False.
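The NSIGHT Monitor switch ultimately maps to Microsoft's documented WDDM TDR registry values. As a sketch only (key path and value names taken from Microsoft's TDR documentation; TdrLevel = 0 disables timeout detection entirely, so apply with care and reboot afterwards), the equivalent registry fragment would look like:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
; 0 = TdrLevelOff: timeout detection disabled. Alternatively, raise
; "TdrDelay" (seconds before the watchdog fires, default 2) instead
; of disabling detection outright.
"TdrLevel"=dword:00000000
```

Disabling TDR means a genuinely hung kernel will freeze the display until reboot, so raising TdrDelay is usually the gentler option for long-running compute kernels.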

From the spec, the 32-bit FPU throughput should be around 2.4 TFLOPS in single precision. A GEMM on an n × n matrix performs roughly 2n³ floating-point operations, so for n = 16000 your operation should take at least about 3.5 seconds even at peak. Hence the driver recovery after 2 seconds.
