Basic CUDA C Program Crashing Under Certain Conditions


Question


I am writing a basic CUDA program to get a better understanding of the language. I have written something very basic that just adds two vectors in parallel and prints the results to a ppm file. Right now the values within the vectors are irrelevant, as I plan on adjusting that later to produce some type of interesting image. The issue is that the resolution of the image (which is actually the result vector) causes the program to crash almost instantly if I make it too large. Consider the program as it is now:

#include <stdio.h>

#define cols 500
#define rows 50
#define arraySize rows * cols

__global__ void addOnGPU(int *a, int *b, int *c) {
    // Only use data at this index
    int tid = threadIdx.x + blockIdx.x * blockDim.x;

    if (tid < arraySize) c[tid] = a[tid] + b[tid];
}

int main()
{
    FILE *ppm_fp;
    int a[arraySize], b[arraySize], c[arraySize];
    int *dev_a, *dev_b, *dev_c;
    int i, j;
    int threadsperblock = 256;
    int blocks = (arraySize + threadsperblock - 1) / threadsperblock;

    printf("1\n");
    // Allocate memory on GPU for the three vectors
    cudaError_t cudaStatus = cudaMalloc((void **) &dev_a, arraySize * sizeof(int));
    cudaStatus = cudaMalloc((void **) &dev_b, arraySize * sizeof(int));
    cudaStatus = cudaMalloc((void **) &dev_c, arraySize * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "Unable to allocate memory on the GPU!");
        return 1;
    }

    printf("2\n");
    // Assign values to input vectors
    for (i = 0, j = 0; i < arraySize; i++, j++) {
        a[i] = i;
        b[i] = i * i;
    }

    printf("3\n");
    // Copy input values to allocated vectors in GPU memory
    cudaStatus = cudaMemcpy(dev_a, a, arraySize * sizeof(int), cudaMemcpyHostToDevice);
    cudaStatus = cudaMemcpy(dev_b, b, arraySize * sizeof(int), cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "Unable to copy input vectors to the GPU!");
        return 1;
    }

    printf("before\n");
    // Add vectors in parallel and save results in dev_c
    addOnGPU<<<blocks, threadsperblock>>>(dev_a, dev_b, dev_c);
    printf("after\n");

    // Copy results from dev_c to local c vector
    cudaStatus = cudaMemcpy(c, dev_c, arraySize * sizeof(int), cudaMemcpyDeviceToHost);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "Unable to copy input vectors to the GPU!");
        return 1;
    }

    ppm_fp = fopen("image.ppm", "wb");
    fprintf(ppm_fp, "P6\n%d %d\n255\n", cols, rows);
    for (i = 0; i < arraySize; i++) {
        if (i % (3 * cols) == 0) fprintf(ppm_fp, "\n");
        fprintf(ppm_fp, "%d ", c[i]);
    }

    // Display contents of output vector
    for (i = 0; i < arraySize; i++) {
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    }
    printf("\n");

    // cudaDeviceReset must be called before exiting in order for profiling and
    // tracing tools such as Nsight and Visual Profiler to show complete traces.
    cudaStatus = cudaDeviceReset();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaDeviceReset failed!");
        return 1;
    }

    return 0;
}


As it stands, the program runs fine with those values of cols and rows. If I increase rows to 500, the program crashes. I have included a few debug print statements in an attempt to find where it crashes, but it crashes as soon as I run it. I am running it on Visual Studio 2013 (which I am a novice at using; I am much more familiar with vi, Linux, and manual compiling). I have a GTX 580 3GB version, if that matters. I know there is no way I am going over any memory limits, and I am not exceeding the 65536 (or is it 65535?) limit on the number of blocks that can be created, or the 512-threads-per-block limit. Any ideas on what is going wrong?

Thanks

Answer


The crash you are observing is not related to CUDA; it is due to hitting the stack memory limit with the C/C++ static array allocations. With rows = cols = 500, the three int arrays occupy 3 × 500 × 500 × 4 bytes ≈ 3 MB, well above the default stack size (1 MB with the Microsoft linker).

int a[arraySize], b[arraySize], c[arraySize];


Statically allocated arrays are placed on the memory stack, which in general has size restrictions. Arrays dynamically allocated with the syntax

int* a = (int*)malloc(arraySize*sizeof(int));


are placed on the memory heap, which in general can grow during program execution as more memory is required. On the other hand, heap memory is slower than stack memory because of the overhead of managing dynamic allocations.


You can find much useful material on the web explaining the differences between stack and heap memory; see for example

Memory: Stack vs Heap


and the StackOverflow protected question

What and where are the stack and heap?


As a closing remark, let me say that it is always good to do proper CUDA error checking in the sense of the post

What is the canonical way to check for errors using the CUDA runtime API?


This is now also mentioned in the CUDA Tag Wiki. It would probably have helped you rule out CUDA errors by yourself.
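The pattern from that post can be wrapped in a small macro; a sketch using the commonly cited `gpuErrchk` names from the linked answer, assuming the CUDA runtime headers are available:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call; report the file and line on failure.
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
    if (code != cudaSuccess) {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}
```

Applied to the question's code, this would check each cudaMalloc and cudaMemcpy individually (rather than only testing cudaStatus after the last of several calls), and a kernel launch would be followed by `gpuErrchk(cudaPeekAtLastError());` and `gpuErrchk(cudaDeviceSynchronize());` to catch launch and execution errors.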
