Illegal Memory Access on cudaDeviceSynchronize

Views: 251  Published: 2020/7/20 20:10:18  Tags: cuda, cuda-gdb

Problem description

I am encountering a very strange bug: I get an 'illegal memory access' error when running a Heat 2D simulation of a particular size, but the exact same simulation runs fine with fewer elements.

Is there a reason that increasing the size of an array would cause this exception? I am using a Titan Black GPU (6 GB of memory), but the simulation I am running is nowhere near that size. I calculated that I could run a 4000 x 4000 simulation, but I get errors if I exceed 250 x 250.

The error occurs immediately after I instantiate the array of simulation objects on the device. Instantiation code is as follows:

template<typename PlaceType, typename StateType>
__global__ void instantiatePlacesKernel(Place** places, StateType *state,
        void *arg, int *dims, int nDims, int qty) {
    unsigned idx = blockDim.x * blockIdx.x + threadIdx.x;

    if (idx < qty) {
        // set pointer to corresponding state object
        places[idx] = new PlaceType(&(state[idx]), arg);
        places[idx]->setIndex(idx);
        places[idx]->setSize(dims, nDims);
    }
}

template<typename PlaceType, typename StateType>
Place** DeviceConfig::instantiatePlaces(int handle, void *argument, int argSize,
        int dimensions, int size[], int qty) {

    // add global constants to the GPU
    memcpy(glob.globalDims,size, sizeof(int) * dimensions);
    updateConstants(glob);

    // create places tracking
    PlaceArray p; // a struct to track qty, 
    p.qty = qty;

    // create state array on device
    StateType* d_state = NULL;
    int Sbytes = sizeof(StateType);
    CATCH(cudaMalloc((void** ) &d_state, qty * Sbytes));
    p.devState = d_state; // save device pointer

    // allocate device pointers
    Place** tmpPlaces = NULL;
    int ptrbytes = sizeof(Place*);
    CATCH(cudaMalloc((void** ) &tmpPlaces, qty * ptrbytes));
    p.devPtr = tmpPlaces; // save device pointer

    // handle arg if necessary
    void *d_arg = NULL;
    if (NULL != argument) {
        CATCH(cudaMalloc((void** ) &d_arg, argSize));
        CATCH(cudaMemcpy(d_arg, argument, argSize, H2D));
    }

    // load places dimensions
    int *d_dims;
    int dimBytes = sizeof(int) * dimensions;
    CATCH(cudaMalloc((void** ) &d_dims, dimBytes));
    CATCH(cudaMemcpy(d_dims, size, dimBytes, H2D));

    // launch instantiation kernel
    int blockDim = (qty - 1) / BLOCK_SIZE + 1;
    int threadDim = (qty - 1) / blockDim + 1;
    Logger::debug("Launching instantiation kernel");
    instantiatePlacesKernel<PlaceType, StateType> <<<blockDim, threadDim>>>(tmpPlaces, d_state,
            d_arg, d_dims, dimensions, qty);
    CHECK();

    CATCH(cudaDeviceSynchronize()); // ERROR OCCURS HERE

    // clean up memory
    if (NULL != argument) {
        CATCH(cudaFree(d_arg));
    }
    CATCH(cudaFree(d_dims));
    CATCH(cudaMemGetInfo(&freeMem, &allMem));

    return p.devPtr;
}

Please assume any custom types you see are working, as this code executes without error on a sufficiently small simulation. I am frustrated that it appears that the number of elements in the kernel function's places and state arrays causes an error when the size exceeds 250 x 250 elements. Any insight would be awesome.

Thanks!
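
One possible cause, offered as a guess rather than a confirmed diagnosis: in-kernel new and malloc allocate from the device heap, not from memory reserved with cudaMalloc, and the CUDA documentation gives the default heap size as 8 MB. With 250 x 250 = 62,500 objects that leaves roughly 134 bytes of heap per object, so past that point an allocation could fail and return a null pointer, and the following places[idx]->setIndex(idx) would then dereference NULL. Because kernel launches are asynchronous, a fault inside the kernel is reported by the next synchronizing call, which here is cudaDeviceSynchronize. The sketch below is self-contained and every name in it is hypothetical (Item, allocKernel, freeKernel, blocksFor, CUDA_OK, and the 128 MB heap size are assumptions, not taken from the code above). It uses in-kernel malloc, raises the heap limit with cudaDeviceSetLimit before the first launch, and counts failed allocations instead of dereferencing them.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the CATCH macro: abort if a runtime call fails.
#define CUDA_OK(call)                                                  \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                  \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical 128-byte object standing in for a Place.
struct Item {
    int index;
    char payload[124];
};

// Number of blocks needed to cover qty threads (ceiling division).
static inline int blocksFor(int qty, int blockSize) {
    return (qty + blockSize - 1) / blockSize;
}

// Each thread allocates one object from the device heap.
// A failed allocation is counted, never dereferenced.
__global__ void allocKernel(Item **items, int qty, int *failures) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < qty) {
        Item *p = static_cast<Item *>(malloc(sizeof(Item)));
        items[idx] = p;
        if (p == NULL) {
            atomicAdd(failures, 1);
        } else {
            p->index = idx;
        }
    }
}

// Release the device-heap objects; they cannot be freed with cudaFree.
__global__ void freeKernel(Item **items, int qty) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < qty && items[idx] != NULL) {
        free(items[idx]);
    }
}

int main() {
    const int qty = 300 * 300;   // above the 250 x 250 threshold in the question
    const int blockSize = 256;
    const int numBlocks = blocksFor(qty, blockSize);

    // Assumption: 128 MB is comfortably more than qty * sizeof(Item).
    // The limit has to be set before any kernel that allocates is launched.
    CUDA_OK(cudaDeviceSetLimit(cudaLimitMallocHeapSize, (size_t)128 << 20));

    Item **d_items = NULL;
    int *d_failures = NULL;
    CUDA_OK(cudaMalloc((void **)&d_items, qty * sizeof(Item *)));
    CUDA_OK(cudaMalloc((void **)&d_failures, sizeof(int)));
    CUDA_OK(cudaMemset(d_failures, 0, sizeof(int)));

    allocKernel<<<numBlocks, blockSize>>>(d_items, qty, d_failures);
    CUDA_OK(cudaGetLastError());        // launch configuration errors
    CUDA_OK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    int failures = 0;
    CUDA_OK(cudaMemcpy(&failures, d_failures, sizeof(int),
                       cudaMemcpyDeviceToHost));
    printf("%d of %d device-heap allocations failed\n", failures, qty);

    freeKernel<<<numBlocks, blockSize>>>(d_items, qty);
    CUDA_OK(cudaGetLastError());
    CUDA_OK(cudaDeviceSynchronize());
    CUDA_OK(cudaFree(d_items));
    CUDA_OK(cudaFree(d_failures));
    return failures == 0 ? 0 : 1;
}
```

If this is what is happening in instantiatePlacesKernel, the equivalent changes there would be to test the pointer returned by new before calling setIndex and setSize, and to size the heap from qty and the real object size rather than the fixed value assumed here. Running the failing case under cuda-gdb or compute-sanitizer should show whether the faulting address is in fact NULL.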
