new operator in kernel: strange behaviour

Question

I was wondering if anybody can shed some light on this behaviour of the new operator inside a kernel. Here is the code:

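#include <stdio.h>
#include "cuda_runtime.h"
#include "cuComplex.h"

__global__ void test()
{
    // 30000 * sizeof(cuComplex) = 240,000 bytes per block, taken from the device heap
    cuComplex *store = new cuComplex[30000];
    if (store == NULL) printf("Unable to allocate %u\n", blockIdx.y);
    delete[] store;   // array new[] must be paired with delete[]
    // Never true (one thread per block); presumably there to keep the
    // allocation from being optimised away. It would also touch freed memory.
    if (threadIdx.x == 10000) store->x = 0.0f;
}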

int main(int argc, char *argv[])
{
    float timestamp;
    cudaEvent_t event_start, event_stop;

    // Initialise the timing events
    cudaEventCreate(&event_start);
    cudaEventCreate(&event_stop);

    // 500 blocks (along y) of a single thread each
    dim3 threadsPerBlock(1, 1, 1);
    dim3 blocks(1, 500, 1);

    cudaEventRecord(event_start, 0);
    test<<<blocks, threadsPerBlock, 0>>>();
    cudaEventRecord(event_stop, 0);
    cudaEventSynchronize(event_stop);   // wait for the kernel to finish
    cudaEventElapsedTime(&timestamp, event_start, event_stop);
    printf("test took %fms\n", timestamp);

    cudaEventDestroy(event_start);
    cudaEventDestroy(event_stop);
    return 0;
}

Running this on a GTX 680 with CUDA 5 and inspecting the output, one notices that memory is randomly not allocated :( I thought maybe all of global memory was used up, but I have 2 GB of memory, and since the maximum number of active blocks is 16, the memory allocated this way should be at most 16*30000*8 = 3.84x10^6 bytes, i.e. around 3.8 MB. So what else should I consider?

Answer

The problem is the size of the heap used by the device-side malloc() and free() calls, which in-kernel new and delete also draw from. By default this heap is only 8 MB. See section 3.2.9 Call Stack and appendix B.16.1 Heap Memory Allocation in the NVIDIA CUDA C Programming Guide for more details.

Your test will work if you set the heap size large enough for the kernel's requirements before launching it, for example:

    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 500*30000*sizeof(cuComplex));
