CUDA allocate memory in __device__ function


Question

Is there a way in CUDA to allocate memory inside a __device__ function?
I could not find any examples of doing this.



From the manual:

B.15 Dynamic Global Memory Allocation
void* malloc(size_t size);
void free(void* ptr);
allocate and free memory dynamically from a fixed-size heap in global memory.
The CUDA in-kernel malloc() function allocates at least size bytes from the device heap and returns a pointer to the allocated memory, or NULL if insufficient memory exists to fulfill the request. The returned pointer is guaranteed to be aligned to a 16-byte boundary.
The CUDA in-kernel free() function deallocates the memory pointed to by ptr, which must have been returned by a previous call to malloc(). If ptr is NULL, the call to free() is ignored. Repeated calls to free() with the same ptr have undefined behavior.
The memory allocated by a given CUDA thread via malloc() remains allocated for the lifetime of the CUDA context, or until it is explicitly released by a call to free(). It can be used by any other CUDA threads even from subsequent kernel launches. Any CUDA thread may free memory allocated by another thread, but care should be taken to ensure that the same pointer is not freed more than once.
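The persistence guarantee described above (an in-kernel allocation survives across kernel launches until it is explicitly freed) can be sketched as follows. This is an illustrative example, not taken from the manual; the `g_buf` variable and kernel names are hypothetical, and it uses the newer `cudaDeviceSetLimit`/`cudaDeviceSynchronize` names that replaced the deprecated `cudaThread*` calls:

```cuda
#include <cstdio>

// Device-global slot used to hand the in-kernel allocation
// from one kernel launch to the next.
__device__ int* g_buf;

__global__ void allocKernel()
{
    // A single thread allocates from the device heap; the
    // allocation outlives this kernel launch.
    g_buf = (int*)malloc(16 * sizeof(int));
    if (g_buf != NULL)
        g_buf[0] = 42;
}

__global__ void useAndFreeKernel()
{
    // A later launch may read the same allocation...
    if (g_buf != NULL) {
        printf("Value from previous launch: %d\n", g_buf[0]);
        free(g_buf);  // ...and any thread may free it, exactly once.
    }
}

int main()
{
    // Heap limit must be set before the first kernel launch.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8 * 1024 * 1024);
    allocKernel<<<1, 1>>>();
    useAndFreeKernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```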

Solution

According to http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf (page 122), you should be able to use malloc() and free() in device code.



B.15 Dynamic Global Memory Allocation
void* malloc(size_t size);
void free(void* ptr);
allocate and free memory dynamically from a fixed-size heap in global memory.



The example given in the manual:

__global__ void mallocTest()
{
    char* ptr = (char*)malloc(123);
    printf("Thread %d got pointer: %p\n", threadIdx.x, ptr);
    free(ptr);
}

void main()
{
    // Set a heap size of 128 megabytes. Note that this must
    // be done before any kernel is launched.
    cudaThreadSetLimit(cudaLimitMallocHeapSize, 128*1024*1024);
    mallocTest<<<1, 5>>>();
    cudaThreadSynchronize();
}

You need the compiler parameter -arch=sm_20 and a card of compute capability 2.x or higher.
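Since the question asks specifically about __device__ functions: in-kernel malloc() and free() are callable from __device__ functions as well, because those run on the device just like __global__ kernels. A minimal sketch under that assumption (the `allocScratch` helper is hypothetical, and the newer `cudaDeviceSetLimit`/`cudaDeviceSynchronize` names are used in place of the deprecated `cudaThread*` calls):

```cuda
#include <cstdio>

// A __device__ function that allocates from the device heap.
// Returns NULL when the heap is exhausted, like host malloc().
__device__ int* allocScratch(int n)
{
    return (int*)malloc(n * sizeof(int));
}

__global__ void deviceAllocTest()
{
    int* buf = allocScratch(32);
    if (buf != NULL) {
        buf[0] = threadIdx.x;  // use the per-thread allocation
        printf("Thread %d wrote %d\n", threadIdx.x, buf[0]);
        free(buf);             // free inside the kernel
    }
}

int main()
{
    // The heap limit must still be set before the first launch.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8 * 1024 * 1024);
    deviceAllocTest<<<1, 4>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Note that every thread in the block gets its own allocation here, so the heap limit bounds the total: the device heap is a fixed-size pool, and malloc() returns NULL rather than growing it.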



