CUDA allocate memory in __device__ function


Problem Description


Is there a way in CUDA to allocate memory dynamically in device-side functions? I could not find any examples of doing this.


From the CUDA C Programming manual:


B.15 Dynamic Global Memory Allocation

void* malloc(size_t size); 
void free(void* ptr); 


allocate and free memory dynamically from a fixed-size heap in global memory.


The CUDA in-kernel malloc() function allocates at least size bytes from the device heap and returns a pointer to the allocated memory or NULL if insufficient memory exists to fulfill the request. The returned pointer is guaranteed to be aligned to a 16-byte boundary.


The CUDA in-kernel free() function deallocates the memory pointed to by ptr, which must have been returned by a previous call to malloc(). If ptr is NULL, the call to free() is ignored. Repeated calls to free() with the same ptr have undefined behavior.


The memory allocated by a given CUDA thread via malloc() remains allocated for the lifetime of the CUDA context, or until it is explicitly released by a call to free(). It can be used by any other CUDA threads even from subsequent kernel launches. Any CUDA thread may free memory allocated by another thread, but care should be taken to ensure that the same pointer is not freed more than once.
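The semantics above can be sketched in a short device-side example. This is a hypothetical illustration (the function names `allocScratch` and `useScratch` are invented here, not part of any API): the key points are that the in-kernel malloc() can return NULL when the device heap is exhausted, so the result should be checked before use, and each allocation must be freed exactly once.

```cuda
// Sketch: per-thread dynamic allocation inside a __device__ function.
__device__ int* allocScratch(int n)
{
    int* buf = (int*)malloc(n * sizeof(int));
    if (buf == NULL)
        return NULL;               // device heap exhausted: caller must handle this
    for (int i = 0; i < n; ++i)
        buf[i] = 0;
    return buf;
}

__global__ void useScratch(int n)
{
    int* buf = allocScratch(n);
    if (buf != NULL) {
        // ... use buf ...
        free(buf);                 // free exactly once per allocation
    }
}
```

Because the allocation lives for the lifetime of the CUDA context, the pointer could also be stashed in global memory and freed by a different thread in a later kernel launch, as the manual notes.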

Recommended Answer


According to http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf you should be able to use malloc() and free() in a device function.

Page 122:


B.15 Dynamic Global Memory Allocation

void* malloc(size_t size);
void free(void* ptr);

allocate and free memory dynamically from a fixed-size heap in global memory.

The example given in the manual:

__global__ void mallocTest()
{
    char* ptr = (char*)malloc(123);
    printf("Thread %d got pointer: %p\n", threadIdx.x, ptr);
    free(ptr);
}

int main()
{
    // Set a heap size of 128 megabytes. Note that this must
    // be done before any kernel is launched.
    cudaThreadSetLimit(cudaLimitMallocHeapSize, 128*1024*1024);
    mallocTest<<<1, 5>>>();
    cudaThreadSynchronize();
}
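Note that cudaThreadSetLimit and cudaThreadSynchronize were later deprecated in favor of the cudaDevice* equivalents; assuming a CUDA 4.0-or-newer toolkit, the host side of the example could be written as:

```cuda
int main()
{
    // Set a 128 MB device heap. As with cudaThreadSetLimit, this must
    // happen before the first kernel launch; the heap size cannot be
    // changed afterwards.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128 * 1024 * 1024);
    mallocTest<<<1, 5>>>();
    cudaDeviceSynchronize();
    return 0;
}
```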


You need the compiler parameter -arch=sm_20 and a card with compute capability 2.x or higher.
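Assuming the example is saved as mallocTest.cu (a filename chosen here for illustration), the compile command from the 3.2-era toolchain would look like:

```shell
nvcc -arch=sm_20 mallocTest.cu -o mallocTest
```

On newer toolkits, sm_20 has been dropped; any later architecture flag (e.g. -arch=sm_52) also satisfies the 2.x-or-higher requirement.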

