will cudaMalloc synchronize host and device?
Problem Description
I understand that cudaMemcpy will synchronize host and device, but how about cudaMalloc or cudaFree?
Basically I want to overlap memory allocation/copies and kernel execution across multiple GPU devices, and a simplified version of my code looks like this:
void wrapper_kernel(const int &ngpu, const float * const &data)
{
    cudaSetDevice(ngpu);
    cudaMalloc(...);
    cudaMemcpyAsync(...);
    kernels<<<...>>>(...);
    cudaMemcpyAsync(...);
    // some host code
}

int main()
{
    const int NGPU=3;
    static float *data[NGPU];
    for (int i=0; i<NGPU; i++) wrapper_kernel(i,data[i]);
    cudaDeviceSynchronize();
    // some host code
}
However, the GPUs are running sequentially, and I can't find out why.
Recommended Answer
Try using a cudaStream_t for each GPU. Below is simpleMultiGPU.cu, taken from the CUDA samples.
//Solver config
TGPUplan plan[MAX_GPU_COUNT];
//GPU reduction results
float h_SumGPU[MAX_GPU_COUNT];

....memory init....

//Create streams for issuing GPU commands asynchronously and allocate memory (GPU and system page-locked)
for (i = 0; i < GPU_N; i++)
{
    checkCudaErrors(cudaSetDevice(i));
    checkCudaErrors(cudaStreamCreate(&plan[i].stream));
    //Allocate memory
    checkCudaErrors(cudaMalloc((void **)&plan[i].d_Data, plan[i].dataN * sizeof(float)));
    checkCudaErrors(cudaMalloc((void **)&plan[i].d_Sum, ACCUM_N * sizeof(float)));
    checkCudaErrors(cudaMallocHost((void **)&plan[i].h_Sum_from_device, ACCUM_N * sizeof(float)));
    checkCudaErrors(cudaMallocHost((void **)&plan[i].h_Data, plan[i].dataN * sizeof(float)));
    for (j = 0; j < plan[i].dataN; j++)
    {
        plan[i].h_Data[j] = (float)rand() / (float)RAND_MAX;
    }
}
....kernel, memory copyback....
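The issuing loop elided above follows the same pattern in the sample: all the (blocking) cudaMalloc/cudaMallocHost calls were made up front, so the per-GPU loop queues only asynchronous calls on each plan's stream. Roughly, as a sketch based on simpleMultiGPU.cu (reduceKernel, BLOCK_N, and THREAD_N are the sample's own names):

```cuda
//Copy input data to each GPU and launch work asynchronously on its stream
for (i = 0; i < GPU_N; i++)
{
    checkCudaErrors(cudaSetDevice(i));
    //Copy input data from CPU (page-locked, so the copy really is asynchronous)
    checkCudaErrors(cudaMemcpyAsync(plan[i].d_Data, plan[i].h_Data,
                                    plan[i].dataN * sizeof(float),
                                    cudaMemcpyHostToDevice, plan[i].stream));
    //Perform GPU computations on this device's stream
    reduceKernel<<<BLOCK_N, THREAD_N, 0, plan[i].stream>>>(
        plan[i].d_Sum, plan[i].d_Data, plan[i].dataN);
    //Read back GPU results
    checkCudaErrors(cudaMemcpyAsync(plan[i].h_Sum_from_device, plan[i].d_Sum,
                                    ACCUM_N * sizeof(float),
                                    cudaMemcpyDeviceToHost, plan[i].stream));
}

//Only now wait: every GPU already has its work queued, so they run concurrently
for (i = 0; i < GPU_N; i++)
{
    checkCudaErrors(cudaSetDevice(i));
    checkCudaErrors(cudaStreamSynchronize(plan[i].stream));
}
```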
And here's a guide on using multiple GPUs.
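Applying the same idea to the wrapper from the question: do the cudaMalloc calls (which block) before the per-GPU loop, pin the host buffers with cudaMallocHost so that cudaMemcpyAsync does not fall back to a blocking copy, and pass each device its own stream. A hypothetical sketch (the kernel name, sizes, and launch configuration are placeholders, not the questioner's actual code):

```cuda
void wrapper_kernel(int ngpu, float *d_data, float *h_data,
                    size_t n, cudaStream_t stream)
{
    cudaSetDevice(ngpu);
    //No cudaMalloc here: allocation was done up front, because it synchronizes
    cudaMemcpyAsync(d_data, h_data, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);
    kernels<<<grid, block, 0, stream>>>(d_data, n);   //placeholder launch
    cudaMemcpyAsync(h_data, d_data, n * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);
    //Returns immediately; wait later with cudaStreamSynchronize(stream)
}
```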