What device number should I use (0 or 1), to copy P2P (GPU0->GPU1)?


Question

Which device number, 0 or 1, must I set in cudaSetDevice() to copy P2P (GPU0->GPU1) using cudaStreamCreate(&stream); cudaMemcpyPeerAsync(p1, 1, p0, 0, size, stream);?

Code:

// Set device 0 as current
cudaSetDevice(0); 
float* p0;
size_t size = 1024 * sizeof(float);
// Allocate memory on device 0
cudaMalloc(&p0, size); 
// Set device 1 as current
cudaSetDevice(1); 
float* p1;
// Allocate memory on device 1
cudaMalloc(&p1, size); 
// Set device 0 as current
cudaSetDevice(0);
// Launch kernel on device 0
MyKernel<<<1000, 128>>>(p0); 

// Which device must I set here: 0 or 1?
cudaSetDevice(1); // or cudaSetDevice(0); ?
cudaStream_t stream;
cudaStreamCreate(&stream);

// Copy p0 to p1
cudaMemcpyPeerAsync(p1, 1, p0, 0, size, stream); 
cudaStreamSynchronize(stream);

// Launch kernel on device 1
cudaSetDevice(1); 
MyKernel<<<1000, 128>>>(p1);


UPDATE 31.03.2014: Or is the current context important only for __global__ kernel_function(), not for cudaMemcpyPeerAsync()? And for cudaMemcpyAsync() and cudaMemcpyPeerAsync(), is it only important that the stream has been created for the device from which the data is copied (the source pointer), isn't it?

Answer

In the call to cudaMemcpyPeerAsync you can specify a non-default stream. So your first question is: which device should I set by cudaSetDevice before the call to cudaMemcpyPeerAsync?

The answer is that you have to set, by cudaSetDevice, the device for which the stream has been created. You can use either a stream created for the source device or one created for the destination device. Although, to the best of my knowledge, this is not explicitly mentioned in the documentation, the possibility can be inferred from Robert Crovella's answer to How to define destination device stream in cudaMemcpyPeerAsync()?. Note that, as of 2011 and according to Multi-GPU Programming, performance is maximized when the stream belongs to the source GPU.

Let me recall some important points when using streams in the framework of multi-GPU, borrowed from Multi-GPU Programming, and which support the above statements:


  1. CUDA streams 是每个设备;

  2. 是由GPU在创建时确定的;

  3. 调用只有在其设备
  4. 是当前的。
  1. CUDA streams are per-device;
  2. a stream belongs to the GPU that was current at the time of its creation;
  3. calls into a stream can be issued only when its device is current.
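Putting those points together, a minimal sketch of the copy step might look as follows. This assumes, as in the question's code, that p0 was allocated on device 0, p1 on device 1, and that the stream is created on the source device (the performance-favored choice); it is an illustration of the rule above, not the only valid arrangement, and error checking is omitted for brevity.

```cuda
// Make the SOURCE device current before creating the stream,
// so the stream belongs to device 0 (where p0 lives).
cudaSetDevice(0);
cudaStream_t stream;
cudaStreamCreate(&stream);   // stream now belongs to device 0

// Issue the peer copy GPU0 -> GPU1 on that stream.
// The call itself names both devices, so the copy direction
// is fully determined by its arguments, not by cudaSetDevice.
cudaMemcpyPeerAsync(p1, 1, p0, 0, size, stream);
cudaStreamSynchronize(stream);   // wait for the copy to complete

// Only now switch to the destination device to launch work on p1.
cudaSetDevice(1);
MyKernel<<<1000, 128>>>(p1);
```

A stream created while device 1 was current would also work for the copy, but then cudaSetDevice(1) would have to precede the cudaMemcpyPeerAsync call, since calls into a stream can only be issued when its device is current.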
