What is the difference between cudaMemcpy() and cudaMemcpyPeer() for P2P copy?

Question

I want to copy data directly from GPU0 memory to GPU1 memory, without staging it through CPU RAM.

As described on page 15 of http://people.maths.ox.ac.uk/gilesm/cuda/MultiGPU_Programming.pdf:

Peer-to-Peer Memcpy
 - Direct copy from pointer on GPU A to pointer on GPU B

 - With UVA, just use cudaMemcpy(…, cudaMemcpyDefault)
     Or cudaMemcpyAsync(…, cudaMemcpyDefault)

 - Also non-UVA explicit P2P copies:
     cudaError_t cudaMemcpyPeer( void * dst, int dstDevice, const void* src, 
        int srcDevice, size_t count )
     cudaError_t cudaMemcpyPeerAsync( void * dst, int dstDevice,
        const void* src, int srcDevice, size_t count, cudaStream_t stream = 0 )

  1. If I use cudaMemcpy(), must I first set a flag with cudaSetDeviceFlags(cudaDeviceMapHost)?
  2. Must I pass cudaMemcpy() the pointers obtained from cudaHostGetDevicePointer(&uva_ptr, ptr, 0)?
  3. Does cudaMemcpyPeer() have any advantages, and if not, why is it needed?

Recommended answer

Unified Virtual Addressing (UVA) provides a single address space for all CPU and GPU memory, so the physical location of a buffer can be determined from the pointer value alone.

Peer-to-peer memcpy with UVA

When UVA is available, cudaMemcpy can be used for peer-to-peer copies, since CUDA can infer which device "owns" which memory. The calls you typically need for a peer-to-peer memcpy with UVA are the following:

// Check for peer access between participating GPUs:
int can_access_peer_0_1 = 0, can_access_peer_1_0 = 0;
cudaDeviceCanAccessPeer(&can_access_peer_0_1, gpuid_0, gpuid_1);
cudaDeviceCanAccessPeer(&can_access_peer_1_0, gpuid_1, gpuid_0);

// Enable peer access between participating GPUs (only where it is supported):
cudaSetDevice(gpuid_0);
if (can_access_peer_0_1) cudaDeviceEnablePeerAccess(gpuid_1, 0);
cudaSetDevice(gpuid_1);
if (can_access_peer_1_0) cudaDeviceEnablePeerAccess(gpuid_0, 0);

// UVA memory copy (destination first: copies gpu1_buf into gpu0_buf):
cudaMemcpy(gpu0_buf, gpu1_buf, buf_size, cudaMemcpyDefault);

Peer-to-peer memcpy without UVA

When UVA is not available, peer-to-peer memcpy is done via cudaMemcpyPeer. Here is an example:

// Set device 0 as current
cudaSetDevice(0); 
float* p0;
size_t size = 1024 * sizeof(float);
// Allocate memory on device 0
cudaMalloc(&p0, size); 
// Set device 1 as current
cudaSetDevice(1); 
float* p1;
// Allocate memory on device 1
cudaMalloc(&p1, size); 
// Set device 0 as current
cudaSetDevice(0);
// Launch kernel on device 0
MyKernel<<<1000, 128>>>(p0); 
// Set device 1 as current
cudaSetDevice(1); 
// Copy p0 to p1
cudaMemcpyPeer(p1, 1, p0, 0, size); 
// Launch kernel on device 1
MyKernel<<<1000, 128>>>(p1);

As you can see, in the former case (UVA available) you don't need to specify which device each pointer belongs to, whereas in the latter case (UVA not available) you have to name the devices explicitly.

Note

cudaSetDeviceFlags(cudaDeviceMapHost);

is used to map page-locked host memory into the device address space. That is a different matter: it concerns host<->device memory movement, not the peer-to-peer movement your post is about.

In conclusion, the answers to your questions are:

  1. No.
  2. No.
  3. When UVA is available, use cudaMemcpy (you don't need to specify the devices); otherwise, use cudaMemcpyPeer (you need to specify the devices).
