CUDA 4.0 Peer to Peer Access confusion


Question

I have two questions related to CUDA 4.0 Peer access:


1. Is there any way I could copy data like from GPU#0 ---> GPU#1 ---> GPU#2 ---> GPU#3? Presently in my code it works fine when I use just two GPUs at a time, but fails when I check peer access on a third GPU using cudaDeviceCanAccessPeer. So, the following works - cudaDeviceCanAccessPeer(&flag_01, dev0, dev1) - but when I have two such statements, cudaDeviceCanAccessPeer(&flag_01, dev0, dev1) and cudaDeviceCanAccessPeer(&flag_12, dev1, dev2), the latter fails (0 is returned to the flag_12 variable). (A minimal sketch of this pairwise check follows the two questions.)

2. Would it work only for GPUs connected via a common PCIe bus, or is peer copy dependent upon the underlying PCIe interconnection? I do not understand PCIe, but upon running nvidia-smi I see that the PCIe buses of the GPUs are 2, 3, 83 and 84.
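
As a point of reference, here is a minimal sketch of the kind of pairwise check described in question 1, querying peer access for each consecutive device pair. The device count and indices come from the 4-GPU testbed described below; nothing else is assumed.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    // Query peer access for each consecutive pair: 0->1, 1->2, 2->3, ...
    for (int d = 0; d + 1 < ndev; ++d) {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, d, d + 1);
        printf("GPU %d -> GPU %d : peer access %s\n",
               d, d + 1, canAccess ? "possible" : "NOT possible");
    }
    return 0;
}
```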

The testbed is a dual-socket, 6-core Intel Westmere system with 4 GPUs (NVIDIA Tesla C2050).

Edit:

bandwidthTest results for HtoD and DtoH, and simpleP2P results between two GPUs (DtoD):

Recommended answer

I suspect this is the problem. From an upcoming NVIDIA document:

NVIDIA GPUs are designed to take full advantage of the PCI-e Gen2 standard, including the Peer-to-Peer communication, but the IOH chipset does not support the full PCI-e Gen2 specification for P2P communication with other IOH chipsets

The cudaPeerEnable() API call will return an error code if the application tries to establish a P2P relationship between two GPUs that would require P2P communication over QPI. The cudaMemcopy() function for P2P Direct Transfers automatically falls back to using a Device-to-Host-to-Device path, but there is no automatic fallback for P2P Direct Access (P2P load/store instructions in device code).
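
In practical terms, this means peer access should be enabled defensively and the return code checked. Below is a minimal sketch under these assumptions: the buffer size and device indices are illustrative only, and the runtime call that actually establishes the relationship is cudaDeviceEnablePeerAccess, which the document presumably refers to as cudaPeerEnable().

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;   // 64 MB test buffer (illustrative)
    const int dev0 = 0, dev1 = 1;    // assumed device indices

    void *src = NULL, *dst = NULL;
    cudaSetDevice(dev0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(dev1);
    cudaMalloc(&dst, bytes);

    // Try to establish direct P2P access from dev1 to dev0's memory.
    cudaError_t err = cudaDeviceEnablePeerAccess(dev0, 0);
    if (err != cudaSuccess) {
        // e.g. the two GPUs sit behind different IOH chipsets, with QPI in between
        printf("Direct P2P not available: %s\n", cudaGetErrorString(err));
    }

    // cudaMemcpyPeer works either way: a direct transfer if P2P is enabled,
    // otherwise the runtime stages the copy through host memory.
    cudaMemcpyPeer(dst, dev1, src, dev0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(dev0);
    cudaFree(src);
    return 0;
}
```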

One known example system is the HP Z800 workstation with dual IOH chipsets which can run the simpleP2P example, but bandwidth is very low (100s of MB/s instead of several GB/s) because of the fallback path.
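
One way to tell which path a given pair of GPUs is actually using is to time the peer copy and compute the effective bandwidth, in the spirit of the simpleP2P numbers quoted above. A rough sketch follows; the 64 MB transfer size and device indices are assumptions, and peer access would normally be enabled first as shown earlier.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// A few hundred MB/s suggests the host-staged fallback path;
// several GB/s suggests a direct PCIe P2P transfer.
int main() {
    const size_t bytes = 64 << 20;   // 64 MB (illustrative)
    const int devSrc = 0, devDst = 1;

    void *src = NULL, *dst = NULL;
    cudaSetDevice(devSrc);
    cudaMalloc(&src, bytes);
    cudaSetDevice(devDst);
    cudaMalloc(&dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpyPeer(dst, devDst, src, devSrc, bytes);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Peer copy: %.2f ms, %.2f GB/s\n",
           ms, (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dst);
    cudaSetDevice(devSrc);
    cudaFree(src);
    return 0;
}
```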

NVIDIA is investigating whether GPU P2P across QPI can be supported by adding functionality to future GPU architectures.

Reference: Intel® 5520 Chipset and Intel® 5500 Chipset Datasheet, Table 7-4: Inbound Memory Address Decoding: "The IOH does not support non-contiguous byte enables from PCI Express for remote peer-to-peer MMIO transactions. This is an additional restriction over the PCI Express standard requirements to prevent incompatibility with Intel QuickPath Interconnect". -- http://www.intel.com/Assets/PDF/datasheet/321328.pdf

In general we advise building multi-GPU workstations and clusters that have all PCI-express slots intended for GPUs connected to a single IOH.
