CUDA 4.0 Peer to Peer Access confusion


Question

I have two questions related to CUDA 4.0 Peer access:


1. Is there any way I could copy data like from GPU#0 ---> GPU#1 ---> GPU#2 ---> GPU#3? Presently in my code it works fine when I use just two GPUs at a time, but fails when I check peer access on a third GPU using cudaDeviceCanAccessPeer. So, the following works - cudaDeviceCanAccessPeer(&flag_01, dev0, dev1) - but when I have two such statements, cudaDeviceCanAccessPeer(&flag_01, dev0, dev1) and cudaDeviceCanAccessPeer(&flag_12, dev1, dev2), the latter fails (0 is returned to the flag_12 variable). (A minimal sketch of this pairwise check follows the two questions.)

2. Would it work only for GPUs connected via a common PCIe bus, or is peer copy dependent upon the underlying PCIe interconnection? I do not understand PCIe, but upon running nvidia-smi I see that the PCIe buses of the GPUs are 2, 3, 83 and 84.
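
As a point of reference, here is a minimal sketch of the kind of pairwise check described in question 1, querying peer access for each consecutive device pair. The device count and indices come from the 4-GPU testbed described below; nothing else is assumed.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    // Query peer access for each consecutive pair: 0->1, 1->2, 2->3, ...
    for (int d = 0; d + 1 < ndev; ++d) {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, d, d + 1);
        printf("GPU %d -> GPU %d : peer access %s\n",
               d, d + 1, canAccess ? "possible" : "NOT possible");
    }
    return 0;
}
```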

The testbed is a dual-socket, 6-core Intel Westmere system with 4 GPUs (NVIDIA Tesla C2050).

Edit:

bandwidthTest results for HtoD and DtoH, and simpleP2P results between two GPUs (DtoD):

Recommended answer

I suspect this is the problem. From an upcoming NVIDIA document:

NVIDIA GPUs are designed to take full advantage of the PCI-e Gen2 standard, including the Peer-to-Peer communication, but the IOH chipset does not support the full PCI-e Gen2 specification for P2P communication with other IOH chipsets

The cudaPeerEnable() API call will return an error code if the application tries to establish a P2P relationship between two GPUs that would require P2P communication over QPI. The cudaMemcopy() function for P2P Direct Transfers automatically falls back to using a Device-to-Host-to-Device path, but there is no automatic fallback for P2P Direct Access (P2P load/store instructions in device code).
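
In practical terms, this means peer access should be enabled defensively and the return code checked. Below is a minimal sketch under these assumptions: the buffer size and device indices are illustrative only, and the runtime call that actually establishes the relationship is cudaDeviceEnablePeerAccess, which the document presumably refers to as cudaPeerEnable().

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;   // 64 MB test buffer (illustrative)
    const int dev0 = 0, dev1 = 1;    // assumed device indices

    void *src = NULL, *dst = NULL;
    cudaSetDevice(dev0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(dev1);
    cudaMalloc(&dst, bytes);

    // Try to establish direct P2P access from dev1 to dev0's memory.
    cudaError_t err = cudaDeviceEnablePeerAccess(dev0, 0);
    if (err != cudaSuccess) {
        // e.g. the two GPUs sit behind different IOH chipsets, with QPI in between
        printf("Direct P2P not available: %s\n", cudaGetErrorString(err));
    }

    // cudaMemcpyPeer works either way: a direct transfer if P2P is enabled,
    // otherwise the runtime stages the copy through host memory.
    cudaMemcpyPeer(dst, dev1, src, dev0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(dev0);
    cudaFree(src);
    return 0;
}
```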

One known example system is the HP Z800 workstation with dual IOH chipsets which can run the simpleP2P example, but bandwidth is very low (100s of MB/s instead of several GB/s) because of the fallback path.
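
One way to tell which path a given pair of GPUs is actually using is to time the peer copy and compute the effective bandwidth, in the spirit of the simpleP2P numbers quoted above. A rough sketch follows; the 64 MB transfer size and device indices are assumptions, and peer access would normally be enabled first as shown earlier.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// A few hundred MB/s suggests the host-staged fallback path;
// several GB/s suggests a direct PCIe P2P transfer.
int main() {
    const size_t bytes = 64 << 20;   // 64 MB (illustrative)
    const int devSrc = 0, devDst = 1;

    void *src = NULL, *dst = NULL;
    cudaSetDevice(devSrc);
    cudaMalloc(&src, bytes);
    cudaSetDevice(devDst);
    cudaMalloc(&dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpyPeer(dst, devDst, src, devSrc, bytes);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Peer copy: %.2f ms, %.2f GB/s\n",
           ms, (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dst);
    cudaSetDevice(devSrc);
    cudaFree(src);
    return 0;
}
```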

NVIDIA is investigating whether GPU P2P across QPI can be supported by adding functionality to future GPU architectures.

Reference: Intel® 5520 Chipset and Intel® 5500 Chipset Datasheet, Table 7-4: Inbound Memory Address Decoding: "The IOH does not support non-contiguous byte enables from PCI Express for remote peer-to-peer MMIO transactions. This is an additional restriction over the PCI Express standard requirements to prevent incompatibility with Intel QuickPath Interconnect". -- http://www.intel.com/Assets/PDF/datasheet/321328.pdf

In general we advise building multi-GPU workstations and clusters that have all PCI-express slots intended for GPUs connected to a single IOH.
