Can I use Quadro K4000 and K2000 for GPUDirect v2 Peer-to-peer (P2P) communication?


Problem description

I use:

  • Single CPU (Intel Core i7-4820K Ivy Bridge-E) with 40 lanes of PCIe 3.0 + motherboard MSI X79A-GD65 (8D)
  • Windows Server 2012, MSVS 2012 + CUDA 5.5, compiled as a 64-bit application
  • GPUs: NVIDIA Quadro K4000 and K2000
  • All Quadros in TCC mode (Tesla Compute Cluster)
  • NVIDIA video driver 332.50

The simpleP2P test showed that both Quadros (K4000 and K2000) are capable of Peer-to-Peer (P2P), but reported Peer-to-Peer (P2P) access Quadro K4000 (GPU0) <-> Quadro K2000 (GPU1) : No.

Can I use Quadro K4000 and K2000 for GPUDirect v2 Peer-to-peer (P2P) communication?


[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.5\0_Simple\simpleP2P../../bin/win64/Release/simpleP2P.exe] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 3


GPU0 = "Quadro K4000" IS capable of Peer-to-Peer (P2P)

GPU1 = "Quadro K2000" IS capable of Peer-to-Peer (P2P)

GPU2 = "GeForce GT 640" NOT capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...


Peer-to-Peer (P2P) access from Quadro K4000 (GPU0) -> Quadro K2000 (GPU1) : No

Peer-to-Peer (P2P) access from Quadro K2000 (GPU1) -> Quadro K4000 (GPU0) : No

Two or more SM 2.0 class GPUs are required for C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.5\0_Simple\simpleP2P../../bin/win64/Release/simpleP2P.exe to run.
Support for UVA requires a GPU with SM 2.0 capabilities.
Peer to Peer access is not available between GPU0 <-> GPU1, waiving test.
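The check reported above comes down to a handful of CUDA runtime queries. As a minimal sketch (not the simpleP2P source itself; the helper name printPeerMatrix is hypothetical), the following prints each device's compute capability, whether it runs the TCC driver and whether it supports unified addressing, then asks cudaDeviceCanAccessPeer about every ordered pair of devices:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints per-device properties and the peer-access matrix.
// Returns the number of ordered pairs (i -> j) for which peer access is reported,
// or 0 if the runtime finds no usable device.
int printPeerMatrix() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        return 0;  // no driver or no CUDA device: nothing to check
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;
        // tccDriver is 1 for devices using the Windows TCC driver;
        // unifiedAddressing is 1 when the device shares the UVA space with the host.
        std::printf("GPU%d = %s, SM %d.%d, TCC=%d, UVA=%d\n", i, prop.name,
                    prop.major, prop.minor, prop.tccDriver, prop.unifiedAddressing);
    }
    int peerPairs = 0;
    for (int i = 0; i < count; ++i) {
        for (int j = 0; j < count; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            // canAccess is set to 1 if device i can directly access memory on device j.
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            std::printf("GPU%d -> GPU%d : %s\n", i, j, canAccess ? "Yes" : "No");
            peerPairs += canAccess;
        }
    }
    return peerPairs;
}
```

Call it from main(). Note that CUDA device indices need not match nvidia-smi's: in the outputs here, CUDA's GPU0 is the K4000, while nvidia-smi lists the K2000 as GPU 0.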

nvidia-smi.exe
Tue Mar 11 12:43:05 2014
+------------------------------------------------------+
| NVIDIA-SMI 5.320.57   Driver Version: 320.57         |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2000        TCC  | 0000:01:00.0     Off |                  N/A |
| 30%   30C    P8    N/A /  N/A |        6MB /  2047MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 640     WDDM  | 0000:02:00.0     N/A |                  N/A |
| 40%   32C  N/A     N/A /  N/A |     2016MB /  2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   2  Quadro K4000        TCC  | 0000:03:00.0     Off |                  N/A |
| 30%   36C    P8    10W /  87W |        8MB /  3071MB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    1            Not Supported                                               |

The documentation says: https://developer.nvidia.com/gpudirect


GPUDirect eliminates unnecessary system memory copies, dramatically lowers CPU overhead, and reduces latency, resulting in significant performance improvements in data transfer times for applications running on NVIDIA Tesla™ and Quadro™ products.

More detailed Quadro specifications are available here, but they only mention GPUDirect for Video and say nothing about P2P: http://www.nvidia.com/content/PDF/line_card/6660-nv-prographicssolutions-linecard-july13-final-lr.pdf

About the PCIe bus:

nvidia-smi -q
GPU 0000:01:00.0
    Product Name                    : Quadro K2000
    PCI
        Bus                         : 0x01
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x0FFE10DE
        Bus Id                      : 0000:01:00.0
        Sub System Id               : 0x094C10DE
        GPU Link Info
            PCIe Generation
                Max                 : 2
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 8x
    FB Memory Usage
        Total                       : 2047 MiB
        Used                        : 6 MiB
        Free                        : 2041 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 2 MiB
        Free                        : 254 MiB
    Compute Mode                    : Default
...

GPU 0000:02:00.0
    Product Name                    : GeForce GT 640
    PCI
        Bus                         : 0x02
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x0FC110DE
        Bus Id                      : 0000:02:00.0
        Sub System Id               : 0x8A921462
        GPU Link Info
            PCIe Generation
                Max                 : N/A
                Current             : N/A
            Link Width
                Max                 : N/A
                Current             : N/A

...

GPU 0000:03:00.0
    Product Name                    : Quadro K4000
    PCI
        Bus                         : 0x03
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x11FA10DE
        Bus Id                      : 0000:03:00.0
        Sub System Id               : 0x097C10DE
        GPU Link Info
            PCIe Generation
                Max                 : 2
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
    FB Memory Usage
        Total                       : 3071 MiB
        Used                        : 8 MiB
        Free                        : 3063 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 2 MiB
        Free                        : 254 MiB
    Compute Mode                    : Default

Can I use GPUDirect v2 P2P with Quadros, and if so, with which ones? Does the BAR1 size have to equal the GPU RAM size to be able to use P2P?

Update 11.03.2014 23:16


  1. I can't use P2P direct transfers: I transferred randomly generated data with cudaMemcpy(gpu_ptr1, gpu_ptr0, size, cudaMemcpyDefault); successfully at 3 GB/s on PCIe gen2 8x (4 GB/s theoretical), but the function copies through the host: the Visual Profiler shows Context1 (DtoH) and Context2 (HtoD).
  2. I can't use P2P direct access with the kernel __global__ void Kernel(char *dst, char *src, size_t size) { int idx = blockIdx.x * blockDim.x + threadIdx.x; dst[idx] = src[idx]; }: I get an error from cudaDeviceEnablePeerAccess(), and cudaDeviceCanAccessPeer() returns 0.
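Both symptoms are consistent with cudaDeviceCanAccessPeer reporting 0 for this pair. A cudaMemcpy with cudaMemcpyDefault between two devices still works under UVA, but without peer access the runtime stages the copy through host memory (hence the DtoH/HtoD pair in the profiler), while a kernel that dereferences another GPU's pointer needs cudaDeviceEnablePeerAccess to succeed first. The sketch below assumes a hypothetical helper, copyBetweenGpus, that tries the direct-access path and otherwise falls back to cudaMemcpyPeer; the kernel is the one from the question with a bounds check and a 64-bit index added.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The question's copy kernel, with a bounds check and a 64-bit index added.
// Once peer access is enabled, src may point into another GPU's memory.
__global__ void Kernel(char *dst, char *src, size_t size) {
    size_t idx = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (idx < size) dst[idx] = src[idx];
}

// Hypothetical helper: copies `size` bytes from src (on srcDev) to dst (on dstDev).
// Returns true if the kernel read src directly over P2P, false otherwise.
bool copyBetweenGpus(char *dst, int dstDev, char *src, int srcDev, size_t size) {
    if (size == 0) return false;  // nothing to copy
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, dstDev, srcDev);
    if (canAccess) {
        cudaSetDevice(dstDev);
        // Lets the current device (dstDev) access srcDev's memory; flags must be 0.
        cudaError_t err = cudaDeviceEnablePeerAccess(srcDev, 0);
        if (err == cudaErrorPeerAccessAlreadyEnabled) {
            cudaGetLastError();  // clear this harmless error
            err = cudaSuccess;
        }
        if (err == cudaSuccess) {
            const unsigned threads = 256;
            const unsigned blocks = (unsigned)((size + threads - 1) / threads);
            Kernel<<<blocks, threads>>>(dst, src, size);  // runs on dstDev, reads srcDev
            cudaDeviceSynchronize();  // error checking omitted for brevity
            return true;
        }
        std::fprintf(stderr, "cudaDeviceEnablePeerAccess: %s\n", cudaGetErrorString(err));
    }
    // No peer access: cudaMemcpyPeer still works, but is staged through host memory.
    cudaMemcpyPeer(dst, dstDev, src, srcDev, size);
    return false;
}
```

With the two Quadros in this question, cudaDeviceCanAccessPeer returns 0, so this sketch would always take the cudaMemcpyPeer path, which matches the host-staged copy seen in the profiler.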


Recommended answer

I don't know if it's related to your problem, but note this:

    GPU Link Info
        PCIe Generation
            Max                 : 2
            Current             : 1
        Link Width
            Max                 : 16x
            Current             : 8x

and this:

        PCIe Generation
            Max                 : 2
            Current             : 1
        Link Width
            Max                 : 16x
            Current             : 16x

that is, your PCIe links have been demoted from 2.0 (5 GT/s) to 1.0 (2.5 GT/s), and on one card from 16x to 8x... it's quite possible that this is a problem for GPUDirect too, but it's certainly not what you want if you are trying to squeeze all the performance out of your PCIe bus (on one card you're getting 25% of the theoretical bandwidth, and 50% on the other one).
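The 25%/50% figures follow from the per-lane rates. PCIe 1.x signals at 2.5 GT/s and PCIe 2.0 at 5 GT/s; both use 8b/10b encoding, so each lane carries 250 MB/s or 500 MB/s per direction. A small host-side sketch of that arithmetic (the function names are illustrative, and only generations 1 and 2 are modeled):

```cuda
#include <cassert>

// Usable per-lane, per-direction throughput in MB/s for PCIe gen 1 and 2.
// Both use 8b/10b encoding: 10 transferred bits carry 8 data bits (one byte).
double laneMBps(int gen) {
    if (gen == 1) return 2.5 * 1000.0 / 10.0;  // 2.5 GT/s -> 250 MB/s
    if (gen == 2) return 5.0 * 1000.0 / 10.0;  // 5 GT/s   -> 500 MB/s
    return 0.0;  // other generations (e.g. gen 3's 128b/130b) are not modeled here
}

// Theoretical link throughput in MB/s for a given generation and lane count.
double linkMBps(int gen, int lanes) {
    assert(lanes > 0);
    return laneMBps(gen) * lanes;
}

// Fraction of the maximum link throughput available at the current setting.
double fractionOfMax(int curGen, int curLanes, int maxGen, int maxLanes) {
    return linkMBps(curGen, curLanes) / linkMBps(maxGen, maxLanes);
}
```

For the K2000 (gen 1, 8x, against a maximum of gen 2, 16x), fractionOfMax(1, 8, 2, 16) gives 0.25; for the K4000 (gen 1, 16x) it gives 0.5. linkMBps(2, 8) gives 4000 MB/s, the 4 GB/s theoretical figure quoted in the question's update.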

I have found that the order in which the cards are placed on the motherboard matters; overheating can also lead to the buses being downgraded, or dust... probably planetary alignment too....

EDIT: I didn't know that TCC was mandatory for GPUDirect to work, so the following is not valid.

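First of all, I would try removing the display card to see whether, with only the two Quadro cards installed, you get PCIe 2.0 / 16x on all links, and whether GPUDirect starts working in that case.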

EDIT: from your additional information: "and because on the motherboard the monitor must be connected to the card in the first slot (the one with 16 PCIe lanes), I have: 16x GeForce, 16x Quadro K4000 and 8x Quadro K2000"

Well, fortunately that's not true (or at least, it's not what the manual of your motherboard says):

So the correct place for the card driving the monitor is slot PCI_E6, the 8x one... good luck swapping cards.

Congrats on such a precise question; that helped a lot (note: I still don't know whether this solves it... keep us informed!).
