CUDA：每个多处理器和每个块的线程有多少线程？ [英] CUDA: What is the threads per multiprocessor and threads per block distinction?

查看：2735 发布时间：2017/3/4 12:43:40 cuda gpu gpgpu nvidia

本文介绍了CUDA：每个多处理器和每个块的线程有多少线程？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我们有一个工作站安装了两个Nvidia Quadro FX 5800卡。运行deviceQuery CUDA示例显示每个多处理器（SM）的最大线程数为1024，而每个块的最大线程数为512.

We have a workstation with two Nvidia Quadro FX 5800 cards installed. Running the deviceQuery CUDA sample reveals that the maximum threads per multiprocessor (SM) is 1024, while the maximum threads per block is 512.

由于只能执行一个块在每个SM一次，为什么最大线程/处理器是最大线程/块的两倍？我们如何利用每个SM的其他512个线程？

Given that only one block can be executed on each SM at a time, why is max threads / processor double the max threads / block? How do we utilise the other 512 threads per SM?

Device 1: "Quadro FX 5800"
  CUDA Driver Version / Runtime Version          5.0 / 5.0
  CUDA Capability Major/Minor version number:    1.3
  Total amount of global memory:                 4096 MBytes (4294770688 bytes)
  (30) Multiprocessors x (  8) CUDA Cores/MP:    240 CUDA Cores
  GPU Clock rate:                                1296 MHz (1.30 GHz)
  Memory Clock rate:                             800 Mhz
  Memory Bus Width:                              512-bit
  Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           4 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

干杯，
詹姆斯。

Cheers, James.

CUDA：每个多处理器和每个块的线程有多少线程？ [英] CUDA: What is the threads per multiprocessor and threads per block distinction?

问题描述

推荐答案

相关文章

其它硬件开发最新文章

热门教程

热门工具

登录关闭

CUDA：每个多处理器和每个块的线程有多少线程？ [英] CUDA: What is the threads per multiprocessor and threads per block distinction?

问题描述

推荐答案

相关文章

其它硬件开发最新文章

热门教程

热门工具

登录 关闭

登录关闭