Are GPU Kepler CC3.0 processors not only pipelined architecture, but also superscalar?


Problem description

In the documentation for CUDA 6.5 it is written: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ixzz3PIXMTktb

5.2.3. Multiprocessor Level

...

  • 8L for devices of compute capability 3.x since a multiprocessor issues a pair of instructions per warp over one clock cycle for four warps at a time, as mentioned in Compute Capability 3.x.

Does this mean that GPU Kepler CC3.0 processors are not only a pipelined architecture, but also superscalar? (A CUDA-level sketch of such independent sequences follows the list.)

  1. Pipelining - these two sequences execute in parallel (different operations at one time):

    • LOAD [addr1] -> ADD -> STORE [addr1] -> NOP
    • NOP -> LOAD [addr2] -> ADD -> STORE [addr2]

  2. Superscalar - these two sequences execute in parallel (the same operations at one time):

    • LOAD [reg1] -> ADD -> STORE [reg1]
    • LOAD [reg2] -> ADD -> STORE [reg2]
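To make the question concrete, here is a minimal CUDA sketch (the kernel and array names are invented for illustration, not taken from the documentation) in which each thread performs two independent load -> add -> store sequences; whether the hardware overlaps them through pipelining, dual issue, or both is exactly what is being asked:

// Illustrative sketch only: each thread executes two independent
// load -> add -> store sequences on separate arrays.
__global__ void two_independent_sequences(float *a, float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = a[i];   // LOAD [addr1]
        float y = b[i];   // LOAD [addr2]
        x = x + 1.0f;     // ADD, independent of y
        y = y + 1.0f;     // ADD, independent of x
        a[i] = x;         // STORE [addr1]
        b[i] = y;         // STORE [addr2]
    }
}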



Solution

Yes, the warp schedulers in Kepler can schedule two instructions per clock, as long as:

  1. the instructions are independent
  2. the instructions come from the same warp
  3. there are sufficient execution resources in the SM for both instructions

If that fits your definition of superscalar, then it is superscalar.
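As a hedged illustration of condition 1, the sketch below (kernel and variable names invented for this example) contains two fused multiply-adds per thread with no data dependence on each other, so in principle a Kepler warp scheduler could issue them as a pair, provided enough execution resources are free:

// Sketch: two independent FMAs from the same warp. Neither result
// feeds the other, so they are candidates for dual issue
// (subject to available floating-point units).
__global__ void dual_issue_candidates(const float *x, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v  = x[i];
        float r0 = fmaf(v, 2.0f, 1.0f);  // independent FMA #1
        float r1 = fmaf(v, 3.0f, 5.0f);  // independent FMA #2
        out[i] = r0 + r1;                // results combined only afterwards
    }
}

Whether the compiler actually emits two back-to-back FFMA instructions, and whether they end up dual-issued, depends on the generated SASS and on profiler measurements, so this is only a plausibility sketch.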



With respect to pipelining, I view pipelining differently. The various execution units in a Kepler SM are pipelined. Let's take a floating-point multiply as an example.



In a given clock, a Kepler warp scheduler may schedule a floating-point multiply operation on a floating-point unit. The results of this operation may not appear until some number of clocks later (i.e. they are not available on the next clock cycle), but on the very next clock cycle a new floating-point operation can be scheduled on the same floating-point functional units, because the hardware (the floating-point units, in this case) is pipelined:



clock    operation    pipeline stage    result
0        MPY1     ->  PS1
1                     PS2
...                   ...
N-1                   PSN           ->  result1

On the very next clock after clock 0, a new multiply instruction can be scheduled on the same HW, and the corresponding result will appear on the cycle after result1 appears.
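A hedged sketch of the latency/throughput distinction above (kernel names are illustrative): in the first kernel every multiply consumes the previous result, so each one must wait for the full pipeline depth before the next can begin for that thread; in the second the multiplies are independent, so new ones can enter the pipelined unit on consecutive clocks.

// Dependent chain: each multiply waits for the previous result,
// exposing the multiplier's pipeline latency to this thread.
__global__ void dependent_chain(float *data)
{
    float v = data[threadIdx.x];
    v = v * 1.1f;
    v = v * 1.1f;
    v = v * 1.1f;
    data[threadIdx.x] = v;
}

// Independent multiplies: none depends on another, so the scheduler
// can keep feeding the same pipelined unit without waiting for
// earlier results to emerge.
__global__ void independent_multiplies(const float *in, float *out)
{
    float v = in[threadIdx.x];
    float a = v * 1.1f;
    float b = v * 2.2f;
    float c = v * 3.3f;
    out[threadIdx.x] = a + b + c;
}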



Not sure if this is what you meant by "different operations at one time".


