Why is the constant memory size limited in CUDA?


Question

According to the "CUDA C Programming Guide", a constant memory access is beneficial only if the multiprocessor's constant cache is hit (Section 5.3.2.4)¹. Otherwise there can be even more memory requests for a half-warp than in the case of a coalesced global memory read. So why is the constant memory size limited to 64 KB?

One more question, so as not to ask twice. As far as I understand, in the Fermi architecture the texture cache is combined with the L2 cache. Does texture usage still make sense, or are global memory reads cached in the same manner?


¹ The constant memory space resides in device memory and is cached in the constant cache mentioned in Sections F.3.1 and F.4.1.

For devices of compute capability 1.x, a constant memory request for a warp is first split into two requests, one for each half-warp, that are issued independently.

A request is then split into as many separate requests as there are different memory addresses in the initial request, decreasing throughput by a factor equal to the number of separate requests.

The resulting requests are then serviced at the throughput of the constant cache in case of a cache hit, or at the throughput of device memory otherwise.
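The splitting behavior above can be illustrated with a short sketch (not from the original answer; the array size and kernel names are illustrative), contrasting a broadcast read, where every thread in a warp reads the same constant address, with a divergent read that serializes into one request per distinct address:

```cuda
__constant__ float coeffs[64];   // lives in the 64 KB constant space

// Fast path: all threads read coeffs[k], a single address per warp,
// so the constant cache can broadcast the value in one transaction.
__global__ void broadcastRead(const float *in, float *out, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i] * coeffs[k];
}

// Slow path: each thread in a warp reads a different constant address,
// so the request is split into as many separate requests as there are
// distinct addresses (up to 32-way serialization per warp).
__global__ void divergentRead(const float *in, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i] * coeffs[threadIdx.x % 64];
}
```

This is why constant memory suits coefficients and parameters that all threads read uniformly, while per-thread indexed data is usually better served from global or shared memory.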


Answer

The constant memory size is 64 KB for compute capability 1.0-3.0 devices. The cache working set is only 8 KB (see the CUDA Programming Guide v4.2, Table F-2).

Constant memory is used by the driver, the compiler, and variables declared __device__ __constant__. The driver uses constant memory to communicate parameters, texture bindings, etc. The compiler uses constants in many of the instructions (see the disassembly).

Variables placed in constant memory can be read and written using the host runtime functions cudaMemcpyToSymbol() and cudaMemcpyFromSymbol() (see the CUDA Programming Guide v4.2, Section B.2.2). Constant memory is in device memory but is accessed through the constant cache.
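A minimal sketch of the symbol copies mentioned above (the symbol name `table` and the sizes are illustrative, not from the original answer):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__constant__ float table[256];   // 1 KB of the 64 KB constant space

int main()
{
    float host[256];
    for (int i = 0; i < 256; ++i) host[i] = 0.5f * i;

    // Host -> constant memory (write side).
    cudaMemcpyToSymbol(table, host, sizeof(host));

    // Constant memory -> host (read side), e.g. to verify the copy.
    float check[256];
    cudaMemcpyFromSymbol(check, table, sizeof(check));

    printf("table[10] = %f\n", check[10]);
    return 0;
}
```

Note that the symbol itself, not a string name, is passed to both calls in the modern runtime API.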

On Fermi, the texture cache, constant cache, L1 cache, and I-cache are all level-1 caches in or around each SM. All level-1 caches access device memory through the L2 cache.

The 64 KB constant limit is per CUmodule, which is a CUDA compilation unit. The concept of a CUmodule is hidden under the CUDA runtime but accessible through the CUDA Driver API.
