Fermi L2 cache hit latency?

Views: 122 | Posted: 2017/3/4 13:00:51 | Tags: cuda opencl gpu gpgpu


Question

Does anyone have information about the L2 cache in Fermi? I have heard that it is as slow as global memory, and that L2 is only there to increase the effective memory bandwidth, but I can't find any official source to confirm this. Has anyone measured the L2 hit latency? What about its size, line size, and other parameters?

In practice, how do L2 read misses affect performance? As I understand it, L2 only matters in very memory-bound applications. Please feel free to share your opinions.

Thanks

Answer

This thread on the NVIDIA forum has some measurements of performance characteristics. While it is not official information, and probably not 100% exact, it gives at least some indication of the behaviour, so I thought it might be useful here (measurements in clock cycles):

1020 non-cached (L1 enabled but not used)

1020 non-cached (L1 disabled)

365 L2 cached (L1 disabled)

88 L1 cached (L1 enabled and used)

Another post in the same thread gives these results:

1060 non-cached

248 L2

18 L1
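
To relate these figures back to the question, here is a small C++ sketch that computes how much faster an L2 hit is than an uncached access, and the expected latency of a load for a given L2 hit rate. The struct and function names are made up for illustration. The weighted-average model is a simplifying assumption, not something from the thread: it ignores the latency hiding that a GPU gets from switching between warps.

```cpp
#include <cassert>

// Latencies in clock cycles, copied from the two forum posts quoted above.
struct LatencyCycles {
    double l1;        // L1 hit
    double l2;        // L2 hit
    double uncached;  // access served from device memory
};

constexpr LatencyCycles kFirstPost{88.0, 365.0, 1020.0};
constexpr LatencyCycles kSecondPost{18.0, 248.0, 1060.0};

// How many times faster an L2 hit is than an uncached access.
constexpr double l2_speedup(const LatencyCycles& m) {
    return m.uncached / m.l2;
}

// Expected latency of a load that bypasses L1, for a hypothetical L2 hit rate.
// Assumption: a miss costs the full uncached latency (plain weighted average).
double expected_latency(const LatencyCycles& m, double l2_hit_rate) {
    assert(l2_hit_rate >= 0.0 && l2_hit_rate <= 1.0);
    return l2_hit_rate * m.l2 + (1.0 - l2_hit_rate) * m.uncached;
}
```

With the first post's numbers an L2 hit comes out roughly 2.8 times faster than an uncached access, and with the second post's roughly 4.3 times. By these unofficial measurements, L2 is noticeably faster than global memory rather than equally slow.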
