L2 cache in NVIDIA Fermi
Question
When looking at the names of the performance counters for the NVIDIA Fermi architecture (the file Compute_profiler.txt in the doc folder of CUDA), I noticed that for L2 cache misses there are two performance counters, l2_subp0_read_sector_misses and l2_subp1_read_sector_misses. They are described as being for two slices of L2.
Why are there two slices of L2? Is there any relation to the streaming multiprocessor architecture? What effect would this division have on performance?
Thanks.
Recommended answer
I don't think there is any direct relation with the streaming multiprocessor.
I just think that a slice is the equivalent of a memory bank.
Just sum the values of the two to get the "total" L2 read misses.
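As a rough illustration of that summing step, here is a minimal Python sketch. It assumes you have already parsed the profiler output into a dictionary mapping counter names to values; the function name, the dictionary layout, and the sample numbers are all made up for this example and are not part of the profiler itself.

```python
import re

# Matches the per-slice counter names quoted in the question, e.g.
# l2_subp0_read_sector_misses and l2_subp1_read_sector_misses.
_SLICE_MISS = re.compile(r"^l2_subp\d+_read_sector_misses$")


def total_l2_read_sector_misses(counters):
    """Sum the per-slice L2 read sector miss counters (hypothetical helper)."""
    return sum(value for name, value in counters.items()
               if _SLICE_MISS.match(name))


# Hypothetical counter values, invented purely for illustration.
sample = {
    "l2_subp0_read_sector_misses": 1200,
    "l2_subp1_read_sector_misses": 1350,
    "unrelated_counter": 9000,  # placeholder name; ignored by the sum
}

print(total_l2_read_sector_misses(sample))
```

Matching on the counter name pattern rather than hard-coding two keys is just a convenience, so the same helper would still work if a profile reported a different number of slices.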