CUDA programming - L1 and L2 caches


Question


Could you please explain the difference between using both the L1 and L2 caches and using only the L2 cache in CUDA programming? What effect should I expect on execution time? When should I expect a smaller GPU time: with both L1 and L2 caches enabled, or with only L2 enabled? Thanks.

Answer


Typically you would leave both the L1 and L2 caches enabled. You should coalesce your memory accesses as much as possible, i.e. threads within a warp should access data within the same 128B segment wherever possible (see the CUDA Programming Guide for more on this topic).
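As a minimal sketch of the contrast, the two kernels below (names and the stride value are illustrative, not from the original answer) show a coalesced access pattern versus a strided one. In the first, the 32 threads of a warp read 32 consecutive floats, which fit in a single contiguous 128B segment; in the second, each thread's load lands in a different segment, so most of every cached line is wasted.

```cuda
// Coalesced: consecutive threads read consecutive floats, so one
// warp's 32 x 4B loads fall in a single 128B segment.
__global__ void copy_coalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Strided: threads in a warp access elements 32 floats (128B) apart,
// so every thread's load touches a different 128B segment and only
// 4 of the bytes fetched per segment are actually used.
__global__ void copy_strided(const float *in, float *out, int n)
{
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * 32;
    if (i < n)
        out[i] = in[i];
}
```

Both kernels copy the same data; only the mapping of threads to addresses differs, which is exactly what determines how many memory transactions the warp generates.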


Some programs cannot be optimised in this manner; their memory accesses may, for example, be completely random. For those cases it may be beneficial to bypass the L1 cache, thereby avoiding loading an entire 128B line when you only want, say, 4 bytes (you will still load 32B, since that is the minimum transaction size). Clearly there is an efficiency gain: 4 useful bytes out of 128 fetched improves to 4 out of 32.
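On Fermi/Kepler-era GPUs this bypass is selected at compile time via a PTX assembler option; on later architectures the L1 behaviour for global loads differs, so treat the flags below as a sketch for those generations:

```
# Default ("cache-all"): global loads are cached in both L1 and L2,
# and a miss fetches a full 128B line.
nvcc -Xptxas -dlcm=ca kernel.cu -o kernel

# Bypass L1 ("cache-global"): global loads are cached in L2 only,
# and a miss fetches 32B segments instead of 128B lines.
nvcc -Xptxas -dlcm=cg kernel.cu -o kernel
```

The setting applies to the whole compilation unit, so the usual approach is to build the application both ways and compare the measured GPU time for your actual access pattern.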
