For which sizes are plain loads and stores to global memory in CUDA atomic?


Question

Are general reads and writes to global memory in CUDA atomic if:


  • it is a 4-byte instruction? (I assume so)

  • it is an 8-byte or 16-byte instruction? (I think so)

At least on Kepler and Fermi, are general 4-byte reads and writes to global memory atomic at the warp level, or 8/16-byte instructions atomic at the half/quarter-warp level, if:


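  • all threads of the warp access the same 32-byte L2 transaction block? (I assume so)

  • the threads of the warp access different 32-byte L2 transaction blocks, but all of them access the same 128-byte L2 cache line? (I assume not)

  • all threads of the warp access different L2 cache lines? (I assume not)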

If any of these assumptions about atomicity at the warp level is correct, is there a way to make use of this knowledge without risking compatibility with future Compute Capabilities?

Recommended answer

TL;DR: Due to caching, atomicity is not guaranteed when loading or storing anything larger than 1 byte.

Explanation:

Reads and writes generally take place with respect to the caches. By the time the transactions are issued to global memory, neither the CUDA programming model nor the memory model guarantees atomicity, unless atomic instructions are used.
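To make "unless atomic instructions are used" concrete, here is a minimal sketch of 4-byte accesses routed through atomic instructions. The helper names `store_word_atomic` and `load_word_atomic`, the kernels, and the launch configuration are hypothetical and chosen only for illustration; `atomicExch` and `atomicAdd` are the standard CUDA atomic functions. Using an exchange as a store and an add of zero as a load is a common idiom, not something the answer itself prescribes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: a 32-bit store issued as an atomic exchange whose
// previous value is discarded.
__device__ void store_word_atomic(unsigned int* addr, unsigned int value) {
    atomicExch(addr, value);
}

// Hypothetical helper: a 32-bit load issued as an atomic add of zero, which
// returns the current value without changing it.
__device__ unsigned int load_word_atomic(unsigned int* addr) {
    return atomicAdd(addr, 0u);
}

// Every thread stores a word made of four identical bytes to the same address.
__global__ void writer_kernel(unsigned int* slot) {
    unsigned int b = (blockIdx.x * blockDim.x + threadIdx.x) & 0xFFu;
    store_word_atomic(slot, b * 0x01010101u);
}

// One thread reads the word back through the atomic path.
__global__ void reader_kernel(unsigned int* slot, unsigned int* out) {
    *out = load_word_atomic(slot);
}

// Returns true if the value read back consists of four identical bytes,
// which is the shape of every word stored by writer_kernel.
bool run_whole_word_check() {
    unsigned int* slot = nullptr;
    unsigned int* out = nullptr;
    if (cudaMalloc(&slot, sizeof(unsigned int)) != cudaSuccess) return false;
    if (cudaMalloc(&out, sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(slot);
        return false;
    }
    cudaMemset(slot, 0, sizeof(unsigned int));

    writer_kernel<<<4, 128>>>(slot);
    reader_kernel<<<1, 1>>>(slot, out);

    unsigned int host_value = 0;
    cudaError_t err = cudaMemcpy(&host_value, out, sizeof(host_value),
                                 cudaMemcpyDeviceToHost);
    cudaFree(slot);
    cudaFree(out);
    if (err != cudaSuccess) return false;

    std::printf("read back 0x%08x\n", host_value);
    return (host_value & 0xFFu) * 0x01010101u == host_value;
}

int main() {
    return run_whole_word_check() ? 0 : 1;
}
```

Because the two kernels run back to back on the same stream, the check at the end is only a sanity test of the helpers, not a proof of atomicity; the point is simply that both the store and the load go through atomic instructions rather than plain accesses.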

For example, suppose a thread in a threadblock updates a 4-byte quantity in L2 on Kepler. Another thread, in another warp, threadblock, or kernel, could then update just one of those 4 bytes in the L2 before that cache line is evicted to global memory. By the time the cache line is evicted, it may not represent what was written by either the original thread or the second thread (for example, if a third write came along...).
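The scenario above can be replayed deterministically by performing the two stores one after the other in a single thread. This is only a sketch of one possible interleaving, not a reproduction of a real race; the helper `overwrite_low_byte`, the kernel name, and the values `0xAAAAAAAA` and `0xBB` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: models a 1-byte store that lands on the low-order
// byte of a 4-byte word that is already resident in the cache.
__host__ __device__ unsigned int overwrite_low_byte(unsigned int word,
                                                    unsigned char byte) {
    return (word & 0xFFFFFF00u) | byte;
}

// Replays the interleaving sequentially in one thread.
__global__ void replay_kernel(unsigned int* word) {
    // "Thread A" stores a full 4-byte quantity.
    *word = 0xAAAAAAAAu;
    // "Thread B" then updates just one of those 4 bytes.
    *word = overwrite_low_byte(*word, 0xBBu);
}

int main() {
    unsigned int* device_word = nullptr;
    if (cudaMalloc(&device_word, sizeof(unsigned int)) != cudaSuccess) return 1;

    replay_kernel<<<1, 1>>>(device_word);

    unsigned int host_word = 0;
    cudaError_t err = cudaMemcpy(&host_word, device_word, sizeof(host_word),
                                 cudaMemcpyDeviceToHost);
    cudaFree(device_word);
    if (err != cudaSuccess) return 1;

    // The final word mixes bytes from both writers.
    std::printf("final word: 0x%08x\n", host_word);
    return 0;
}
```

The word that would eventually be written back is a mix of both writers' bytes, which is why a reader cannot assume it sees a value that any single thread stored as a unit.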

Keep in mind that the L2 is a write-back cache, cannot be disabled, and is not bypassed by global reads and writes, except in the case of atomic instructions.
