Do the threads in a CUDA warp execute in parallel on a multiprocessor?
Question
A warp is 32 threads. Do the 32 threads execute in parallel on a multiprocessor? If the 32 threads are not executing in parallel, then there would be no race condition within the warp. I had this doubt after going through some examples.
In the CUDA programming model, all the threads within a warp run in parallel. But the actual execution in hardware may not be fully parallel, because the number of cores within an SM (Streaming Multiprocessor) can be less than 32. For example, the GT200 architecture has 8 cores per SM, so the threads within a warp need 4 clock cycles to finish executing an instruction.
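If multiple threads write to the same location (either shared memory or global memory) and you don't want a race, then you have to use atomic operations or locks, because the CUDA programming model does not guarantee which thread's write will take effect. This holds even when the hardware runs the warp over several clock cycles, since the order of the threads is still unspecified.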
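As a rough illustration of the point about atomics, here is a minimal sketch in which one warp of 32 threads increments a single counter in global memory, once with a plain read-modify-write and once with `atomicAdd`. The kernel names and the `count_after_launch` helper are made up for this example and are not from the original answer; error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread does a plain read-modify-write on the
// same address. The update is not atomic, so increments can be lost.
__global__ void racy_increment(int *counter) {
    *counter = *counter + 1;
}

// Hypothetical kernel: each thread uses atomicAdd, so every increment
// is applied exactly once regardless of how the warp is scheduled.
__global__ void atomic_increment(int *counter) {
    atomicAdd(counter, 1);
}

// Hypothetical helper: launches one block of 32 threads (a single warp)
// and returns the final counter value. Error checking omitted for brevity.
int count_after_launch(bool use_atomic) {
    int *d_counter = nullptr;
    int h_counter = 0;
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));
    if (use_atomic) {
        atomic_increment<<<1, 32>>>(d_counter);
    } else {
        racy_increment<<<1, 32>>>(d_counter);
    }
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}

int main() {
    // The racy result is unspecified; it is usually smaller than 32.
    printf("racy:   %d\n", count_after_launch(false));
    // The atomic result is always 32.
    printf("atomic: %d\n", count_after_launch(true));
    return 0;
}
```

Compiled with `nvcc`, the atomic version should always report 32, while the racy version may report any value the hardware happens to produce, which is why it should not be relied on.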