Shuffle instruction in CUDA not working
Question
I have a problem with the shuffle instruction in CUDA 5.0.
This is a snippet of my kernel. It is inside a loop. The printf calls are there only for debugging, because I can't use an ordinary debugger:
...
tex_val = tex2D(srcTexRef, threadIdx.x + w, y_pos);
if (threadIdx.x == 0)
{
left = left_value[y_pos];
}
else
{
printf("thread %d; shfl value: %f \n", threadIdx.x, __shfl_up(value, 1));
left = __shfl_up(value, 1);
}
printf("thread %d; value: %f; tex_val: %f; left: %f \n", threadIdx.x, value, tex_val, left);
...
From it I get the following output:
l0: ITERATION 1
l1: thread 0; value: 0; tex_val: 1; left: 4
l2:
l3: ITERATION 2
l4: thread 1; shfl value: 0
l5: thread 0; value: 5; tex_val: 1; left: 5
l6: thread 1; value: 0; tex_val: 1; left: 0
l7:
l8: ITERATION 3
l9: thread 1; shfl value: 0
l10: thread 2; shfl value: 1
l11: thread 0; value: 6; tex_val: 1; left: 6
l12: thread 1; value: 1; tex_val: 1; left: 0
l13: thread 2; value: 2; tex_val: 1; left: 1
...
From the output I can see that thread 1 doesn't get the value from thread 0 in any iteration, even though I can clearly see that thread 0 has a value (line 4: shfl value is 0; line 5: value is 5). Threads 2 and higher can get the value from the thread below them. Where am I making a mistake? Is it happening because of the branching?
Recommended answer
Yes, it's because of the branching. Quoting from the CUDA programming guide, section B.14.2:
The __shfl() intrinsics permit exchanging of a variable between threads within a warp without use of shared memory. The exchange occurs simultaneously for all active threads within the warp, ...
and
Threads may only read data from another thread which is actively participating in the __shfl() command. If the target thread is inactive, the retrieved value is undefined.
In a branch, the active threads are those taking the same path of execution; threads taking a different path are inactive. In your case, thread 0 takes the if path, so it is inactive while the other threads execute __shfl_up() in the else path, and you cannot shuffle from it.
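One way to avoid the problem, sketched here under a few assumptions, is to issue the shuffle outside the branch so that every thread in the warp participates, and to make only the selection of the result conditional. The kernel name shift_left_neighbor, the helper run_shift_left_demo, and the boundary parameter are hypothetical and stand in for the question's left_value[y_pos] lookup. The sketch assumes a CUDA 9 or later toolkit, where __shfl_up_sync() with a full-warp mask replaces the older __shfl_up(); on CUDA 5.0, as in the question, __shfl_up(value, 1) would go in the same position. It also assumes a device of compute capability 3.0 or higher and a launch of exactly one full warp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread receives the value held by the thread to
// its left; thread 0 takes a boundary value instead.
__global__ void shift_left_neighbor(const float* in, float* out, float boundary)
{
    float value = in[threadIdx.x];

    // All 32 lanes execute the shuffle, so lane 0 is an active participant
    // and lane 1 can legally read from it.
    float from_left = __shfl_up_sync(0xffffffffu, value, 1);

    // Only the choice of the result is conditional. Lane 0 has no left
    // neighbour, so it ignores the shuffle result and uses the boundary.
    float left = (threadIdx.x == 0) ? boundary : from_left;

    out[threadIdx.x] = left;
}

// Hypothetical host-side check: launches one warp and verifies that
// out[i] == in[i - 1] for i >= 1 and out[0] == boundary.
bool run_shift_left_demo()
{
    const int n = 32;                 // exactly one warp
    const float boundary = -1.0f;
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = 10.0f + i;

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }

    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    shift_left_neighbor<<<1, n>>>(d_in, d_out, boundary);
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    bool ok = (h_out[0] == boundary);
    for (int i = 1; i < n; ++i) ok = ok && (h_out[i] == h_in[i - 1]);
    return ok;
}
```

Calling run_shift_left_demo() from a main function and printing the result is enough to try it. Applied to the kernel in the question, the same idea means computing __shfl_up(value, 1) before the if statement and assigning left from either left_value[y_pos] or the shuffled value inside it.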