Shuffle instruction in CUDA not working


Question

I have a problem with the shuffle instruction in CUDA 5.0.

This is a snippet of my kernel. It runs inside a loop. The printf calls are there only for debugging, because I can't use an ordinary debugger:

...
tex_val = tex2D(srcTexRef, threadIdx.x + w, y_pos);
if (threadIdx.x == 0)
{
    left = left_value[y_pos];
}
else
{
    printf("thread %d; shfl value: %f \n", threadIdx.x, __shfl_up(value, 1));
    left = __shfl_up(value, 1);
}

printf("thread %d; value: %f; tex_val: %f; left: %f \n", threadIdx.x, value, tex_val, left);
...

From this I get the following output:

l0:  ITERATION 1
l1:  thread 0; value: 0; tex_val: 1; left: 4
l2: 
l3:  ITERATION 2
l4:  thread 1; shfl value: 0
l5:  thread 0; value: 5; tex_val: 1; left: 5
l6:  thread 1; value: 0; tex_val: 1; left: 0
l7: 
l8:  ITERATION 3
l9:  thread 1; shfl value: 0
l10: thread 2; shfl value: 1
l11: thread 0; value: 6; tex_val: 1; left: 6
l12: thread 1; value: 1; tex_val: 1; left: 0
l13: thread 2; value: 2; tex_val: 1; left: 1
...

From the output I can see that thread 1 doesn't get the value from thread 0 in any iteration, even though thread 0 clearly has one (line 4: shfl value is 0; line 5: value is 5). Threads 2 and higher do get the value from the thread below them. Where am I making a mistake? Is it happening because of the branching?

Recommended answer

Yes, it's because of the branching. Quoting from the CUDA programming guide, section B.14.2:


The __shfl() intrinsics permit exchanging of a variable between threads within a warp without use of shared memory. The exchange occurs simultaneously for all active threads within the warp, ...


Threads may only read data from another thread which is actively participating in the __shfl() command. If the target thread is inactive, the retrieved value is undefined.

In a branch, active threads are those taking the same path of execution, while those taking a different path are inactive. In your case, thread 0 takes the if branch, so it is inactive while the else branch executes the shuffle, and you cannot shuffle from it.
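One common way around this is to hoist the shuffle out of the branch so that every lane executes it, and only afterwards choose which result to keep. The following is a minimal sketch of that idea, not the asker's actual kernel: the kernel name `shiftLeftNeighbor`, the `boundary` argument (standing in for `left_value[y_pos]`), and the host driver are all hypothetical. It uses `__shfl_up_sync`, the form required by CUDA 9 and later; on CUDA 5.0 the equivalent call would be `__shfl_up(value, 1)`. It also assumes all 32 lanes of the warp reach the shuffle, which the full mask `0xffffffff` requires.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread takes the value held by the lane to its
// left; lane 0 has no left neighbour and uses a boundary value instead.
__global__ void shiftLeftNeighbor(const float *in, float *out, float boundary)
{
    float value = in[threadIdx.x];

    // Every lane executes the shuffle, so lane 0 is an active participant
    // and lane 1 can read from it. (CUDA 5.0 form: __shfl_up(value, 1).)
    float shuffled = __shfl_up_sync(0xffffffffu, value, 1);

    // Branch only on which result to keep, after the exchange has happened.
    float left = (threadIdx.x == 0) ? boundary : shuffled;

    out[threadIdx.x] = left;
}

int main()
{
    const int n = 32;  // exactly one warp, so the full mask is valid
    float hIn[n], hOut[n];
    for (int i = 0; i < n; ++i) hIn[i] = 10.0f + i;

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    shiftLeftNeighbor<<<1, n>>>(dIn, dOut, -1.0f);
    cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Lane 0 should hold the boundary value; lane 1 should hold lane 0's input.
    printf("left[0] = %f, left[1] = %f, left[2] = %f\n",
           hOut[0], hOut[1], hOut[2]);

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

If the surrounding loop itself keeps some lanes of the warp out of the iteration, the same rule applies there: a lane can only supply a value if it executes the shuffle together with the lane that reads from it.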
