OpenCL efficient way to group a lower triangular matrix
Question
I'm sure someone has come across this problem before. Basically, I have a 2D optimisation grid of size NxM, with the constraint that n_i <= m_i, i.e. I only want to calculate the pairs in the lower triangular section of the matrix. At the moment I naively launch all NxM combinations as N local groups of M work groups (using localGroupID and workGroupID to identify the pair), and return -inf when the constraint fails, to save computation.
But is there a better way to set up the threads and index them so that I only need to generate (NxM)/2 threads rather than the full NxM?
Many thanks, Sam
Recommended answer
Of course, it's just geometry. Any right triangle can be rearranged into a rectangle with the same area: slice it in half horizontally and vertically, then re-assemble the pieces into a rectangle. In terms of implementation, make the width of your global work size equal to the width of the triangle, and its height equal to half the triangle's height. In the kernel, if the x coordinate is more than half the width, check whether (x - half) > y, and if so set x = width - x - 1 and y = y + half_height. You'll have some thread divergence along the boundary, but you won't leave half your work items idle.
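To make the folding idea concrete, here is a minimal sketch in plain C of one way to do the index remapping. It is a row-pairing variant of the fold, not the exact formula given in the answer, and it assumes a square n x n grid where the cells to compute are those with col <= row. The names fold_index and check_fold are hypothetical. Each short row r is placed next to its mirrored long row n-1-r, so every row of the folded grid holds n + 1 items and the grid has ceil(n / 2) rows.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: maps a work-item position (c, r) in a folded
   (n + 1) x ceil(n / 2) grid to a cell (row, col) with col <= row of an
   n x n lower triangle. Returns 0 for the few padding items to skip. */
int fold_index(int n, int c, int r, int *row, int *col)
{
    assert(n > 0 && c >= 0 && c <= n && r >= 0 && r < (n + 1) / 2);
    if (c <= r) {
        /* Left part: the short row r, columns 0..r. */
        *row = r;
        *col = c;
        return 1;
    }
    /* Right part: the mirrored long row n-1-r, columns 0..n-1-r. */
    *row = n - 1 - r;
    *col = c - r - 1;
    /* For odd n the middle row is its own mirror; skip the duplicate. */
    return *row != r;
}

/* Host-side check: returns 1 if every cell with col <= row is produced
   exactly once and no cell outside the triangle is produced. */
int check_fold(int n)
{
    int *hits = calloc((size_t)n * n, sizeof *hits);
    int ok = hits != NULL;

    for (int r = 0; ok && r < (n + 1) / 2; ++r) {
        for (int c = 0; c <= n; ++c) {
            int row, col;
            if (fold_index(n, c, r, &row, &col))
                hits[row * n + col]++;
        }
    }
    for (int row = 0; ok && row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            if (hits[row * n + col] != (col <= row))
                ok = 0;
        }
    }
    free(hits);
    return ok;
}
```

In an OpenCL kernel, c and r would presumably come from get_global_id(0) and get_global_id(1), with a global work size of (n + 1) by ceil(n / 2), and the work item would return early when the mapping reports a padding item. For even n no work items are wasted; for odd n only the right-hand part of the middle row is idle.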