Error while reading same mem positions on different threads
Problem description
I have a problem while reading a couple of positions in a double array from different threads.
I enqueue the execution with:
nelements = nx*ny;
err = clEnqueueNDRangeKernel(queue,kernelTvl2of,1,NULL,&nelements,NULL,0,NULL,NULL);
kernelTvl2of contains (among other things) the code:
size_t k = get_global_id(0);
(...)
u1_[k] = (float)u1[k];
(...)
barrier(CLK_GLOBAL_MEM_FENCE);
forwardgradient(u1_,u1x,u1y,k,nx,ny);
barrier(CLK_GLOBAL_MEM_FENCE);
and forwardgradient has the code:
void forwardgradient(global double *f, global double *fx, global double *fy, int ker,int nx, int ny){
    unsigned int rowsnotlast = ((nx)*(ny-1));
    if(ker<rowsnotlast){
        fx[ker] = f[ker+1] - f[ker];
        fy[ker] = f[ker+nx] - f[ker];
    }
    if(ker<nx*ny){
        fx[ker] = f[ker+1] - f[ker];
        if(ker==4607){
            fx[0] = f[4607];
            fx[1] = f[4608];
            fx[2] = f[4608] - f[4607];
            fx[3] = f[ker];
            fx[4] = f[ker+1];
            fx[5] = f[ker+1] - f[ker];
        }
    }
    if(ker==(nx*ny)-1){
        fx[ker] = 0;
        fy[ker] = 0;
    }
    if(ker%nx == nx-1){
        fx[ker]=0;
    }
    fx[6] = f[4608];
}
When I get the contents of the first positions of fx, they are:
-6 0 6 -6 0 6 -6
And here's my problem: when I read f[ker+1] or f[4608] on the thread with id 4607, I get a '0' (second and fifth positions of the output array), but when other threads read f[4608] I get a '-6' (last position of the output array).
Does anyone have a clue what I'm doing wrong, or where I could look?
Thanks a lot,
Anton
Within a kernel, global memory consistency is only achievable within a single work-group. This means that if a work-item writes a value to global memory, a barrier(CLK_GLOBAL_MEM_FENCE) only guarantees that other work-items within the same work-group will be able to read the updated value.
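To see why this can show up at index 4607 in particular, it may help to work out which work-group each work-item lands in. The enqueue call in the question passes NULL as the local work size, so the runtime chooses it; the sizes 64, 128 and 256 used below are assumptions for illustration only, and group_of and same_group are hypothetical helper names. A minimal sketch in plain C, assuming a 1-D range with no global offset:

```c
#include <assert.h>
#include <stddef.h>

/* Index of the work-group that owns a global id, for a 1-D range
   with zero global offset and the given local (work-group) size. */
static size_t group_of(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    return global_id / local_size;
}

/* Returns 1 when two work-items share a work-group, which is the only
   case in which barrier() synchronises them with each other. */
static int same_group(size_t a, size_t b, size_t local_size)
{
    return group_of(a, local_size) == group_of(b, local_size);
}
```

For example, same_group(4607, 4608, 256) returns 0, because 4608 = 18 * 256 makes work-item 4608 the first member of a new group. If the runtime picked a size like that, work-item 4607 could read f[4608] before the neighbouring group had written it.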
If you need global memory consistency across multiple work-groups, you need to split your kernel into multiple kernels.
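A rough sketch of what that split could look like, not the asker's actual code. The kernel names convert_input and forward_gradient are hypothetical, float is used for every buffer as an assumption (the question mixes float and double), and the elided parts of the original kernel are left out. The small shim at the top is also an assumption: it only exists so the same source compiles as plain C for a quick host-side check, with run_two_pass standing in for two consecutive launches.

```c
#include <assert.h>
#include <stddef.h>

/* Host-only shim (assumption): lets the kernels below build as plain C.
   An OpenCL compiler defines __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#define kernel
#define global
static size_t emulated_id;   /* id of the work-item being emulated */
static size_t get_global_id(unsigned int dim) { (void)dim; return emulated_id; }
#endif

/* Kernel 1 (hypothetical name): each work-item writes only its own
   element of u1_, so no barrier is needed. */
kernel void convert_input(global const float *u1, global float *u1_)
{
    size_t k = get_global_id(0);
    u1_[k] = u1[k];
}

/* Kernel 2 (hypothetical name): forward differences of f.
   fx is zero in the last column, fy is zero in the last row. */
kernel void forward_gradient(global const float *f, global float *fx,
                             global float *fy, int nx, int ny)
{
    int k = (int)get_global_id(0);
    if (k >= nx * ny) return;            /* guard against padded ranges */
    int col = k % nx;
    int row = k / nx;
    fx[k] = (col < nx - 1) ? f[k + 1]  - f[k] : 0.0f;
    fy[k] = (row < ny - 1) ? f[k + nx] - f[k] : 0.0f;
}

#ifndef __OPENCL_VERSION__
/* Host emulation of two back-to-back launches over nx*ny work-items:
   the second loop starts only after the first has written all of u1_. */
static void run_two_pass(const float *u1, float *u1_, float *fx, float *fy,
                         int nx, int ny)
{
    assert(nx > 0 && ny > 0);
    size_t n = (size_t)nx * (size_t)ny;
    for (emulated_id = 0; emulated_id < n; ++emulated_id)
        convert_input(u1, u1_);
    for (emulated_id = 0; emulated_id < n; ++emulated_id)
        forward_gradient(u1_, fx, fy, nx, ny);
}
#endif
```

On a real device the two kernels would be enqueued one after the other with clEnqueueNDRangeKernel. On an in-order command queue the second launch starts only after the first has completed, so every work-group sees the fully written u1_ and no barrier is needed in either kernel.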