Iterative image processing in CUDA

Question

I have written a CUDA kernel to process an image. But, depending on the output of the processed image, I have to call the kernel again to re-tune the image. For example, consider an image with 9 pixels:

    1 2 3
    4 5 6
    7 8 9 
    

Suppose that, depending on its neighboring values, the value 9 changes to 10. Since the value has changed, I have to re-process the new image with the same kernel.

    1 2 3
    4 5 6
    7 8 10
    

I have already written the algorithm to process the image in a single iteration. The way I'm planning to implement the iterations in CUDA is the following:

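    #include <cstdlib>  // malloc, free

    __global__ void process_image_GPU(unsigned int *d_input, unsigned int *d_output, int dataH, int dataW, unsigned int *val) {

         __shared__ unsigned int sh_map[TOTAL_WIDTH][TOTAL_WIDTH];  // TOTAL_WIDTH defined elsewhere
         // Do processing
         // If, during processing, any thread changes the value of a pixel:
         //     atomicAdd(val, 1);

    }
    int main(int argc, char *argv[]) {
        // Allocate d_input, d_output, call cudaMemcpy, set up dimGrid/dimBlock
        unsigned int *x, *val;
        x = (unsigned int *)malloc(sizeof(unsigned int));
        cudaMalloc((void **)&val, sizeof(unsigned int));
        do {
            x[0] = 0;  // reset the change counter before every pass
            cudaMemcpy((void *)val, (void *)x, sizeof(unsigned int), cudaMemcpyHostToDevice);
            process_image_GPU<<<dimGrid, dimBlock>>>(d_input, d_output, rows, cols, val);
            // Blocking copy: it also waits for the kernel to finish
            cudaMemcpy((void *)x, (void *)val, sizeof(unsigned int), cudaMemcpyDeviceToHost);
            // The output of this pass becomes the input of the next one
            unsigned int *tmp = d_input;
            d_input = d_output;
            d_output = tmp;
        } while (x[0] != 0);  // call the kernel again while any pixel changed
        // d_input now holds the final, stable image
        cudaFree(val);
        free(x);
    }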
    

Is this the only way to do it? Is there a more efficient way to implement the same thing?

Thanks a lot for your time.

Solution

I'll hazard an answer, despite the very limited information you provided. I hope it helps.

From what you have said, you have already set up an update rule for each pixel based on the values of its adjacent pixels. Let $x^{(k)}_{ij}$ be the value of pixel $(i,j)$ at iteration $k$, and let

$$x^{(k+1)}_{ij} = f\left(x^{(k)}_{i-1,j},\ x^{(k)}_{ij},\ x^{(k)}_{i+1,j},\ x^{(k)}_{i,j-1},\ x^{(k)}_{i,j+1}\right)$$

I'm assuming a typical stencil-based update rule (here, the five-point stencil), but other rules are of course possible.
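To make the update rule concrete, here is a minimal host-side sketch in plain C++ (it also compiles as CUDA host code) of one "simultaneous" pass: each output pixel is computed only from the previous image, which is what one launch of a one-thread-per-pixel kernel does. The rule `f` (a five-point maximum, i.e. a grey-level dilation), the clamp-to-edge border handling, and the names `stencilPass` and `iterateUntilStable` are hypothetical choices for illustration, not the asker's algorithm. The returned change count plays the role of the `atomicAdd(val, 1)` counter in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical update rule f, for illustration only: the new value is the
// maximum over the five-point stencil (a grey-level dilation).
unsigned int f(unsigned int north, unsigned int centre, unsigned int south,
               unsigned int west, unsigned int east) {
    return std::max({north, centre, south, west, east});
}

// One simultaneous pass: every output pixel depends only on the previous
// image `in`, as in one launch of a kernel with one thread per pixel.
// Returns how many pixels changed (the job of atomicAdd(val, 1) on the GPU).
std::size_t stencilPass(const std::vector<unsigned int>& in,
                        std::vector<unsigned int>& out, int h, int w) {
    assert(in.size() == static_cast<std::size_t>(h) * w);
    out.resize(in.size());
    // Neighbours outside the image are clamped to the nearest edge pixel (an assumption).
    auto at = [&](int i, int j) {
        i = std::min(std::max(i, 0), h - 1);
        j = std::min(std::max(j, 0), w - 1);
        return in[static_cast<std::size_t>(i) * w + j];
    };
    std::size_t changed = 0;
    for (int i = 0; i < h; ++i)          // CUDA: i from blockIdx.y, threadIdx.y
        for (int j = 0; j < w; ++j) {    // CUDA: j from blockIdx.x, threadIdx.x
            unsigned int v = f(at(i - 1, j), at(i, j), at(i + 1, j),
                               at(i, j - 1), at(i, j + 1));
            out[static_cast<std::size_t>(i) * w + j] = v;
            if (v != at(i, j)) ++changed;
        }
    return changed;
}

// Host-side loop: repeat the pass, swapping buffers, until nothing changes.
// Returns the number of passes run (maxIter if it never stabilises).
int iterateUntilStable(std::vector<unsigned int>& img, int h, int w, int maxIter) {
    std::vector<unsigned int> next;
    for (int k = 0; k < maxIter; ++k) {
        std::size_t changed = stencilPass(img, next, h, w);
        img.swap(next);
        if (changed == 0) return k + 1;
    }
    return maxIter;
}
```

On the 3x3 example image from the question, this particular `f` changes 8 of the 9 pixels in the first pass, and the loop stops after the fifth pass, the first one in which nothing changes.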

At this point, you have to set up a stopping rule, that is, a rule that tells you whether your algorithm has reached convergence. For example, you could evaluate the norm of the difference between the images at steps k+1 and k, and stop once it falls below a chosen tolerance.
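A minimal sketch of such a stopping rule, assuming a floating-point image and the Euclidean norm. The update rule here is a hypothetical stand-in (Jacobi iteration for the discrete Laplace equation, with border pixels held fixed), chosen only because it converges asymptotically rather than in a finite number of steps, so a tolerance `tol` is actually needed; `diffNorm` and `solveUntilConverged` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean (L2) norm of x^(k+1) - x^(k), one possible stopping criterion.
double diffNorm(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t n = 0; n < a.size(); ++n) {
        double d = a[n] - b[n];
        s += d * d;
    }
    return std::sqrt(s);
}

// Jacobi iteration for a hypothetical rule (discrete Laplace equation: each
// interior pixel becomes the mean of its four neighbours, border pixels stay
// fixed). Stops when diffNorm drops below tol; returns the number of
// iterations, or -1 if maxIter is reached first.
int solveUntilConverged(std::vector<double>& x, int h, int w,
                        double tol, int maxIter) {
    assert(x.size() == static_cast<std::size_t>(h) * w);
    std::vector<double> next = x;  // copies the fixed border values too
    for (int k = 1; k <= maxIter; ++k) {
        for (int i = 1; i < h - 1; ++i)
            for (int j = 1; j < w - 1; ++j)
                next[i * w + j] = 0.25 * (x[(i - 1) * w + j] + x[(i + 1) * w + j] +
                                          x[i * w + j - 1] + x[i * w + j + 1]);
        double r = diffNorm(next, x);
        x.swap(next);
        if (r < tol) return k;
    }
    return -1;
}
```

On a GPU the sum of squares would be computed with a parallel reduction rather than a serial loop, but the stopping logic is the same.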

Once the problem is formulated this way, I would say that you have the following two options:

1. Rouy-Tourin-like scheme: all the computational pixels are updated "simultaneously", in a brute-force way, until convergence is reached;
2. Fast sweeping method: the computational grid is swept (selective update) along a predefined number of directions until convergence is reached.
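To illustrate the difference between the two schemes, here is a plain C++ sketch that solves the same problem both ways. As a concrete test problem it assumes the eikonal equation |∇u| = 1 on a unit-spaced grid with a standard first-order Godunov upwind update; the question's actual rule is unknown. `rouyTourin` recomputes every pixel from the previous iterate (one kernel launch per iteration on a GPU), while `fastSweeping` updates in place, Gauss-Seidel style, in the four diagonal sweep orderings. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const double kInf = 1e30;  // "not reached yet"

struct Grid {
    int h, w;
    std::vector<double> u;  // row-major, 0 at sources, kInf elsewhere initially
    double get(int i, int j) const {  // outside the image counts as kInf
        return (i < 0 || j < 0 || i >= h || j >= w) ? kInf : u[i * w + j];
    }
};

// First-order Godunov upwind update of pixel (i, j) for |grad u| = 1, h = 1.
double localUpdate(const Grid& g, int i, int j) {
    double a = std::min(g.get(i - 1, j), g.get(i + 1, j));
    double b = std::min(g.get(i, j - 1), g.get(i, j + 1));
    if (std::min(a, b) >= kInf) return kInf;  // no reached neighbour yet
    if (std::fabs(a - b) >= 1.0) return std::min(a, b) + 1.0;
    return 0.5 * (a + b + std::sqrt(2.0 - (a - b) * (a - b)));
}

// Rouy-Tourin-like: every pixel recomputed from the previous iterate
// (on a GPU, one kernel launch per iteration). Returns iterations used.
int rouyTourin(Grid& g, int maxIter) {
    assert(g.u.size() == static_cast<std::size_t>(g.h) * g.w);
    std::vector<double> next(g.u.size());
    for (int k = 1; k <= maxIter; ++k) {
        bool changed = false;
        for (int i = 0; i < g.h; ++i)
            for (int j = 0; j < g.w; ++j) {
                double old = g.u[i * g.w + j];
                double v = std::min(old, localUpdate(g, i, j));
                changed = changed || v < old;
                next[i * g.w + j] = v;
            }
        g.u.swap(next);
        if (!changed) return k;
    }
    return -1;  // not converged within maxIter
}

// Fast sweeping: in-place (Gauss-Seidel) updates in the four diagonal sweep
// orderings. Returns the number of rounds of four sweeps used.
int fastSweeping(Grid& g, int maxRounds) {
    assert(g.u.size() == static_cast<std::size_t>(g.h) * g.w);
    for (int r = 1; r <= maxRounds; ++r) {
        bool changed = false;
        for (int dir = 0; dir < 4; ++dir) {
            bool iUp = (dir & 1) == 0, jUp = (dir & 2) == 0;
            for (int s = 0; s < g.h; ++s)
                for (int t = 0; t < g.w; ++t) {
                    int i = iUp ? s : g.h - 1 - s;
                    int j = jUp ? t : g.w - 1 - t;
                    double v = localUpdate(g, i, j);
                    if (v < g.u[i * g.w + j]) {
                        g.u[i * g.w + j] = v;
                        changed = true;
                    }
                }
        }
        if (!changed) return r;
    }
    return -1;  // not converged within maxRounds
}
```

For a single source in the corner of a 5x5 grid, both functions reach the same values (for example 3 at pixel (0,3) and 1 + 1/√2 at pixel (1,1)), and fast sweeping converges in fewer rounds than the Rouy-Tourin-like scheme needs iterations. The trade-off is that the in-place sweeps carry a sequential dependency along the sweep direction, whereas the Rouy-Tourin-like update is trivially parallel.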

Depending on the kind of problem you are dealing with, you may also have an additional option:

3. Fast iterative method: the computational pixels are selectively updated with the aid of a heap structure.
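A minimal plain C++ sketch of heap-driven selective updating, on an assumed eikonal test problem (|∇u| = 1, unit grid, first-order Godunov update): a `std::priority_queue` always hands back the pixel with the smallest tentative value, which is then frozen, and only its not-yet-frozen neighbours are recomputed from frozen values. Note that this heap ordering is essentially the fast marching method; as far as I know, the fast iterative method of Jeong and Whitaker instead keeps an unordered active list, precisely to avoid a heap, which parallelizes poorly on GPUs. `heapSelectiveUpdate` and `knownUpdate` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

const double kInf = 1e30;  // "not reached yet"

// First-order Godunov update of pixel (i, j) for |grad u| = 1 on a unit grid,
// using only neighbours that are already frozen (done).
double knownUpdate(const std::vector<double>& u, const std::vector<char>& done,
                   int h, int w, int i, int j) {
    auto get = [&](int r, int c) {
        if (r < 0 || c < 0 || r >= h || c >= w || !done[r * w + c]) return kInf;
        return u[r * w + c];
    };
    double a = std::min(get(i - 1, j), get(i + 1, j));
    double b = std::min(get(i, j - 1), get(i, j + 1));
    if (std::min(a, b) >= kInf) return kInf;
    if (std::fabs(a - b) >= 1.0) return std::min(a, b) + 1.0;
    return 0.5 * (a + b + std::sqrt(2.0 - (a - b) * (a - b)));
}

// Heap-driven selective update: pop the smallest tentative value, freeze that
// pixel, and recompute only its unfrozen neighbours. On entry u holds 0 at
// source pixels and kInf elsewhere. Returns the number of frozen pixels.
int heapSelectiveUpdate(std::vector<double>& u, int h, int w) {
    assert(u.size() == static_cast<std::size_t>(h) * w);
    using Item = std::pair<double, int>;  // (tentative value, pixel index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;  // min-heap
    std::vector<char> done(u.size(), 0);
    for (int n = 0; n < h * w; ++n)
        if (u[n] < kInf) heap.push(Item(u[n], n));  // seed with the sources
    int frozen = 0;
    const int di[4] = {-1, 1, 0, 0}, dj[4] = {0, 0, -1, 1};
    while (!heap.empty()) {
        Item top = heap.top();
        heap.pop();
        int n = top.second;
        if (done[n] || top.first > u[n]) continue;  // stale heap entry
        done[n] = 1;
        ++frozen;
        for (int d = 0; d < 4; ++d) {
            int r = n / w + di[d], c = n % w + dj[d];
            if (r < 0 || c < 0 || r >= h || c >= w || done[r * w + c]) continue;
            double v = knownUpdate(u, done, h, w, r, c);
            if (v < u[r * w + c]) {
                u[r * w + c] = v;
                heap.push(Item(v, r * w + c));
            }
        }
    }
    return frozen;
}
```

For a single source in the corner of a 5x5 grid, each of the 25 pixels is frozen exactly once, with values such as 3 at pixel (0,3) and 1 + 1/√2 at pixel (1,1).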

All of the above methods have been compared for the solution of the eikonal equation here.

Of course, you will need to show convergence of the above computational schemes for the particular problem you are interested in.
