Change cuda::GpuMat values through custom kernel

Problem description

I am using a kernel to "loop" over a live camera stream to highlight specific color regions. These regions cannot always be reconstructed with a few cv::threshold calls, therefore I am using a kernel.

The current kernel is as follows:

__global__ void custom_kernel(unsigned char* input, unsigned char* output, int width, int height, int colorWidthStep, int outputWidthStep) {
    const int xIndex = blockIdx.x * blockDim.x + threadIdx.x;
    const int yIndex = blockIdx.y * blockDim.y + threadIdx.y;

    if ((xIndex < width) && (yIndex < height)) {
        const int color_tid = yIndex * colorWidthStep + (3*xIndex);
        const int output_tid = yIndex * outputWidthStep + (3*xIndex);
        const unsigned char red   = input[color_tid+0];
        const unsigned char green = input[color_tid+1];
        const unsigned char blue  = input[color_tid+2];
        if (!(red > 100 && blue < 50 && red > 1.0*green)) {
            output[output_tid] = 255;
            output[output_tid+1] = 255; 
            output[output_tid+2] = 255;
        } else {
            output[output_tid] = 0;
            output[output_tid+1] = 0;
            output[output_tid+2] = 0;
        }
    }
}

This kernel is called here:

extern "C" void myFunction(cv::cuda::GpuMat& input, cv::cuda::GpuMat& output) {
    // Calculate total number of bytes of input and output image
    const int colorBytes = input.step * input.rows;
    const int outputBytes = output.step * output.rows;

    unsigned char *d_input, *d_output;

    // Allocate device memory
    SAFE_CALL(cudaMalloc<unsigned char>(&d_input,colorBytes),"CUDA Malloc Failed");
    SAFE_CALL(cudaMalloc<unsigned char>(&d_output,outputBytes),"CUDA Malloc Failed");

    // Copy data from OpenCV input image to device memory
    SAFE_CALL(cudaMemcpy(d_input,input.ptr(),colorBytes,cudaMemcpyHostToDevice),"CUDA Memcpy Host To Device Failed");

    // Specify a reasonable block size
    const dim3 block(16,16);

    // Calculate grid size to cover the whole image
    const dim3 grid((input.cols + block.x - 1)/block.x, (input.rows + block.y - 1)/block.y);

    // Launch the color conversion kernel
    custom_kernel<<<grid,block>>>(d_input,d_output,input.cols,input.rows,input.step,output.step);

    // Synchronize to check for any kernel launch errors
    SAFE_CALL(cudaDeviceSynchronize(),"Kernel Launch Failed");

    // Copy back data from destination device memory to OpenCV output image
    SAFE_CALL(cudaMemcpy(output.ptr(),d_output,outputBytes,cudaMemcpyDeviceToHost),"CUDA Memcpy Device To Host Failed");

    // Free the device memory
    SAFE_CALL(cudaFree(d_input),"CUDA Free Failed");
    SAFE_CALL(cudaFree(d_output),"CUDA Free Failed");
}

I included an example image that shows the result of the kernel on a red car. As you can see, there are vertical red lines, even though I tried to access the RGB/BGR values and set them to either zero or 255.

I used this sample as a starting point: https://github.com/sshniro/opencv-samples/blob/master/cuda-bgr-grey.cpp. However, I suspect that cv::Mat and cv::cuda::GpuMat do not store their values in the same way. I read that GpuMat only holds a ptr to its data, and assumed that pointer could be used together with the blockIdx and blockDim parameters.

Specific questions:

  1. What is the reason for the red lines?

  2. How can I change the RGB values correctly?

I am using CUDA 10.2 on Ubuntu 18.04 on an NVIDIA Xavier NX.

Update: As mentioned in the comments, I changed the parameters of the cudaMemcpy function and deleted the cudaMalloc and cudaFree parts. Additionally, I reminded myself that OpenCV stores color in BGR order, so I changed the (+0,+1,+2) offsets inside the kernel. I also loaded the red car image directly via cv::imread to rule out any earlier formatting errors. With great success: the kernel works.

Solution

As mentioned by @sgarizvi in the comments, the cv::cuda::GpuMat already resides on the GPU, so I had to use cudaMemcpyDeviceToDevice instead of cudaMemcpyHostToDevice.

It was also not necessary to allocate new memory, so I deleted the cudaMalloc and cudaFree parts of the code above. Because both GpuMat buffers are already device memory, their pointers can be passed to the kernel directly.
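A minimal sketch of what the wrapper could look like once the allocation and copy steps are gone. It assumes 4 bytes per pixel in B, G, R, A order (CV_8UC4), so adjust the offsets if your data is laid out differently. The checkCuda helper is a hypothetical stand-in for the SAFE_CALL macro, whose definition is not shown in the question, and the input parameter is made const here for illustration.

```cuda
#include <cuda_runtime.h>
#include <opencv2/core/cuda.hpp>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the SAFE_CALL macro used in the question.
static void checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess)
        throw std::runtime_error(std::string(what) + ": " + cudaGetErrorString(err));
}

// Thresholding from the question, assuming 4 bytes per pixel in B, G, R, A order.
__global__ void custom_kernel(const unsigned char* input, unsigned char* output,
                              int width, int height,
                              int colorWidthStep, int outputWidthStep)
{
    const int xIndex = blockIdx.x * blockDim.x + threadIdx.x;
    const int yIndex = blockIdx.y * blockDim.y + threadIdx.y;
    if (xIndex >= width || yIndex >= height) return;

    const int color_tid  = yIndex * colorWidthStep  + 4 * xIndex;
    const int output_tid = yIndex * outputWidthStep + 4 * xIndex;

    const unsigned char blue  = input[color_tid + 0];
    const unsigned char green = input[color_tid + 1];
    const unsigned char red   = input[color_tid + 2];

    // Pixels that pass the "red" test become black, everything else white.
    const bool isRed = red > 100 && blue < 50 && red > green;
    const unsigned char value = isRed ? 0 : 255;
    output[output_tid + 0] = value;
    output[output_tid + 1] = value;
    output[output_tid + 2] = value;
    output[output_tid + 3] = 255;  // keep the result opaque
}

void myFunction(const cv::cuda::GpuMat& input, cv::cuda::GpuMat& output)
{
    CV_Assert(input.type() == CV_8UC4);          // assumption: 4-channel, 8-bit input
    output.create(input.size(), input.type());   // allocates only if size/type differ

    const dim3 block(16, 16);
    const dim3 grid((input.cols + block.x - 1) / block.x,
                    (input.rows + block.y - 1) / block.y);

    // Both pointers already refer to device memory: no cudaMalloc, no cudaMemcpy.
    custom_kernel<<<grid, block>>>(input.ptr<unsigned char>(),
                                   output.ptr<unsigned char>(),
                                   input.cols, input.rows,
                                   static_cast<int>(input.step),
                                   static_cast<int>(output.step));

    checkCuda(cudaGetLastError(), "Kernel launch failed");
    checkCuda(cudaDeviceSynchronize(), "Kernel execution failed");
}
```

On the host side, data would still need to reach the GPU once, for example with GpuMat::upload before the call and GpuMat::download afterwards. This sketch has not been compiled against the exact setup described in the question.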

Finally (this is specific to my case and may differ for others), my image source was the Zed 2 from StereoLabs, which publishes its images in RGBA, so the order in memory is R -> G -> B -> A; converted to OpenCV it is B -> G -> R -> A. Either way, each pixel occupies 4 bytes:

const int color_tid = yIndex * colorWidthStep + (4*xIndex);
const int output_tid = yIndex * outputWidthStep + (4*xIndex);

So to address each pixel correctly, you have to offset the pointer by four times xIndex. Use three times xIndex for a plain BGR/RGB image, or one times xIndex for a grayscale image.
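The addressing rule can be written as one small helper that takes the channel count as a parameter. The sketch below is illustrative only: pixelOffset and fill_kernel are hypothetical names, not part of OpenCV or of the original code. As a worked example with an assumed row pitch of 5120 bytes, pixel (x = 10, y = 2) in a 4-channel image starts at byte 2 * 5120 + 4 * 10 = 10280. With a multiplier of 3 on the same data, each thread would instead read and write bytes that straddle pixel boundaries, which is a plausible source of stripe artifacts like the ones in the question.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Byte offset of the first channel of pixel (x, y) in a pitched image.
// stepBytes is the row pitch (GpuMat::step), which can exceed width * channels.
__host__ __device__ inline size_t pixelOffset(int x, int y, size_t stepBytes, int channels)
{
    return static_cast<size_t>(y) * stepBytes + static_cast<size_t>(channels) * x;
}

// Hypothetical kernel: writes `value` into every channel of every pixel.
// Works for 1-channel (grayscale), 3-channel (BGR/RGB) and 4-channel (BGRA/RGBA) data.
__global__ void fill_kernel(unsigned char* data, int width, int height,
                            size_t stepBytes, int channels, unsigned char value)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    unsigned char* pixel = data + pixelOffset(x, y, stepBytes, channels);
    for (int c = 0; c < channels; ++c)
        pixel[c] = value;
}
```

For an 8-bit GpuMat named mat, the arguments would come straight from the matrix: mat.ptr<unsigned char>(), mat.cols, mat.rows, mat.step and mat.channels(). Taking the channel count from the matrix rather than hard-coding 3 or 4 keeps the kernel in step with whatever format the camera delivers.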
