Custom Kernel GpuMat with float
Question
I'm trying to write a custom kernel that operates on GpuMat data to find the arc cosine of an image's pixels. I can upload, download, and change values when the GpuMat holds CV_8UC1 data, but chars cannot be used to calculate arc cosines. However, when I convert my GpuMat to the CV_32FC1 type (floats), I get an illegal memory access error during the download step. Here is my code:
//.cu code
#include <cuda_runtime.h>
#include <stdlib.h>
#include <iostream>
#include <stdio.h>

__global__ void funcKernel(const float* srcptr, float* dstptr, size_t srcstep, const size_t dststep, int cols, int rows){
    int rowInd = blockIdx.y*blockDim.y+threadIdx.y;
    int colInd = blockIdx.x*blockDim.x+threadIdx.x;
    if(rowInd >= rows || colInd >= cols)
        return;
    const float* rowsrcptr=srcptr+rowInd*srcstep;
    float* rowdstPtr= dstptr+rowInd*dststep;
    float val = rowsrcptr[colInd];
    if((int) val % 90 == 0)
        rowdstPtr[colInd] = -1 ;
    else{
        float acos_val = acos(val);
        rowdstPtr[colInd] = acos_val;
    }
}

int divUp(int a, int b){
    return (a+b-1)/b;
}

extern "C"
{
void func(const float* srcptr, float* dstptr, size_t srcstep, const size_t dststep, int cols, int rows){
    dim3 blDim(32,8);
    dim3 grDim(divUp(cols, blDim.x), divUp(rows,blDim.y));
    std::cout << "calling kernel from func\n";
    funcKernel<<<grDim,blDim>>>(srcptr,dstptr,srcstep,dststep,cols,rows);
    std::cout << "done with kernel call\n";
    cudaDeviceSynchronize();
}
}
//.cpp code
void callKernel(const GpuMat &src, GpuMat &dst){
    float* p = (float*)src.data;
    float* p2 =(float*) dst.data;
    func(p,p2,src.step,dst.step,src.cols,src.rows);
}

int main(){
    Mat input = imread("cat.jpg",0);
    Mat float_input;
    input.convertTo(float_input,CV_32FC1);
    GpuMat d_frame,d_output;
    Size size = float_input.size();
    d_frame.upload(float_input);
    d_output.create(size,CV_32FC1);
    callKernel(d_frame,d_output);
    Mat output(d_output);
    return 0;
}
When I run the program, I get this runtime error:
OpenCV Error: Gpu API call (an illegal memory access was encountered) in copy, file /home/mobile/opencv-2.4.9/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp, line 882
terminate called after throwing an instance of 'cv::Exception'
  what():  /home/mobile/opencv-2.4.9/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp:882: error: (-217) an illegal memory access was encountered in function copy
Recommended answer
You are treating the image step as if it were an offset in float elements. It is a byte offset from one row to the next, so adding it to a float pointer moves several times too far and runs past the end of the allocation.
Try something like this:
const float* rowsrcptr= (const float *)(((char *)srcptr)+rowInd*srcstep);
float* rowdstPtr= (float *) (((char *)dstptr)+rowInd*dststep);
From the documentation:
step – Number of bytes each matrix row occupies.
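To make the size of the overshoot concrete, here is a small host-side sketch of the two ways of computing a row address. The helper names (rowPtr, correctOffsetBytes, buggyOffsetBytes) are hypothetical and exist only for this illustration; none of them are OpenCV or CUDA functions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration; none of these are OpenCV functions.

// Correct addressing: stepBytes is a byte count, so advance a byte pointer
// and only then reinterpret the result as a row of floats.
inline const float* rowPtr(const float* base, std::size_t stepBytes, int row) {
    assert(stepBytes % sizeof(float) == 0);  // keeps each row float-aligned
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(base);
    return reinterpret_cast<const float*>(bytes + static_cast<std::size_t>(row) * stepBytes);
}

// Byte offset of a row when the step is applied as bytes.
inline std::size_t correctOffsetBytes(int row, std::size_t stepBytes) {
    return static_cast<std::size_t>(row) * stepBytes;
}

// Byte offset actually produced by `srcptr + rowInd * srcstep` on a float*:
// pointer arithmetic scales by sizeof(float), so the row lands too far out.
inline std::size_t buggyOffsetBytes(int row, std::size_t stepBytes) {
    return static_cast<std::size_t>(row) * stepBytes * sizeof(float);
}
```

As an example with made-up numbers, take 4 rows and a step of 512 bytes, so the allocation spans 2048 bytes. Row 3 really starts at byte 1536, but the float-scaled arithmetic places it at byte 6144 (assuming a 4-byte float), well outside the buffer. The real step of a GpuMat depends on the device and the allocation, so read it from the matrix rather than assuming a value.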
It's also a good idea to add proper CUDA error checking to your code (e.g. to func). You can also run your code with cuda-memcheck to see the actual kernel failure generating the invalid reads/writes.
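Error checking also explains why the failure showed up in the download rather than at the kernel call. Kernel launches are asynchronous, and the question's func ignores the return value of cudaDeviceSynchronize, so the error is only reported by the next CUDA call that is checked, which here is the copy inside download. Below is a minimal sketch of the .cu file with the byte-step fix and error checks added. CUDA_CHECK is a hypothetical macro name, not part of CUDA or OpenCV, and the per-pixel logic is copied from the question apart from using acosf.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper macro (not part of CUDA or OpenCV): abort with a
// readable message if a CUDA runtime call returns an error.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

__global__ void funcKernel(const float* srcptr, float* dstptr,
                           size_t srcstep, size_t dststep,
                           int cols, int rows) {
    int rowInd = blockIdx.y * blockDim.y + threadIdx.y;
    int colInd = blockIdx.x * blockDim.x + threadIdx.x;
    if (rowInd >= rows || colInd >= cols)
        return;

    // step is in bytes, so do the row arithmetic on a char pointer.
    const float* rowsrcptr = (const float*)((const char*)srcptr + rowInd * srcstep);
    float* rowdstPtr = (float*)((char*)dstptr + rowInd * dststep);

    float val = rowsrcptr[colInd];
    rowdstPtr[colInd] = ((int)val % 90 == 0) ? -1.0f : acosf(val);
}

extern "C" void func(const float* srcptr, float* dstptr,
                     size_t srcstep, size_t dststep, int cols, int rows) {
    dim3 blDim(32, 8);
    dim3 grDim((cols + blDim.x - 1) / blDim.x, (rows + blDim.y - 1) / blDim.y);
    funcKernel<<<grDim, blDim>>>(srcptr, dstptr, srcstep, dststep, cols, rows);
    CUDA_CHECK(cudaGetLastError());       // reports invalid launch configurations
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran
}
```

With checks like these, the original bug would be reported at the cudaDeviceSynchronize line instead of inside OpenCV's copy. To locate the faulting access itself, run the binary under the tool, for example cuda-memcheck ./your_program, where your_program is a placeholder for your executable. Recent CUDA toolkits ship compute-sanitizer as the replacement for cuda-memcheck.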