CUDA, NPP Filters
Problem description
The CUDA NPP library supports filtering images with the nppiFilter_8u_C1R function, but I keep getting errors when I use it. I have no problem getting the boxFilterNPP sample code up and running.
eStatusNPP = nppiFilterBox_8u_C1R(oDeviceSrc.data(), oDeviceSrc.pitch(),
oDeviceDst.data(), oDeviceDst.pitch(),
oSizeROI, oMaskSize, oAnchor);
But if I change it to use nppiFilter_8u_C1R instead, eStatusNPP returns error -24 (NPP_TEXTURE_BIND_ERROR). The code below shows the changes I made to the original boxFilterNPP sample.
NppiSize oMaskSize = {5,5};
npp::ImageCPU_32s_C1 hostKernel(5,5);
for(int x = 0 ; x < 5; x++){
    for(int y = 0 ; y < 5; y++){
        hostKernel.pixels(x,y)[0].x = 1;
    }
}
npp::ImageNPP_32s_C1 pKernel(hostKernel);
Npp32s nDivisor = 1;
eStatusNPP = nppiFilter_8u_C1R(oDeviceSrc.data(), oDeviceSrc.pitch(),
                               oDeviceDst.data(), oDeviceDst.pitch(),
                               oSizeROI,
                               pKernel.data(),
                               oMaskSize, oAnchor,
                               nDivisor);
This has been tried on CUDA 4.2 and 5.0, with the same result.
The code runs with the expected result when oMaskSize = {1,1}.
Solution
I had the same problem when I stored my kernel as an ImageCPU/ImageNPP.
A good solution is to store the kernel as a traditional 1D array on the device. I tried this, and it gave me good results (and none of those unpredictable or garbage images).
Thanks to Frank Jargstorff in this StackOverflow post for the 1D idea.
NppiSize oMaskSize = {5,5};
Npp32s hostKernel[5*5];
for(int x = 0 ; x < 5; x++){
    for(int y = 0 ; y < 5; y++){
        hostKernel[x*5+y] = 1;
    }
}
Npp32s* pKernel; //just a regular 1D array on the GPU
cudaMalloc((void**)&pKernel, 5 * 5 * sizeof(Npp32s));
cudaMemcpy(pKernel, hostKernel, 5 * 5 * sizeof(Npp32s), cudaMemcpyHostToDevice);
Using this original image, here's the blurred result that I get from your code with the 1D kernel array:
Other parameters that I used:
Npp32s nDivisor = 25;
NppiPoint oAnchor = {4, 4};