OPENCV : CUDA context initialization for different methods
Question
I'm working on a simple C++ program to evaluate the performance of some OpenCV GPU methods (cv::cuda). I am using OpenCV 3.1 on Ubuntu 15 (with CUDA 7.5) with a GeForce 770.
I previously read that we need to initialize the CUDA environment to avoid a slow first call. So, I initialize my program with cv::cuda::getDevice() and setDevice().
Then, I test two methods:
- cv::cuda::resize() (factor 0.5)
- cv::cuda::meanStdDev()
Initialization takes 400 ms. Then, resizing takes 2 or 3 ms, which is fine. But... meanStdDev takes 476 ms! If I run two successive meanStdDev calls, the second one is much faster (3 ms).
I really don't understand why the initialization has an effect on resize() but not on meanStdDev()...
I compiled OpenCV with -DCUDA_ARCH_BIN=3.0. I also tried with -DCUDA_ARCH_PTX="" but the problem stays the same.
Thanks for your help.
Pierre.
#include <opencv2/opencv.hpp>
#include <opencv2/cudaimgproc.hpp>
#include "opencv2/cudawarping.hpp"
#include "opencv2/cudaarithm.hpp"

using namespace std;

int main(int argc, char *argv[])
{
    double t_init_cuda = (double)cv::getTickCount();
    int CudaDevice;
    if (cv::cuda::getCudaEnabledDeviceCount() == 0)
    {
        cerr << endl << "ERROR: NO CudaEnabledDevice" << endl;
        exit(2);
    }
    else
    {
        CudaDevice = cv::cuda::getDevice();
        cv::cuda::setDevice(CudaDevice);
    }
    t_init_cuda = ((double)cv::getTickCount() - t_init_cuda) / cv::getTickFrequency() * 1000;
    cout << endl << "\t*T_INIT_CUDA=" << t_init_cuda << "ms\n";

    cv::Mat src = cv::imread(argv[1], 0);
    if (!src.data) exit(1);
    cv::cuda::GpuMat d_src(src);

    // CV::CUDA::RESIZE
    cv::cuda::GpuMat d_dst;
    double factor = 0.5;
    double t_gpu_resize = (double)cv::getTickCount();
    cv::cuda::resize(d_src, d_dst, cv::Size((int)((float)(d_src.cols) * factor), (int)((float)(d_src.rows) * factor)), 0, 0, CV_INTER_AREA);
    t_gpu_resize = ((double)cv::getTickCount() - t_gpu_resize) / cv::getTickFrequency() * 1000;
    cout << endl << "D_SRC=" << d_src.rows << "x" << d_src.cols << " => D_DST=" << d_dst.rows << "x" << d_dst.cols << endl;
    cout << endl << "\t*T_GPU_RESIZE=" << t_gpu_resize << "ms\n";

    // CV::CUDA::MEANSTDDEV
    double t_meanstddev = (double)cv::getTickCount();
    cv::Scalar mean, stddev;
    std::vector<cv::cuda::GpuMat> d_src_split;
    cv::cuda::split(d_src, d_src_split);
    cv::cuda::meanStdDev(d_src_split[0], mean, stddev);
    t_meanstddev = ((double)cv::getTickCount() - t_meanstddev) / cv::getTickFrequency() * 1000.0;
    cout << endl << "mean=" << mean.val[0] << " | stddev=" << stddev.val[0] << endl;
    cout << endl << "\t*T_GPU_MEANSTDDEV=" << t_meanstddev << "ms\n";

    return 0;
}
Answer
My friend, when you call the same function twice:
1. The first time, new memory is allocated on the device for the result ("according to the wiki of OpenCV").
2. The second time, the already allocated memory is reused, so it is fast.
Here is that function from the OpenCV sources, so you can see why this happens:
void cv::cuda::meanStdDev(InputArray _src, OutputArray _dst, Stream& stream)
{
    if (!deviceSupports(FEATURE_SET_COMPUTE_13))
        CV_Error(cv::Error::StsNotImplemented, "Not sufficient compute capebility");

    const GpuMat src = getInputMat(_src, stream);

    CV_Assert( src.type() == CV_8UC1 );

    GpuMat dst = getOutputMat(_dst, 1, 2, CV_64FC1, stream);

    NppiSize sz;
    sz.width  = src.cols;
    sz.height = src.rows;

    int bufSize;
#if (CUDA_VERSION <= 4020)
    nppSafeCall( nppiMeanStdDev8uC1RGetBufferHostSize(sz, &bufSize) );
#else
    nppSafeCall( nppiMeanStdDevGetBufferHostSize_8u_C1R(sz, &bufSize) );
#endif

    BufferPool pool(stream);
    GpuMat buf = pool.getBuffer(1, bufSize, CV_8UC1); // <--- this line creates a new GpuMat

    NppStreamHandler h(StreamAccessor::getStream(stream));

    nppSafeCall( nppiMean_StdDev_8u_C1R(src.ptr<Npp8u>(), static_cast<int>(src.step), sz, buf.ptr<Npp8u>(), dst.ptr<Npp64f>(), dst.ptr<Npp64f>() + 1) );

    syncOutput(dst, _dst, stream);
}
And this helper:
GpuMat cv::cuda::BufferPool::getBuffer(int rows, int cols, int type)
{
    GpuMat buf(allocator_);
    buf.create(rows, cols, type);
    return buf;
}
I hope this will help you.