Accuracy in depth estimation - Stereo Vision

Question

I am doing research in stereo vision, and in this question I am interested in the accuracy of depth estimation. It depends on several factors, such as:

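  • proper stereo calibration (rotation, translation, and distortion extraction),
  • image resolution,
  • camera and lens quality (low distortion, proper color capture),
  • feature matching between the two images.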

Let's say we are not using low-cost cameras and lenses (no cheap webcams, etc.).

My question is: what depth estimation accuracy can we achieve in this field? Does anyone know of a real stereo vision system that works with a stated accuracy? Can we achieve 1 mm depth estimation accuracy?

My question also applies to systems implemented in OpenCV. What accuracy did you manage to achieve?

Recommended answer

I would add that using color is a bad idea even with expensive cameras; just use the gradient of the gray intensity. Some producers of high-end stereo cameras (for example, Point Grey) used to rely on color and then switched to gray. Also consider bias and variance as two components of the stereo matching error. This is important because correlation stereo with a large correlation window, for example, averages depth (i.e., it models the world as a bunch of fronto-parallel patches), which reduces the variance while increasing the bias, and vice versa. So there is always a trade-off.
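The correlation window is easiest to see in a toy matcher. The following is a minimal sketch of sum-of-absolute-differences (SAD) matching on a single rectified scanline, written in plain Python for illustration only; the function name, the border clamping, and the sample intensities are assumptions, not part of the answer. The `half_window` argument is the knob that controls how much neighboring depth is averaged together.

```python
def sad_disparity(left_row, right_row, half_window, max_disparity):
    """Per-pixel disparity on one rectified scanline by minimising SAD.

    Left pixel x is compared with right pixel x - d for d in 0..max_disparity.
    """
    n = len(left_row)
    disparities = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disparity + 1):
            cost = 0
            for k in range(-half_window, half_window + 1):
                xl = min(max(x + k, 0), n - 1)      # clamp at the row borders
                xr = min(max(x + k - d, 0), n - 1)
                cost += abs(left_row[xl] - right_row[xr])
            if cost < best_cost:                    # keep the first minimum
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


if __name__ == "__main__":
    # Illustrative gray values; the left row is the right row shifted by 2 px.
    right = [10, 50, 20, 80, 30, 90, 40, 70, 15, 60, 25, 85]
    left = [0, 0] + right[:-2]
    print(sad_disparity(left, right, half_window=1, max_disparity=3))
```

On this noise-free row, the interior pixels recover the 2-pixel shift. Real block matchers such as OpenCV's `StereoBM` expose a comparable block-size parameter.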

Beyond the factors you mentioned, the accuracy of your stereo system will depend on the specifics of the algorithm. It is up to the algorithm to validate depth (an important step after stereo estimation) and to gracefully patch the holes in textureless areas. For example, consider back-and-forth validation (matching R to L should produce the same candidates as matching L to R), blob noise removal (non-Gaussian noise typical of stereo matching, removed with a connected-component algorithm), texture validation (invalidating depth in areas with weak texture), uniqueness validation (requiring a unimodal matching score without strong second and third candidates; this is typically a shortcut for back-and-forth validation), etc. The accuracy will also depend on sensor noise and the sensor's dynamic range.
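As an illustration of back-and-forth validation, here is a minimal sketch for one rectified scanline, assuming integer disparities and the convention that left pixel x matches right pixel x - d. The `INVALID` marker, the function name, and the one-pixel tolerance are hypothetical choices for this example, not something the answer specifies.

```python
INVALID = -1  # marker for pixels whose disparity was rejected


def left_right_check(disp_left, disp_right, tolerance=1):
    """Invalidate left disparities that the right-to-left match does not confirm.

    Both inputs hold integer disparities for one rectified scanline.
    Left pixel x is assumed to match right pixel x - disp_left[x].
    """
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d  # matching column in the right image
        if d < 0 or xr < 0 or xr >= len(disp_right):
            checked.append(INVALID)   # no valid counterpart in the right row
        elif disp_right[xr] < 0 or abs(disp_right[xr] - d) > tolerance:
            checked.append(INVALID)   # the two matching directions disagree
        else:
            checked.append(d)
    return checked


if __name__ == "__main__":
    left = [2, 2, 2, 2, 2, 5, 2, 2]   # one outlier at x = 5
    right = [2, 2, 2, 2, 2, 2, 2, 2]
    print(left_right_check(left, right))
```

In this example the outlier at x = 5 is rejected because the right-to-left disparity at its target column disagrees, and the first two pixels are rejected because their match would fall outside the right row.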

Finally, you have to ask about accuracy as a function of depth, since d = f*B/z, where d is the disparity in pixels, B is the baseline between the cameras, f is the focal length in pixels, and z is the distance along the optical axis. Thus accuracy depends strongly on the baseline and the distance.
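To put rough numbers on this: differentiating z = f*B/d gives |dz| ≈ z^2/(f*B) * |dd|, so for a fixed matching error the depth error grows with the square of the distance. The sketch below evaluates that first-order formula. The focal length, baseline, and sub-pixel matching error are assumed values chosen only for illustration; they do not come from the answer or from any particular camera.

```python
def depth_from_disparity(d_px, f_px, baseline_m):
    """Depth z (metres) from disparity d (pixels): z = f * B / d."""
    return f_px * baseline_m / d_px


def depth_error(z_m, f_px, baseline_m, disparity_error_px):
    """First-order depth uncertainty: |dz| ~= z^2 / (f * B) * |dd|."""
    return z_m ** 2 / (f_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    f_px = 1000.0       # assumed focal length in pixels
    baseline_m = 0.10   # assumed 10 cm baseline
    dd_px = 0.25        # assumed sub-pixel matching error in pixels
    for z in (0.5, 1.0, 2.0, 5.0):
        err_mm = 1000.0 * depth_error(z, f_px, baseline_m, dd_px)
        print(f"z = {z:.1f} m -> depth error ~ {err_mm:.2f} mm")
```

Under these assumed numbers the error is about 0.6 mm at 0.5 m, 2.5 mm at 1 m, and over 60 mm at 5 m, so 1 mm accuracy would only hold at close range. Doubling either the baseline or the focal length halves the error at a given distance.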

Kinect provides about 1 mm accuracy (bias), with quite a large variance, up to 1 m or so; beyond that, accuracy drops sharply. Kinect has a dead zone up to about 50 cm, since the two cameras do not overlap sufficiently at close range. And yes, Kinect is a stereo camera in which one of the cameras is simulated by an IR projector.

I am sure that probabilistic stereo, such as belief propagation on Markov random fields, can achieve higher accuracy. But those methods assume strong priors about the smoothness of object surfaces or a particular surface orientation. See this for example, page 14.
