OpenCV ORB descriptor - how exactly is it stored in a set of bytes?


Problem description

I'm currently using OpenCV's ORB feature extractor and I noticed the (at least for me) strange way the ORB descriptor is stored (it is basically a BRIEF-32 with a modification that is not relevant to my question). As some of you know, ORB takes the keypoints extracted using a modified FAST-9 (circle radius = 9 pixels; the orientation of the keypoint is also stored) and uses them with a modified BRIEF-32 descriptor to store the feature that the keypoint represents.

BRIEF (the ORB version) works as follows: we take a 31x31 pixel patch (representing a feature) and create a bunch of random 5x5 pixel test points. Then we take pairs of those points and evaluate their intensities, resulting in a binary decision (0 or 1) based on whether the intensity of the first point in the pair is greater than, or less than or equal to, the intensity of the second one. Then we take all those bits and use a basic sum formula to build a binary string of length n (for BRIEF-32 we have 32 bytes * 8 = a 256-bit-long binary string):

SUM(2^(i-1) * bit_pair_test_i)

where bit_pair_test_i is the bit value that we've calculated from the test on the i-th pair of test points. The final result is something like (for a set of binary tests (...,0,1,0,1,1)):

(2^0 * 1) + (2^1 * 1) + (2^2 * 0) + (2^3 * 1) + (2^4 * 0) + ...
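
To make the packing concrete, here is a minimal sketch (my own illustration, not OpenCV's actual code; the exact bit order within a byte may differ from the real implementation) of how 256 such test results end up as 32 bytes, 8 tests per byte:

    #include <array>
    #include <cstdint>

    // Pack 256 binary test results (each 0 or 1) into 32 bytes.
    // Each byte holds 8 consecutive tests; within a byte, test i contributes
    // 2^(i mod 8), which is the per-byte version of SUM(2^(i-1) * bit_pair_test_i).
    std::array<std::uint8_t, 32> packTests(const std::array<std::uint8_t, 256>& tests) {
        std::array<std::uint8_t, 32> desc{};
        for (int i = 0; i < 256; ++i) {
            if (tests[i])
                desc[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
        }
        return desc;
    }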

Now, the way OpenCV's ORB stores those bitstrings is a mystery to me. If we look at the matrix that contains the descriptors for a whole image, with each row being a single descriptor for a single keypoint, we can see that each descriptor consists of 32 8-bit numbers, which altogether make up the 256 bits that BRIEF-32 uses to store the information. I don't understand why we split those 256 bits into 32 bytes. The official documentation (http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_brief/py_brief.html) only says that OpenCV stores such descriptors in bytes, but it does not explain why. I've considered three possibilities, without excluding that some combination of them might be the answer:

  • Some storage technique that I just can't see
  • Some performance issue with computing the Hamming distance for a binary string that long (256 bits in our case)
  • Optimization of the matching process - matching basically compares the descriptor of a keypoint from one image to the descriptor of a keypoint in a second image. Because we have binary strings, Hamming distance is the obvious choice here. It might be that each of those 32 sub-strings is somehow compared to its counterpart in the descriptor of the keypoint in the second image (sub-string at position 0 of keypoint X, image 1 with sub-string at position 0 of keypoint Y, image 2), and that in the end OpenCV says: "Okay, we have an 80% match rate for this descriptor pair since approx. 26 of the 32 sub-strings are the same in both descriptors, so we have a winner." However, I was unable to find any evidence to confirm that.
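
For reference, here is a small sketch (assuming the OpenCV 3.x/4.x API; the image file name is just a placeholder) that shows the descriptor matrix mentioned above: one row per keypoint, 32 columns of type CV_8U:

    #include <opencv2/opencv.hpp>
    #include <iostream>
    #include <vector>

    int main() {
        cv::Mat img = cv::imread("image.png", cv::IMREAD_GRAYSCALE); // placeholder file
        cv::Ptr<cv::ORB> orb = cv::ORB::create();

        std::vector<cv::KeyPoint> keypoints;
        cv::Mat descriptors;
        orb->detectAndCompute(img, cv::noArray(), keypoints, descriptors);

        // One row per keypoint, 32 unsigned 8-bit values per row = 256 bits.
        std::cout << "keypoints (rows): " << descriptors.rows << "\n"
                  << "bytes per descriptor (cols): " << descriptors.cols << "\n"
                  << "type is CV_8U: " << (descriptors.type() == CV_8U) << "\n";
        return 0;
    }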

PS: You can read the paper on ORB here and the paper on BRIEF here.

Solution

The choice of the 8-bit and 32-bit patterns is due to storage and efficiency concerns.

  • Matching process

The bit order in BRIEF, ORB, and BRISK is not relevant (unlike FREAK). Thus, all bits of these descriptors are equally significant, and you cannot speed up matching by comparing only the first part of the bitstream and discarding candidates early.

On the other hand, FREAK was designed with such a matching process (called a cascade in FREAK's paper) in mind.

  • Storage issues

Well, computers cannot address individual bits. Thus, you won't see anybody storing BRIEF and the like in bit arrays.

The smallest unit that can be read from memory is a byte (usually corresponding to 8 bits, although some DSPs cannot read chunks smaller than 16 bits, but that's another story). Thus, you will see people storing their descriptors in byte arrays (type unsigned char in C/C++, the underlying OpenCV implementation language).

Plus, memory accesses are usually better (faster) when variables are aligned on CPU word boundaries. Most CPUs nowadays have words of 32 or 64 bits, with 32-bit words being the better choice because 64-bit architectures were designed with legacy 32-bit processors in mind.

  • Efficiency issues

Hamming distance is computed via an XOR operation followed by a bit count (popcount). It happens that many processors have dedicated instructions that compute XOR and popcount efficiently on 32-bit words (the common integer size before 64-bit CPUs became widespread). Even more, they may also support computing several XOR values on multiple 32-bit words in parallel, a parallelism technique called SIMD (Single Instruction, Multiple Data). For example, SSE extensions can be leveraged to further accelerate the Hamming distance computation of BRIEF/ORB/... descriptors whose size is a multiple of 32 bits.
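
As an illustration of that XOR-plus-popcount idea, here is a minimal sketch (my own, assuming C++20 for std::popcount; OpenCV itself exposes this through cv::norm with NORM_HAMMING or cv::BFMatcher) that computes the Hamming distance between two 32-byte descriptors by treating them as eight 32-bit words:

    #include <bit>       // std::popcount (C++20)
    #include <cstdint>
    #include <cstring>

    // Hamming distance between two 256-bit (32-byte) descriptors:
    // XOR them word by word and count the differing bits.
    int hammingDistance256(const std::uint8_t* a, const std::uint8_t* b) {
        int dist = 0;
        for (int i = 0; i < 8; ++i) {           // 8 words * 32 bits = 256 bits
            std::uint32_t wa, wb;
            std::memcpy(&wa, a + 4 * i, 4);     // memcpy avoids unaligned access issues
            std::memcpy(&wb, b + 4 * i, 4);
            dist += std::popcount(wa ^ wb);     // set bits of the XOR = differing bits
        }
        return dist;
    }

In practice you would pass descriptors1.ptr<uchar>(i) and descriptors2.ptr<uchar>(j) from the OpenCV matrices, or simply let cv::BFMatcher(cv::NORM_HAMMING) handle the matching.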
