Comparing SIFT features stored in a MySQL database


Problem description

I'm currently extending an image library used to categorize images, and I want to find duplicate images, transformed images, and images that contain or are contained in other images.
I have tested the SIFT implementation from OpenCV and it works very well, but it would be rather slow for multiple images. To speed it up, I thought I could extract the features and save them in a database, since a lot of other image-related metadata is already held there.

What would be the fastest way to compare the features of a new image to the features in the database?
Usually the comparison is done by calculating the Euclidean distance using kd-trees or FLANN, or with the Pyramid Match Kernel, which I found in another thread here on SO but haven't looked into much yet.

Since I don't know of a way to save and search a kd-tree in a database efficiently, I currently see only three options:
* Let MySQL calculate the Euclidean distance to every feature in the database, although I'm sure that will take an unreasonable amount of time for more than a few images.
* Load the entire dataset into memory at the beginning and build the kd-tree(s). This would probably be fast, but very memory intensive. Plus, all the data would need to be transferred from the database.
* Save the generated trees into the database and load all of them. This would be the fastest method, but it would also generate a lot of traffic, because with new images the kd-trees would have to be rebuilt and sent to the server.
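
For illustration, the second option could look roughly like the sketch below. It is only a sketch under assumptions: the `features` table with `image_id` and `descriptor` columns is hypothetical, `sqlite3` stands in for MySQL so the example is self-contained, and a brute-force search with a 0.75 ratio test replaces the kd-tree or FLANN index that a real implementation would use.

```python
import math
import sqlite3
from array import array

# Assumed (hypothetical) schema:
#   CREATE TABLE features (image_id INTEGER, descriptor BLOB)


def to_blob(descriptor):
    """Pack a descriptor (sequence of floats) into bytes as float32 values."""
    return array("f", descriptor).tobytes()


def from_blob(blob):
    """Unpack a float32 BLOB back into a list of floats."""
    vec = array("f")
    vec.frombytes(blob)
    return list(vec)


def load_all(conn):
    """Pull every stored descriptor into memory once (option 2)."""
    rows = conn.execute("SELECT image_id, descriptor FROM features")
    return [(image_id, from_blob(blob)) for image_id, blob in rows]


def match_counts(query_descriptors, stored, ratio=0.75):
    """Count ratio-test matches per stored image.

    Brute force: every query descriptor is compared against every stored
    descriptor. A kd-tree or FLANN index would replace the inner loop.
    """
    votes = {}
    for q in query_descriptors:
        best = second = (math.inf, None)
        for image_id, d in stored:
            dist = math.dist(q, d)
            if dist < best[0]:
                best, second = (dist, image_id), best
            elif dist < second[0]:
                second = (dist, image_id)
        # Accept only matches that are clearly better than the runner-up.
        if best[0] < ratio * second[0]:
            votes[best[1]] = votes.get(best[1], 0) + 1
    return votes
```

The image with the most votes is the most likely duplicate candidate. Memory use and the one-time transfer from the database grow with the number of stored descriptors, which is the cost described in the list above.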

I'm using the SIFT implementation of OpenCV, but I'm not dead set on it. If there is a feature extractor more suitable for this task (and roughly equally robust), I'd be glad if someone could suggest one.

Solution

So I basically did something very similar to this a few years ago. The algorithm you want to look into was proposed a few years ago by David Nister; the paper is "Scalable Recognition with a Vocabulary Tree". It offers pretty much an exact solution to your problem and can scale to millions of images.

Here is a link to the abstract; you can find a download link by searching for the title. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1641018

The basic idea is to build a tree with a hierarchical k-means algorithm to model the features, and then leverage the sparse distribution of features in that tree to quickly find your nearest neighbors... or something like that; it's been a few years since I worked on it. You can find a PowerPoint presentation on the author's webpage here: http://www.vis.uky.edu/~dnister/Publications/publications.html
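
To make the hierarchical k-means idea a bit more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the names `build_tree` and `quantize`, the farthest-point seeding, and the fixed iteration count are assumptions chosen to keep the example short and deterministic. Each leaf acts as one "visual word", and a descriptor is quantized by walking down the tree toward the nearest center at each level.

```python
import itertools
import math


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def seed_centers(vectors, k):
    """Farthest-point seeding (an assumption made here for determinism)."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(
            max(vectors, key=lambda v: min(math.dist(v, c) for c in centers))
        )
    return centers


def assign(vectors, centers):
    """Group vectors by their nearest center."""
    groups = [[] for _ in centers]
    for v in vectors:
        groups[nearest(v, centers)].append(v)
    return groups


def kmeans(vectors, k, iterations=10):
    """Plain Lloyd iterations; an empty cluster keeps its old center."""
    centers = seed_centers(vectors, k)
    for _ in range(iterations):
        groups = assign(vectors, centers)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, assign(vectors, centers)


class Node:
    def __init__(self):
        self.centers = []    # cluster centers of the children
        self.children = []   # child nodes; empty for a leaf
        self.word_id = None  # set on leaves only


def build_tree(vectors, branching, depth, counter=None):
    """Recursively split the descriptors with k-means, `depth` levels deep."""
    counter = counter if counter is not None else itertools.count()
    node = Node()
    if depth == 0 or len(vectors) < branching:
        node.word_id = next(counter)
        return node
    node.centers, groups = kmeans(vectors, branching)
    node.children = [build_tree(g, branching, depth - 1, counter) for g in groups]
    return node


def quantize(tree, descriptor):
    """Walk from the root to a leaf and return that leaf's visual word id."""
    node = tree
    while node.children:
        node = node.children[nearest(descriptor, node.centers)]
    return node.word_id
```

Quantizing a descriptor costs about `branching * depth` distance computations instead of one per stored feature, which is where the speedup over a linear scan comes from. For real SIFT data the inner loops would be vectorized with a numerical library.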

A few other notes:

  • I wouldn't bother with the pyramid match kernel; it's really more for improving object recognition than for duplicate/transformed image detection.

  • I would not store any of this feature stuff in an SQL database. Depending on your application it is sometimes more effective to compute your features on the fly since their size can exceed the original image size when computed densely. Histograms of features or pointers to nodes in a vocabulary tree are much more efficient.

  • SQL databases are not designed for doing massive floating point vector calculations. You can store things in your database, but don't use it as a tool for computation. I tried this once with SQLite and it ended very badly.

  • If you decide to implement this, read the paper in detail and keep a copy handy while implementing it, as there are many minor details that are very important to making the algorithm work efficiently.
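
As a rough illustration of the notes above on storing histograms rather than raw features, the sketch below keeps only visual-word counts in the database and does all scoring in application code. Everything here is an assumption for illustration: the `postings` table layout is hypothetical, `sqlite3` stands in for whichever database is used, and the score is a simplified, unnormalized TF-IDF product rather than the scoring scheme from the paper.

```python
import math
import sqlite3
from collections import Counter


def create_schema(conn):
    """Inverted index: one row per (visual word, image) pair."""
    conn.execute(
        "CREATE TABLE postings ("
        " word_id INTEGER, image_id INTEGER, count INTEGER,"
        " PRIMARY KEY (word_id, image_id))"
    )


def add_image(conn, image_id, word_ids):
    """Store an image as a sparse histogram of visual word ids."""
    counts = Counter(word_ids)
    conn.executemany(
        "INSERT INTO postings VALUES (?, ?, ?)",
        [(word, image_id, n) for word, n in counts.items()],
    )


def rank(conn, query_word_ids):
    """Rank stored images against a query; the database only does lookups."""
    total = conn.execute(
        "SELECT COUNT(DISTINCT image_id) FROM postings"
    ).fetchone()[0]
    scores = Counter()
    for word, q_count in Counter(query_word_ids).items():
        rows = conn.execute(
            "SELECT image_id, count FROM postings WHERE word_id = ?", (word,)
        ).fetchall()
        if not rows:
            continue
        # Words that occur in few images are weighted more heavily.
        idf = math.log(total / len(rows))
        for image_id, count in rows:
            scores[image_id] += q_count * count * idf * idf
    return scores.most_common()
```

The word ids would come from quantizing each descriptor with the vocabulary tree. The database then holds a few integers per image instead of thousands of 128-dimensional float vectors, and it never has to compute a distance.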
