Best approach for doing full-text search with list-of-integers documents

Question

I'm working on a C++/Qt image retrieval system based on similarity that works as follows (I'll try to avoid irrelevant or off-topic details):

I take a collection of images and build an index from them using OpenCV functions. After that, for each image, I get a list of integer values representing important "classes" that each image belongs to. The more integers two images have in common, the more similar they are believed to be. So, when I want to query the system, I just have to compute the list of integers representing the query image, perform a full-text search (or similar) and retrieve the X most similar images.
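The similarity measure described above (number of shared integers) and the "retrieve the X most similar images" step can be sketched as a plain linear scan. This is only a baseline under stated assumptions: each image's class list is sorted and free of duplicates, and the names `ClassList`, `commonClasses`, and `topSimilar` are hypothetical, not part of OpenCV or Qt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumption: each image is a sorted, duplicate-free list of class IDs.
using ClassList = std::vector<int>;

// Counts the class IDs two images share with a two-pointer walk.
std::size_t commonClasses(const ClassList& a, const ClassList& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::size_t count = 0;
    auto i = a.begin();
    auto j = b.begin();
    while (i != a.end() && j != b.end()) {
        if (*i < *j) {
            ++i;
        } else if (*j < *i) {
            ++j;
        } else {  // same class ID in both lists
            ++count;
            ++i;
            ++j;
        }
    }
    return count;
}

// Scores every indexed image against the query and returns the indices of
// the (at most) x best ones, most similar first; ties go to the lower index.
std::vector<std::size_t> topSimilar(const std::vector<ClassList>& images,
                                    const ClassList& query, std::size_t x) {
    using Scored = std::pair<std::size_t, std::size_t>;  // (shared classes, image index)
    std::vector<Scored> scored;
    scored.reserve(images.size());
    for (std::size_t k = 0; k < images.size(); ++k) {
        scored.emplace_back(commonClasses(images[k], query), k);
    }
    const std::size_t keep = std::min(x, scored.size());
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(keep),
                      scored.end(),
                      [](const Scored& l, const Scored& r) {
                          return l.first != r.first ? l.first > r.first
                                                    : l.second < r.second;
                      });
    std::vector<std::size_t> result;
    result.reserve(keep);
    for (std::size_t k = 0; k < keep; ++k) {
        result.push_back(scored[k].second);
    }
    return result;
}
```

The scan touches every image on each query, so its cost grows linearly with the collection size; it is mainly useful as a correctness reference for whatever index replaces it.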

My question is, what's the best approach to perform such a search? I've heard about Lucene, Lemur and other indexing methods, but I don't know if this kind of full-text search is the best way, given that the domain is reduced (only integers instead of words). I'd like to know about the alternatives in terms of efficiency, accuracy or C++ friendliness.

Thanks!

Recommended answer

You can take a look at Lucene for image retrieval (LIRE) here: http://www.semanticmetadata.net/2006/05/19/lire-lucene-image-retrieval-04-released/

If I'm not mistaken, you are trying to implement a typical bag-of-words image retrieval system, correct? If so, you are probably trying to build an inverted file index. Lucene on its own is not suitable, as you have probably already realized, because it indexes text instead of numbers. Using its classes for querying the index would also be a problem, as it is not designed to "parse" an image (i.e. detect keypoints, extract descriptors, then vector-quantize them) into the query vector.
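For illustration, an inverted file over integer class IDs can be sketched in a few lines of C++. This is a minimal in-memory version under assumptions, not what Lucene or LIRE does internally: the class `InvertedIndex` and its methods `add` and `query` are hypothetical names, each class list is assumed to contain every ID at most once, and the score is simply the number of shared class IDs.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical inverted file: class ID -> IDs of the images that contain it.
class InvertedIndex {
public:
    // Registers an image. Assumption: `classes` holds each class ID at most once.
    void add(int imageId, const std::vector<int>& classes) {
        for (int c : classes) {
            postings_[c].push_back(imageId);
        }
    }

    // Returns up to x (imageId, sharedClassCount) pairs, best match first.
    // Only images sharing at least one class with the query are visited.
    std::vector<std::pair<int, std::size_t>> query(const std::vector<int>& classes,
                                                   std::size_t x) const {
        std::unordered_map<int, std::size_t> hits;  // imageId -> shared classes
        for (int c : classes) {
            auto it = postings_.find(c);
            if (it == postings_.end()) {
                continue;  // no indexed image has this class
            }
            for (int imageId : it->second) {
                ++hits[imageId];
            }
        }
        std::vector<std::pair<int, std::size_t>> ranked(hits.begin(), hits.end());
        std::sort(ranked.begin(), ranked.end(),
                  [](const std::pair<int, std::size_t>& l,
                     const std::pair<int, std::size_t>& r) {
                      // Higher count first; ties broken by lower image ID.
                      return l.second != r.second ? l.second > r.second
                                                  : l.first < r.first;
                  });
        if (ranked.size() > x) {
            ranked.resize(x);
        }
        return ranked;
    }

private:
    std::unordered_map<int, std::vector<int>> postings_;
};
```

Compared with scanning every image, a query here only walks the posting lists of its own class IDs, which is the main reason inverted files are the usual choice for bag-of-words retrieval. A weighted score such as tf-idf could replace the raw count, but that goes beyond what the question describes.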

LIRE, on the other hand, has been modified to index feature vectors. However, it does not appear to work out of the box for the bag-of-words model. Also, I think I've read on the author's website that it currently uses brute-force matching rather than an inverted file index to retrieve the images, but I would expect it to be easier to extend than Lucene itself for your purposes.

Hope this helps.
