Finding the position of search hits from Lucene


Question

With Lucene, what would be the recommended approach for locating matches in search results?

More specifically, suppose the index documents have a field "fullText" which stores the plain-text content of some document. Furthermore, assume that for one of these documents the content is "The quick brown fox jumps over the lazy dog". Next, a search is performed for "fox dog". Obviously, the document would be a hit.

In this scenario, can Lucene be used to provide something like the matching regions for a found document? For this scenario I would like to produce something like:

[{match: "fox", startIndex: 10, length: 3},
 {match: "dog", startIndex: 34, length: 3}]
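As a plain-Java illustration of what that output means (independent of Lucene, which is what the answer below actually uses), the regions for literal terms can be located in the raw text with a regex. Note that the printed offsets are the actual 0-based character offsets of "fox" and "dog" in the sample sentence; the numbers in the question above are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchRegions {
    // One match region: the matched term, its start index, and its length.
    static class Region {
        final String match;
        final int startIndex;
        final int length;
        Region(String match, int startIndex, int length) {
            this.match = match;
            this.startIndex = startIndex;
            this.length = length;
        }
        @Override public String toString() {
            return "{match: \"" + match + "\", startIndex: " + startIndex
                    + ", length: " + length + "}";
        }
    }

    // Find every whole-word occurrence of each term in the text.
    static List<Region> find(String text, String... terms) {
        List<Region> regions = new ArrayList<>();
        for (String term : terms) {
            Matcher m = Pattern.compile("\\b" + Pattern.quote(term) + "\\b").matcher(text);
            while (m.find()) {
                regions.add(new Region(m.group(), m.start(), m.end() - m.start()));
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        for (Region r : find(text, "fox", "dog")) {
            System.out.println(r);
        }
    }
}
```

This only works for literal matches on the stored text; the point of the Lucene term-vector approach in the answer is that it reports these offsets per analyzed term without rescanning the document.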



I suspect that it could be implemented by what's provided in the org.apache.lucene.search.highlight package. I'm not sure about the overall approach though...

Answer

TermFreqVector is what I used. Here is a working demo that prints both the term positions and the starting and ending character offsets:

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.index.TermPositionVector;
import org.apache.lucene.index.TermVectorOffsetInfo;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Search {
    public static void main(String[] args) throws IOException, ParseException {
        Search s = new Search();
        s.doSearch(args[0], args[1]);
    }

    Search() {
    }

    public void doSearch(String db, String querystr) throws IOException, ParseException {
        // 1. Specify the analyzer for tokenizing text.
        //    The same analyzer should be used as was used for indexing.
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

        Directory index = FSDirectory.open(new File(db));

        // 2. query
        Query q = new QueryParser(Version.LUCENE_CURRENT, "contents", analyzer).parse(querystr);

        // 3. search
        int hitsPerPage = 10;
        IndexSearcher searcher = new IndexSearcher(index, true);
        IndexReader reader = IndexReader.open(index, true);
        searcher.setDefaultFieldSortScoring(true, false);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        // 4. display term positions and term offsets
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {

            int docId = hits[i].doc;
            // Note: this returns null unless the field was indexed with
            // term vectors enabled (Field.TermVector.WITH_POSITIONS_OFFSETS).
            TermFreqVector tfvector = reader.getTermFreqVector(docId, "contents");
            TermPositionVector tpvector = (TermPositionVector) tfvector;
            // This part works only if there is one term in the query string;
            // otherwise you will have to iterate this section over the query terms.
            int termidx = tfvector.indexOf(querystr);
            int[] termposx = tpvector.getTermPositions(termidx);
            TermVectorOffsetInfo[] tvoffsetinfo = tpvector.getOffsets(termidx);

            for (int j = 0; j < termposx.length; j++) {
                System.out.println("termpos : " + termposx[j]);
            }
            for (int j = 0; j < tvoffsetinfo.length; j++) {
                int offsetStart = tvoffsetinfo[j].getStartOffset();
                int offsetEnd = tvoffsetinfo[j].getEndOffset();
                System.out.println("offsets : " + offsetStart + " " + offsetEnd);
            }

            // print some info about where the hit was found...
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("path"));
        }

        // The reader and searcher can only be closed when there
        // is no need to access the documents any more.
        reader.close();
        searcher.close();
    }
}
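The single-term limitation noted in the comments above is lifted by looping over each query term and looking up that term's entry in the term vector. The Lucene calls cannot run standalone here, so this sketch stands in for TermPositionVector with a plain map of term to [startOffset, endOffset] pairs (the sample values are hypothetical); the shape of the per-term loop is what carries over to the real code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MultiTermOffsets {
    // Stand-in for the data a TermPositionVector exposes: for each term,
    // the [startOffset, endOffset] pairs where it occurs in the document.
    static final Map<String, int[][]> TERM_OFFSETS = new LinkedHashMap<>();
    static {
        TERM_OFFSETS.put("fox", new int[][] {{16, 19}});
        TERM_OFFSETS.put("dog", new int[][] {{40, 43}});
    }

    // Mirrors the loop you would run per hit: for each query term, find its
    // entry (tfvector.indexOf(term)) and report its offsets (getOffsets).
    static String report(String... queryTerms) {
        StringBuilder out = new StringBuilder();
        for (String term : queryTerms) {
            int[][] offsets = TERM_OFFSETS.get(term); // indexOf + getOffsets
            if (offsets == null) continue;            // indexOf returns -1 for absent terms
            for (int[] o : offsets) {
                out.append(term).append(" : ").append(o[0]).append(" ").append(o[1]).append("\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report("fox", "dog"));
    }
}
```

In the real demo, the query terms would come from running the query string through the same analyzer used at index time, since the term vector stores analyzed terms, not the raw query string.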

