如何获取 Lucene 模糊搜索结果的匹配项? [英] How to get Lucene Fuzzy Search result 's matching terms?
问题描述
在使用 Lucene Fuzzy Search 时如何获得匹配的模糊词及其偏移量?
how do you get the matching fuzzy term and its offset when using Lucene Fuzzy Search?
IndexSearcher mem = ....(some standard code)
QueryParser parser = new QueryParser(Version.LUCENE_30, CONTENT_FIELD, analyzer);
TopDocs topDocs = mem.search(parser.parse("wuzzy~"), 1);
// the ~ triggers the fuzzy search as per "Lucene In Action"
模糊搜索工作正常.如果文档包含术语fuzzy"或luzzy",则匹配.如何获得匹配的术语以及它们的偏移量是多少?
The fuzzy search works fine. If a document contains the term "fuzzy" or "luzzy", it is matched. How do I get which term matched and what are their offsets?
我已确保所有 CONTENT_FIELD 都添加了带有位置和偏移量的 termVectorStored.
I have made sure that all CONTENT_FIELDs are added with termVectorStored with positions and offsets .
推荐答案
没有直接的方法可以做到这一点,但是我重新考虑了 Jared 的建议并且能够使解决方案发挥作用.
There was no straight forward way of doing this, however I reconsidered Jared's suggestion and was able to get the solution working.
我在这里记录一下,以防其他人遇到同样的问题.
I am documenting this here just in case someone else has the same issue.
创建一个实现org.apache.lucene.search.highlight.Formatter的类
public class HitPositionCollector implements Formatter
{
// MatchOffset is a simple DTO
private List<MatchOffset> matchList;
public HitPositionCollector(
{
matchList = new ArrayList<MatchOffset>();
}
// this ie where the term start and end offset as well as the actual term is captured
@Override
public String highlightTerm(String originalText, TokenGroup tokenGroup)
{
if (tokenGroup.getTotalScore() <= 0)
{
}
else
{
MatchOffset mo= new MatchOffset(tokenGroup.getToken(0).toString(), tokenGroup.getStartOffset(),tokenGroup.getEndOffset());
getMatchList().add(mo);
}
return originalText;
}
/**
* @return the matchList
*/
public List<MatchOffset> getMatchList()
{
return matchList;
}
}
主代码
public void testHitsWithHitPositionCollector() throws Exception
{
System.out.println(" .... testHitsWithHitPositionCollector");
String fuzzyStr = "bro*";
QueryParser parser = new QueryParser(Version.LUCENE_30, "f", analyzer);
Query fzyQry = parser.parse(fuzzyStr);
TopDocs hits = searcher.search(fzyQry, 10);
QueryScorer scorer = new QueryScorer(fzyQry, "f");
HitPositionCollector myFormatter= new HitPositionCollector();
//Highlighter(Formatter formatter, Scorer fragmentScorer)
Highlighter highlighter = new Highlighter(myFormatter,scorer);
highlighter.setTextFragmenter(
new SimpleSpanFragmenter(scorer)
);
Analyzer analyzer2 = new SimpleAnalyzer();
int loopIndex=0;
//for (ScoreDoc sd : hits.scoreDocs) {
Document doc = searcher.doc( hits.scoreDocs[0].doc);
String title = doc.get("f");
TokenStream stream = TokenSources.getAnyTokenStream(searcher.getIndexReader(),
hits.scoreDocs[0].doc,
"f",
doc,
analyzer2);
String fragment = highlighter.getBestFragment(stream, title);
System.out.println(fragment);
assertEquals("the quick brown fox jumps over the lazy dog", fragment);
MatchOffset mo= myFormatter.getMatchList().get(loopIndex++);
assertTrue(mo.getEndPos()==15);
assertTrue(mo.getStartPos()==10);
assertTrue(mo.getToken().equals("brown"));
}
这篇关于如何获取 Lucene 模糊搜索结果的匹配项?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!