Getting terms matched in a document when searching using a wildcard search

Problem description

I am looking for a way to find the terms that matched in a document when using a wildcard search in Lucene. I tried using the explainer (searcher.Explain) to find the terms, but this failed. A portion of the relevant code is below.

    ScoreDoc[] myHits = myTopDocs.scoreDocs;
    int hitsCount = myHits.Length;
    for (int myCounter = 0; myCounter < hitsCount; myCounter++)
    {
        Document doc = searcher.Doc(myHits[myCounter].doc);
        // Note: Explain(query, doc) expects a document ID; myCounter is the hit's rank,
        // so myHits[myCounter].doc is probably what was meant here.
        Explanation explanation = searcher.Explain(myQuery, myCounter);
        string myExplanation = explanation.ToString();
        ...

When I search for, say, micro*, documents are found and the code enters the loop, but myExplanation contains NON-MATCH and no other information.

How do I get the term that was matched in this document?

Any help would be most appreciated.

Regards

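Solution

Rewrite the wildcard query into the concrete terms it expands to, then use a TermVectorMapper to check which of those terms occur in each hit's term vector. The field has to be indexed with term vectors, and the parser is set to SCORING_BOOLEAN_QUERY_REWRITE so that the rewritten query exposes its terms to ExtractTerms.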

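    using System;
    using System.Collections.Generic;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.QueryParsers;
    using Lucene.Net.Search;
    using Lucene.Net.Store;

    // Collects the terms of a document's term vector that also appear in the rewritten query.
    class TVM : TermVectorMapper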
    {
        public List<string> FoundTerms = new List<string>();
        HashSet<string> _termTexts = new HashSet<string>();

        public TVM(Query q, IndexReader r) : base()
        {
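            // Rewrite expands "micro*" into the concrete terms found in the index;
            // ExtractTerms then collects them. This relies on the scoring BooleanQuery
            // rewrite set on the parser in TermVectorMapperTest.
            List<Term> allTerms = new List<Term>();
            q.Rewrite(r).ExtractTerms(allTerms);
            foreach (Term t in allTerms) _termTexts.Add(t.Text());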
        }

        public override void SetExpectations(string field, int numTerms, bool storeOffsets, bool storePositions)
        {
        }

        public override void Map(string term, int frequency, TermVectorOffsetInfo[] offsets, int[] positions)
        {
            // Called once per term in the document's term vector; keep it if the query expanded to it.
            if (_termTexts.Contains(term)) FoundTerms.Add(term);
        }
    }

    void TermVectorMapperTest()
    {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new Lucene.Net.Analysis.Standard.StandardAnalyzer(), true);
        Document d = null;

        d = new Document();
        // Term vectors must be stored so GetTermFreqVector can replay each document's terms.
        d.Add(new Field("text", "microscope aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
        writer.AddDocument(d);

        d = new Document();
        d.Add(new Field("text", "microsoft bbb", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
        writer.AddDocument(d);

        writer.Close();

        IndexReader reader = IndexReader.Open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);

        QueryParser queryParser = new QueryParser("text", new Lucene.Net.Analysis.Standard.StandardAnalyzer());
        // Expand wildcards into a BooleanQuery of term clauses so the matched terms can be extracted.
        queryParser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
        Query query = queryParser.Parse("micro*");

        TopDocs results = searcher.Search(query, 5);
        System.Diagnostics.Debug.Assert(results.TotalHits == 2);

        TVM tvm = new TVM(query, reader);
        for (int i = 0; i < results.ScoreDocs.Length; i++)
        {
            Console.Write("DOCID:" + results.ScoreDocs[i].Doc + " > ");
            // Feeds every term of this hit's "text" term vector to tvm.Map.
            reader.GetTermFreqVector(results.ScoreDocs[i].Doc, "text", tvm);
            foreach (string term in tvm.FoundTerms) Console.Write(term + " ");
            tvm.FoundTerms.Clear();
            Console.WriteLine();
        }
    }
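The answer works in two steps: expand the wildcard into the concrete terms present in the index, then keep only those terms that occur in each hit's term vector. The following is a minimal plain-C# model of those two steps that runs without Lucene.Net. The in-memory dictionary standing in for the index's term vectors, and the helper names ExpandWildcard and MatchedTerms, are hypothetical; the regex-based expansion only approximates what Lucene's WildcardQuery does against the real term dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the index: doc ID -> analyzed terms of its "text" field,
// i.e. roughly what each document's term vector holds.
var termVectors = new Dictionary<int, string[]>
{
    [0] = new[] { "microscope", "aaa" },
    [1] = new[] { "microsoft", "bbb" },
    [2] = new[] { "macro", "ccc" },
};

// Step 1 (the role of Query.Rewrite + ExtractTerms): expand a wildcard pattern
// into the concrete terms that exist in the term dictionary.
// Wildcard syntax as in Lucene: '*' = any sequence, '?' = exactly one character.
HashSet<string> ExpandWildcard(string pattern, IEnumerable<string> dictionary)
{
    string regex = "^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
    return new HashSet<string>(dictionary.Where(t => Regex.IsMatch(t, regex)));
}

// Step 2 (the role of the TermVectorMapper): keep only the terms of one
// document that the query expanded to.
List<string> MatchedTerms(IEnumerable<string> docTerms, HashSet<string> queryTerms) =>
    docTerms.Where(t => queryTerms.Contains(t)).ToList();

var expandedTerms = ExpandWildcard("micro*", termVectors.Values.SelectMany(v => v));
foreach (var kv in termVectors)
{
    List<string> found = MatchedTerms(kv.Value, expandedTerms);
    if (found.Count > 0) // only documents that match the query are reported
        Console.WriteLine("DOCID:" + kv.Key + " > " + string.Join(" ", found));
}
```

For this toy data the loop reports microscope for document 0 and microsoft for document 1, and skips document 2 because macro does not match micro*. In the Lucene.Net code above, step 1 corresponds to q.Rewrite(r).ExtractTerms(...) and step 2 to the TVM mapper passed to GetTermFreqVector.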
