Getting terms matched in a document when searching using a wildcard search
Problem description
I am looking for a way to find which terms matched in a document when using a wildcard search in Lucene. I tried using the Explanation returned by the searcher to find the terms, but this failed. A portion of the relevant code is below.
ScoreDoc[] myHits = myTopDocs.scoreDocs;
int hitsCount = myHits.Length;
for (int myCounter = 0; myCounter < hitsCount; myCounter++)
{
    Document doc = searcher.Doc(myHits[myCounter].doc);
    Explanation explanation = searcher.Explain(myQuery, myCounter);
    string myExplanation = explanation.ToString();
    ...
When I search for, say, micro*, documents are found and execution enters the loop, but myExplanation contains NON-MATCH and no other information.
How do I get the term that was matched in each document?
Any help would be most appreciated.
Regards
Solution
using System;
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Collects the terms of a document's term vector that also appear in the rewritten query.
class TVM : TermVectorMapper
{
    public List<string> FoundTerms = new List<string>();
    HashSet<string> _termTexts = new HashSet<string>();

    public TVM(Query q, IndexReader r) : base()
    {
        // Rewrite expands the wildcard against the index; ExtractTerms then lists the concrete terms.
        List<Term> allTerms = new List<Term>();
        q.Rewrite(r).ExtractTerms(allTerms);
        foreach (Term t in allTerms) _termTexts.Add(t.Text());
    }

    public override void SetExpectations(string field, int numTerms, bool storeOffsets, bool storePositions)
    {
    }

    // Called once per term in the document's term vector.
    public override void Map(string term, int frequency, TermVectorOffsetInfo[] offsets, int[] positions)
    {
        if (_termTexts.Contains(term)) FoundTerms.Add(term);
    }
}

void TermVectorMapperTest()
{
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new Lucene.Net.Analysis.Standard.StandardAnalyzer(), true);
    Document d = null;

    // Term vectors must be stored at index time so they can be read back per document.
    d = new Document();
    d.Add(new Field("text", "microscope aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    writer.AddDocument(d);
    d = new Document();
    d.Add(new Field("text", "microsoft bbb", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    writer.AddDocument(d);
    writer.Close();

    IndexReader reader = IndexReader.Open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);
    QueryParser queryParser = new QueryParser("text", new Lucene.Net.Analysis.Standard.StandardAnalyzer());
    // Rewrite the wildcard into a BooleanQuery of TermQuery clauses so that ExtractTerms can list the expanded terms.
    queryParser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
    Query query = queryParser.Parse("micro*");
    TopDocs results = searcher.Search(query, 5);
    System.Diagnostics.Debug.Assert(results.TotalHits == 2);

    TVM tvm = new TVM(query, reader);
    for (int i = 0; i < results.ScoreDocs.Length; i++)
    {
        Console.Write("DOCID:" + results.ScoreDocs[i].Doc + " > ");
        reader.GetTermFreqVector(results.ScoreDocs[i].Doc, "text", tvm);
        foreach (string term in tvm.FoundTerms) Console.Write(term + " ");
        tvm.FoundTerms.Clear();
        Console.WriteLine();
    }
}
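The code above boils down to two steps: expand the wildcard into the concrete terms present in the index, then intersect that set with the terms of each matching document. The following is only a minimal sketch of that idea using plain collections instead of an index. WildcardTermSketch, ExpandPrefix and MatchedTerms are hypothetical names invented for illustration and are not part of Lucene.Net, and only patterns with a single trailing * are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, NOT part of Lucene.Net: mimics the two steps of the
// answer with in-memory collections.
public static class WildcardTermSketch
{
    // Step 1: expand a trailing-wildcard pattern such as "micro*" against all
    // indexed terms (roughly what Query.Rewrite followed by ExtractTerms yields).
    public static HashSet<string> ExpandPrefix(string pattern, IEnumerable<string> indexTerms)
    {
        if (!pattern.EndsWith("*", StringComparison.Ordinal))
            throw new ArgumentException("Only trailing-* patterns are handled in this sketch.", nameof(pattern));

        string prefix = pattern.Substring(0, pattern.Length - 1);
        return new HashSet<string>(
            indexTerms.Where(t => t.StartsWith(prefix, StringComparison.Ordinal)));
    }

    // Step 2: keep only the document's own terms that are in the expanded set
    // (the role played by TVM.Map for each entry of the term vector).
    public static List<string> MatchedTerms(IEnumerable<string> docTerms, HashSet<string> expanded)
    {
        return docTerms.Where(t => expanded.Contains(t)).ToList();
    }
}
```

With the two sample documents from the answer, expanding micro* over the terms microscope, aaa, microsoft and bbb gives the set { microscope, microsoft }, and intersecting it with the first document's terms leaves only microscope.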