Auto Suggestion not working in Lucene after first search iteration


Problem description


Currently I am working on the auto-suggestion part of my application using Lucene. Auto-suggestion works fine in a console application, but now that I have integrated it into a web application it does not work as expected.

When documents are searched for the first time with some keywords, both search and auto-suggestion work fine and show results. But when I search again, with other keywords or the same ones, neither the auto-suggestions nor the search results show up. I am not able to figure out why this weird result is coming.

The snippets for auto-suggestion and search are as follows:

final int HITS_PER_PAGE = 20;

final String RICH_DOCUMENT_PATH = "F:\\Sample\\SampleRichDocuments";
final String INDEX_DIRECTORY = "F:\\Sample\\LuceneIndexer";

String searchText = request.getParameter("search_text");

BooleanQuery.Builder booleanQuery = null;
Query textQuery = null;
Query fileNameQuery = null;

try {
    textQuery = new QueryParser("content", new StandardAnalyzer()).parse(searchText);
    fileNameQuery = new QueryParser("title", new StandardAnalyzer()).parse(searchText);
    booleanQuery = new BooleanQuery.Builder();
    booleanQuery.add(textQuery, BooleanClause.Occur.SHOULD);
    booleanQuery.add(fileNameQuery, BooleanClause.Occur.SHOULD);
} catch (ParseException e) {
    e.printStackTrace();
}


Directory index = FSDirectory.open(new File(INDEX_DIRECTORY).toPath());
IndexReader reader = DirectoryReader.open(index);

IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(HITS_PER_PAGE);

try {
    searcher.search(booleanQuery.build(), collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    for (ScoreDoc hit : hits) {
        Document doc = reader.document(hit.doc);
    }

    // Auto Suggestion of the data

    Dictionary dictionary = new LuceneDictionary(reader, "content");
    AnalyzingInfixSuggester analyzingSuggester = new AnalyzingInfixSuggester(index, new StandardAnalyzer());
    analyzingSuggester.build(dictionary);

    List<LookupResult> lookupResultList = analyzingSuggester.lookup(searchText, false, 10);
    System.out.println("Look up result size :: "+lookupResultList.size());
    for (LookupResult lookupResult : lookupResultList) {
         System.out.println(lookupResult.key+" --- "+lookupResult.value);
    }

    analyzingSuggester.close();
    reader.close();

} catch (IOException e) {
    e.printStackTrace();
}
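One thing worth flagging in the snippet above: the suggester is rebuilt from scratch on every request, and it is opened over the same Directory as the main search index. AnalyzingInfixSuggester stores its own lookup index in the Directory it is given, and build() recreates that index, so sharing the directory with the search index can destroy the search index's files. A minimal sketch of a safer shape, assuming the same Lucene version as the question's snippets; the suggest path below is hypothetical:

```java
// Sketch, not the asker's code: "F:\\Sample\\LuceneSuggest" is a hypothetical path.
// Give the suggester its own directory so build() cannot clobber the main
// search index, and keep one instance alive across requests where possible.
Directory suggestDir = FSDirectory.open(Paths.get("F:\\Sample\\LuceneSuggest"));
AnalyzingInfixSuggester suggester =
        new AnalyzingInfixSuggester(suggestDir, new StandardAnalyzer());
suggester.build(new LuceneDictionary(reader, "content")); // rebuild only when the index changes

List<LookupResult> suggestions = suggester.lookup(searchText, false, 10);
for (LookupResult r : suggestions) {
    System.out.println(r.key + " --- " + r.value);
}
```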

For example, in the first iteration, if I search for the word "sample":

  • Auto-suggestion gives me results: sample, samples, sampler, etc. (these are words in the documents)
  • Search result: sample

But if I search again, with the same text or different text, it shows no results, and the LookupResult list size comes back zero.

I am not getting why this is happening. Please help.

Below is the updated code for index creation from a set of documents.

final String INDEX_DIRECTORY = "F:\\Sample\\LuceneIndexer";
long startTime = System.currentTimeMillis();
List<ContentHandler> contentHandlerList = new ArrayList<ContentHandler>();

String fileNames = (String)request.getAttribute("message");

File file = new File("F:\\Sample\\SampleRichDocuments"+fileNames);

ArrayList<File> fileList = new ArrayList<File>();
fileList.add(file);

Metadata metadata = new Metadata();

// Parsing the rich document set with Apache Tika
ContentHandler handler = new BodyContentHandler(-1);
ParseContext context = new ParseContext();
Parser parser = new AutoDetectParser();
InputStream stream = new FileInputStream(file);

try {
    parser.parse(stream, handler, metadata, context);
    contentHandlerList.add(handler);
} catch (TikaException e) {
    e.printStackTrace();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
finally {
    try {
        stream.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

FieldType fieldType = new FieldType();
fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
fieldType.setStoreTermVectors(true);
fieldType.setStoreTermVectorPositions(true);
fieldType.setStoreTermVectorPayloads(true);
fieldType.setStoreTermVectorOffsets(true);
fieldType.setStored(true);

Analyzer analyzer = new StandardAnalyzer();
Directory directory = FSDirectory.open(new File(INDEX_DIRECTORY).toPath());
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
IndexWriter writer = new IndexWriter(directory, conf);

Iterator<ContentHandler> handlerIterator = contentHandlerList.iterator();
Iterator<File> fileIterator = fileList.iterator();

Date date = new Date();

while (handlerIterator.hasNext() && fileIterator.hasNext()) {
    Document doc = new Document();

    String text = handlerIterator.next().toString();
    String textFileName = fileIterator.next().getName();

    String fileName = textFileName.replaceAll("_", " ");
    fileName = fileName.replaceAll("-", " ");
    fileName = fileName.replaceAll("\\.", " ");

    String fileNameArr[] = fileName.split("\\s+");
    for (String contentTitle : fileNameArr) {
        Field titleField = new Field("title", contentTitle, fieldType);
        titleField.setBoost(2.0f);
        doc.add(titleField);
    }

    if (fileNameArr.length > 0) {
        fileName = fileNameArr[0];
    }

    String document_id = UUID.randomUUID().toString();

    FieldType documentFieldType = new FieldType();
    documentFieldType.setStored(false);

    Field idField = new Field("document_id", document_id, documentFieldType);
    Field fileNameField = new Field("file_name", textFileName, fieldType);
    Field contentField = new Field("content", text, fieldType);

    doc.add(idField);
    doc.add(contentField);
    doc.add(fileNameField);

    writer.addDocument(doc);

    analyzer.close();
}

writer.commit();
writer.deleteUnusedFiles();
long endTime = System.currentTimeMillis();

writer.close();

Also I have observed that from the second search iteration onwards, the files in the index directory are getting deleted, and only the files with the .segment suffix keep changing, like .segmenta, .segmentb, .segmentc, etc.

I don't know why this weird situation is happening.

Solution

I think your problem is the writer.deleteUnusedFiles() call.

According to the JavaDocs, this call can "delete unreferenced index commits".

Which commits get deleted is governed by the IndexDeletionPolicy. However, "the default deletion policy is KeepOnlyLastCommitDeletionPolicy, which always removes old commits as soon as a new commit is done (this matches the behavior before 2.2)".

The docs also mention "delete on last close", which means that once a commit is in use and then closed (e.g. during a search), that commit will be deleted.

So all the index files your first search relied on are deleted immediately.

Try this:

IndexWriterConfig conf = new IndexWriterConfig(analyzer);
conf.setIndexDeletionPolicy(NoDeletionPolicy.INSTANCE);
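Applied to the indexing code from the question, the change is a one-line configuration tweak. This is a sketch, not a drop-in replacement; note that NoDeletionPolicy keeps every commit forever, so old segment files will accumulate on disk until they are cleaned up some other way:

```java
// Sketch of the suggested fix in the context of the question's indexing code.
// NoDeletionPolicy retains all commits, so the index directory will grow.
Analyzer analyzer = new StandardAnalyzer();
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
conf.setIndexDeletionPolicy(NoDeletionPolicy.INSTANCE);

IndexWriter writer = new IndexWriter(
        FSDirectory.open(new File(INDEX_DIRECTORY).toPath()), conf);
// ... add documents as before ...
writer.commit();
writer.close(); // and drop the writer.deleteUnusedFiles() call
```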
