Should I keep a Lucene IndexWriter open for the entire indexing run, or close it after each document is added?

Question

Does closing the Lucene IndexWriter after each document is added slow down my indexing process?

I imagine that closing and reopening the index writer slows indexing down, or is that not true for Lucene?

Basically, I have a Lucene indexer step in a Spring Batch job, and I build the index in an ItemProcessor. The indexer step is partitioned; I create the IndexWriter when the ItemProcessor is created and keep it open until the step completes.

    @Bean
    @StepScope
    public ItemProcessor<InputVO, OutputVO> luceneIndexProcessor(
            @Value("#{stepExecutionContext[field1]}") String str) throws Exception {
        boolean exists = IndexUtils.checkIndexDir(str);
        String indexDir = IndexUtils.createAndGetIndexPath(str, exists);
        IndexWriterUtils indexWriterUtils = new IndexWriterUtils(indexDir, exists);
        IndexWriter indexWriter = indexWriterUtils.createIndexWriter();
        return new LuceneIndexProcessor(indexWriter);
    }

Is there a way to close this IndexWriter after step completion?

I also ran into problems because this step also searches the index to find duplicate documents, but I fixed that by calling writer.commit() before opening a reader and searching.
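
Why the commit mattered: a Lucene reader opened from the Directory (DirectoryReader.open(directory)) sees the index as of the last commit, so documents added by the still-open writer are invisible to the duplicate search until writer.commit() runs. The stdlib-only model below illustrates just that point-in-time behavior; ToyIndex and its methods are hypothetical stand-ins, not Lucene classes. As an alternative the question does not use, recent Lucene versions also offer near-real-time readers via DirectoryReader.open(IndexWriter), which see uncommitted additions without a full commit on every check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a Lucene index: models commit-point visibility only. */
class ToyIndex {
    private final List<String> pending = new ArrayList<>();   // added, not yet committed
    private final List<String> committed = new ArrayList<>(); // last commit point

    void addDocument(String id) {
        pending.add(id);
    }

    /** Plays the role of IndexWriter.commit(): pending documents join the commit point. */
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    /** Plays the role of DirectoryReader.open(directory): a snapshot of the last commit. */
    List<String> openReader() {
        return Collections.unmodifiableList(new ArrayList<>(committed));
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("doc-1");
        System.out.println(index.openReader().contains("doc-1")); // false: not committed yet
        index.commit();
        System.out.println(index.openReader().contains("doc-1")); // true: visible after commit
    }
}
```

Note that a snapshot taken before the commit stays stale; in Lucene terms, a reader has to be reopened (or a new one opened) to see later commits.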

Should I close and reopen the writer after each document is added, or can I keep it open the whole time? And how do I close it in the afterStep method of a StepExecutionListenerSupport?

Initially I closed and reopened the writer for each document, but indexing was very slow, so I suspected this was the cause.

Answer

In development the index directory is small, so the gain may not be noticeable, but with large indexes you should avoid needlessly creating and closing the IndexWriter, and the IndexReader as well.
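
To make the cost difference concrete, here is a stdlib-only model that counts lifecycle operations under both strategies. ModelWriter is a hypothetical stand-in, not Lucene: in real Lucene, constructing an IndexWriter obtains the directory's write lock, and closing it by default commits pending changes (syncing new index files to disk), so those are the operations counted here. The model measures no time; it only shows that per-document open/close repeats the expensive work once per document, while a single writer pays it once.

```java
import java.util.List;

/** Hypothetical stand-in for IndexWriter that counts expensive lifecycle operations. */
class ModelWriter implements AutoCloseable {
    static int opens = 0;   // a real open would acquire the write lock
    static int commits = 0; // a real close would (by default) commit and sync

    ModelWriter() { opens++; }

    void addDocument(String doc) { /* buffered in memory: cheap */ }

    @Override
    public void close() { commits++; }

    static void reset() { opens = 0; commits = 0; }

    /** The question's first attempt: open and close around every document. */
    static void perDocument(List<String> docs) {
        for (String doc : docs) {
            try (ModelWriter w = new ModelWriter()) {
                w.addDocument(doc);
            }
        }
    }

    /** The answer's approach: one writer for the whole step, closed once at the end. */
    static void oneWriter(List<String> docs) {
        try (ModelWriter w = new ModelWriter()) {
            for (String doc : docs) {
                w.addDocument(doc);
            }
        }
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a", "b", "c", "d");
        reset();
        perDocument(docs);
        System.out.println("per-document: opens=" + opens + " commits=" + commits); // opens=4 commits=4
        reset();
        oneWriter(docs);
        System.out.println("one writer: opens=" + opens + " commits=" + commits);   // opens=1 commits=1
    }
}
```

The static counters keep the sketch short; they are not thread-safe and would be a poor fit for parallel partitions.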

In Spring Batch, I did it with the following steps:

1. As pointed out in my other question, we first need to solve the serialization problem so that the object can be put into the ExecutionContext.

2. In the partitioner, create an instance of a composite serializable object and put it into the ExecutionContext.

3. Pass the value from the ExecutionContext to your step-scoped reader, processor, or writer in the configuration:

    @Bean
    @StepScope
    public ItemProcessor<InputVO, OutputVO> luceneIndexProcessor(
            @Value("#{stepExecutionContext[field1]}") String field1,
            @Value("#{stepExecutionContext[luceneObjects]}") SerializableLuceneObjects luceneObjects) throws Exception {
        return new LuceneIndexProcessor(luceneObjects);
    }

4. Use the instance passed to the processor wherever you need it, and obtain the index reader or writer through a getter, e.g. public IndexWriter getLuceneIndexWriter() { return luceneIndexWriter; }

5. Finally, in the afterStep(StepExecution stepExecution) method of your StepExecutionListenerSupport, get the object back from the ExecutionContext and close the writer and reader:

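// Inside afterStep(StepExecution stepExecution). close() throws IOException,
// which afterStep does not declare, so it is rethrown unchecked here.
ExecutionContext executionContext = stepExecution.getExecutionContext();
SerializableLuceneObjects slObjects = (SerializableLuceneObjects) executionContext.get("luceneObjects");
IndexWriter luceneIndexWriter = slObjects.getLuceneIndexWriter();
IndexReader luceneIndexReader = slObjects.getLuceneIndexReader();
try {
    if (luceneIndexWriter != null) luceneIndexWriter.close(); // by default commits pending changes
    if (luceneIndexReader != null) luceneIndexReader.close();
} catch (IOException e) {
    throw new UncheckedIOException(e);
}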
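
The answer does not show SerializableLuceneObjects itself. Below is a minimal stdlib-only sketch of what such a holder might look like, assuming the serialization problem from step 1 is solved by marking the non-serializable Lucene objects transient. LuceneObjectsHolder and its members are hypothetical, and AutoCloseable stands in for IndexWriter and IndexReader so the sketch compiles without Lucene. It checks two consequences: with Java serialization, a transient field comes back as null after a round trip, so code that reads the holder from a restored context (for example after a restart) must be prepared for null references; and closing with try/finally still closes the reader if closing the writer throws.

```java
import java.io.*;

/** Hypothetical holder modeled on the answer's SerializableLuceneObjects. */
class LuceneObjectsHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: Java serialization skips these fields, so the holder can be
    // serialized, but the references are null after deserialization.
    private transient AutoCloseable indexWriter; // stands in for IndexWriter
    private transient AutoCloseable indexReader; // stands in for IndexReader

    LuceneObjectsHolder(AutoCloseable indexWriter, AutoCloseable indexReader) {
        this.indexWriter = indexWriter;
        this.indexReader = indexReader;
    }

    AutoCloseable getIndexWriter() { return indexWriter; }
    AutoCloseable getIndexReader() { return indexReader; }

    /** Closes both; the finally block runs even if closing the writer throws. */
    void closeAll() throws Exception {
        try {
            if (indexWriter != null) indexWriter.close();
        } finally {
            if (indexReader != null) indexReader.close();
        }
    }

    /** Serializes and deserializes the holder, as persisting a context would. */
    static LuceneObjectsHolder roundTrip(LuceneObjectsHolder h) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(h);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (LuceneObjectsHolder) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] readerClosed = {false};
        LuceneObjectsHolder holder = new LuceneObjectsHolder(
                () -> { throw new IOException("writer close failed"); },
                () -> readerClosed[0] = true);
        try {
            holder.closeAll();
        } catch (IOException e) {
            System.out.println("writer failed: " + e.getMessage());
        }
        System.out.println("reader closed: " + readerClosed[0]); // true

        LuceneObjectsHolder restored = roundTrip(new LuceneObjectsHolder(() -> {}, () -> {}));
        System.out.println("writer after round trip: " + restored.getIndexWriter()); // null
    }
}
```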
