Lucene IndexWriter slow to add documents


Problem description


I wrote a small loop that adds 10,000 documents to an IndexWriter, and it took forever to run.

Is there another way to index large volumes of documents?

I ask because when this goes live, it has to load 15,000 records.

My other question: how do I avoid having to load all the records again when the web application is restarted?

Edit

Here is the code I used:

for (int t = 0; t < 10000; t++)
{
    // One document per iteration, with a single stored, tokenized field
    Document doc = new Document();
    string text = "Value" + t.ToString();
    doc.Add(new Field("Value", text, Field.Store.YES, Field.Index.TOKENIZED));
    iwriter.AddDocument(doc);
}

Edit 2

Analyzer analyzer = new StandardAnalyzer();
Directory directory = new RAMDirectory();

IndexWriter iwriter = new IndexWriter(directory, analyzer, true);

iwriter.SetMaxFieldLength(25000);

Then comes the code that adds the documents, followed by:

iwriter.Close();

Solution

Just checking, but you haven't got the debugger attached when you're running it, have you?

An attached debugger severely slows down adding documents.

On my machine (Lucene 2.0.0.4):

Built with platform target x86:

  • No debugger - 5.2 seconds

  • Debugger attached - 113.8 seconds

Built with platform target x64:

  • No debugger - 6.0 seconds

  • Debugger attached - 171.4 seconds
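
To check whether a slow run is caused by the debugger, it can help to log whether one is attached next to the elapsed time. The following is only a minimal sketch using the .NET `Stopwatch` and `Debugger.IsAttached` members. `IndexingTimer`, `TimeAction` and `PlaceholderWorkload` are hypothetical names that do not come from the original answer, and the placeholder workload only builds strings, so in practice the loop that calls `AddDocument` would be passed in instead.

```csharp
using System;
using System.Diagnostics;

public static class IndexingTimer
{
    // Runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan TimeAction(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Hypothetical stand-in for the indexing loop: it builds the same 10,000
    // "Value<n>" strings but does not touch Lucene at all.
    public static long PlaceholderWorkload()
    {
        long totalLength = 0;
        for (int i = 0; i < 10000; i++)
        {
            string text = "Value" + i;
            totalLength += text.Length;
        }
        return totalLength;
    }

    public static void Main()
    {
        // True when a managed debugger is attached to this process.
        Console.WriteLine("Debugger attached: {0}", Debugger.IsAttached);

        TimeSpan elapsed = TimeAction(delegate { PlaceholderWorkload(); });
        Console.WriteLine("Elapsed: {0:F1} seconds", elapsed.TotalSeconds);
    }
}
```

Running the same build once with the debugger attached and once without (Ctrl+F5 in Visual Studio) should show whether the debugger accounts for the difference.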

Here is a rough example of building an index in a RAMDirectory, saving it to disk, and loading it back into a new RAMDirectory:

// Namespaces used by this snippet; the statements below belong inside a method.
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

const int DocumentCount = 10 * 1000;
const string IndexFilePath = @"X:\Temp\tmp.idx";

Analyzer analyzer = new StandardAnalyzer();
Directory ramDirectory = new RAMDirectory();

// Build the index in memory (true = create a new index)
IndexWriter indexWriter = new IndexWriter(ramDirectory, analyzer, true);

for (int i = 0; i < DocumentCount; i++)
{
    Document doc = new Document();
    string text = "Value" + i;
    doc.Add(new Field("Value", text, Field.Store.YES, Field.Index.TOKENIZED));
    indexWriter.AddDocument(doc);
}

indexWriter.Close();

// Save index: copy the in-memory index into a new file-based index
FSDirectory fileDirectory = FSDirectory.GetDirectory(IndexFilePath, true);
IndexWriter fileIndexWriter = new IndexWriter(fileDirectory, analyzer, true);
fileIndexWriter.AddIndexes(new[] { ramDirectory });
fileIndexWriter.Close();

// Load index: open the existing file-based index (false = do not create)
// and copy it into a fresh RAMDirectory
FSDirectory newFileDirectory = FSDirectory.GetDirectory(IndexFilePath, false);
Directory newRamDirectory = new RAMDirectory();
IndexWriter newIndexWriter = new IndexWriter(newRamDirectory, analyzer, true);
newIndexWriter.AddIndexes(new[] { newFileDirectory });

Console.WriteLine("New index writer document count:{0}.", newIndexWriter.DocCount());

// Close the writer once it is no longer needed
newIndexWriter.Close();
