Lucene IndexWriter slow to add documents
I wrote a small loop that adds 10,000 documents to the IndexWriter, and it took forever to run.
Is there another way to index large volumes of documents?
I ask because when this goes live it has to load in 15,000 records.
My other question: how do I avoid having to load all the records again when the web application is restarted?
Edit
Here is the code I used:
for (int t = 0; t < 10000; t++)
{
    doc = new Document();
    text = "Value" + t.ToString();
    doc.Add(new Field("Value", text, Field.Store.YES, Field.Index.TOKENIZED));
    iwriter.AddDocument(doc);
}
Edit 2
Analyzer analyzer = new StandardAnalyzer();
Directory directory = new RAMDirectory();
IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
iwriter.SetMaxFieldLength(25000);
Then comes the code that adds the documents, followed by:
iwriter.Close();
Just checking, but you haven't got the debugger attached when you're running it, have you?
An attached debugger severely degrades performance when adding documents.
On my machine (Lucene 2.0.0.4):
Built with platform target x86:
No debugger - 5.2 seconds
Debugger attached - 113.8 seconds
Built with platform target x64:
No debugger - 6.0 seconds
Debugger attached - 171.4 seconds
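To make sure a measurement like the ones above is not skewed by an attached debugger, a small timing harness can help. This is only a sketch: `IndexTiming` and `Measure` are hypothetical names made up for illustration, while `Stopwatch` and `Debugger.IsAttached` are standard .NET types in `System.Diagnostics`.

```csharp
using System;
using System.Diagnostics;

public static class IndexTiming
{
    // Hypothetical helper: runs the given work once and returns how long it took.
    // Prints a warning first if a debugger is attached, because timings taken
    // under a debugger are not representative.
    public static TimeSpan Measure(Action work)
    {
        if (Debugger.IsAttached)
        {
            Console.WriteLine("Warning: debugger attached; timings will be inflated.");
        }

        Stopwatch stopwatch = Stopwatch.StartNew();
        work();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Wrapping the indexing loop in a lambda and passing it to `IndexTiming.Measure` gives the elapsed time for that loop alone. Running the build without the debugger (Ctrl+F5 in Visual Studio) should then produce numbers closer to the "No debugger" figures.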
Rough example of saving and loading an index to and from a RAMDirectory:
const int DocumentCount = 10 * 1000;
const string IndexFilePath = @"X:\Temp\tmp.idx";
Analyzer analyzer = new StandardAnalyzer();
Directory ramDirectory = new RAMDirectory();
IndexWriter indexWriter = new IndexWriter(ramDirectory, analyzer, true);
for (int i = 0; i < DocumentCount; i++)
{
    Document doc = new Document();
    string text = "Value" + i;
    doc.Add(new Field("Value", text, Field.Store.YES, Field.Index.TOKENIZED));
    indexWriter.AddDocument(doc);
}
indexWriter.Close();
//Save index
FSDirectory fileDirectory = FSDirectory.GetDirectory(IndexFilePath, true);
IndexWriter fileIndexWriter = new IndexWriter(fileDirectory, analyzer, true);
fileIndexWriter.AddIndexes(new[] { ramDirectory });
fileIndexWriter.Close();
//Load index
FSDirectory newFileDirectory = FSDirectory.GetDirectory(IndexFilePath, false);
Directory newRamDirectory = new RAMDirectory();
IndexWriter newIndexWriter = new IndexWriter(newRamDirectory, analyzer, true);
newIndexWriter.AddIndexes(new[] { newFileDirectory });
Console.WriteLine("New index writer document count:{0}.", newIndexWriter.DocCount());
//Close the writer once the count has been read
newIndexWriter.Close();
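For the restart part of the question, one option is to keep the index on disk and rebuild it only when it is missing. The following is a minimal sketch, not a definitive implementation. It assumes the Lucene.Net 2.x API, in particular `IndexReader.IndexExists(string)` and the `IndexWriter(string, Analyzer, bool)` constructor. `IndexBootstrap`, `EnsureIndex`, and the `documentCount` parameter are hypothetical names used only for illustration.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class IndexBootstrap
{
    // Hypothetical helper: builds the index at indexPath only if none exists there.
    // Returns true if the index was built, false if an existing one was reused.
    public static bool EnsureIndex(string indexPath, int documentCount)
    {
        // Assumption: IndexReader.IndexExists(string) is available (Lucene.Net 2.x).
        if (IndexReader.IndexExists(indexPath))
        {
            return false;
        }

        Analyzer analyzer = new StandardAnalyzer();
        // 'true' creates a new index, overwriting anything at indexPath.
        IndexWriter writer = new IndexWriter(indexPath, analyzer, true);
        try
        {
            for (int i = 0; i < documentCount; i++)
            {
                Document doc = new Document();
                doc.Add(new Field("Value", "Value" + i, Field.Store.YES, Field.Index.TOKENIZED));
                writer.AddDocument(doc);
            }
        }
        finally
        {
            // Always close the writer so the index files are flushed and the lock released.
            writer.Close();
        }
        return true;
    }
}
```

Calling `IndexBootstrap.EnsureIndex` once at application startup would index the records on the first run only. Later restarts would find the existing index and skip the load, after which it can be opened from disk or copied into a RAMDirectory as in the example above.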