Deleting a document by Term from Lucene
Problem description
The following code does not delete the document by Term as expected:
RAMDirectory idx = new RAMDirectory();
IndexWriter writer = new IndexWriter(idx,
new SnowballAnalyzer(Version.LUCENE_30, "English"),
IndexWriter.MaxFieldLength.LIMITED);
Document doc = new Document();
doc.add(new Field("title", "mydoc", Field.Store.YES, Field.Index.NO));
doc.add(new Field("content", "some content, deleteme", Field.Store.YES, Field.Index.ANALYZED));
writer.addDocument(doc);
Document doc2 = new Document();
doc2.add(new Field("title", "mydoc2", Field.Store.YES, Field.Index.NO));
doc2.add(new Field("content", "other content, don't deleteme", Field.Store.YES, Field.Index.ANALYZED));
writer.addDocument(doc2);
writer.optimize();
writer.close();
/*
IndexReader reader = IndexReader.open(idx, false);
int docs_up_for_deletion = reader.docFreq(new Term("title"));
int before = reader.numDocs();
int docs_deleted = reader.deleteDocuments(new Term("title", "mydoc"));
reader.close();
*/
IndexWriter writer2 = new IndexWriter(idx,
new SnowballAnalyzer(Version.LUCENE_30, "English"),
IndexWriter.MaxFieldLength.LIMITED);
int before = writer2.numDocs();
writer2.deleteDocuments(new Term("title", "mydoc"));
writer2.commit();
writer2.optimize();
int after = writer2.numDocs();
writer2.close();
int docs_deleted = before - after;
I've tried deleting with the IndexReader and IndexWriter and neither works.
I've also tried adding another IndexReader search after the above code just in case the number only gets updated after closing writer2 (mentioned in this FAQ), but that doesn't help. Doing a writer.deleteAll() works, just not the delete by Term.
I found an old reference stating that only fields of type Field.Keyword can be deleted, but that is no longer a valid field type in Lucene 3.x.
Recommended answer
Your title field is not indexed. Change
new Field("title", "mydoc", Field.Store.YES, Field.Index.NO)
to
new Field("title", "mydoc", Field.Store.YES, Field.Index.ANALYZED)
or
new Field("title", "mydoc", Field.Store.YES, Field.Index.NOT_ANALYZED)
depending on whether or not you want your field analyzed.
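To see why the Field.Index setting matters, it can help to model what a delete-by-term does: it looks the term up in the inverted index and removes the documents listed there. The following is a minimal sketch of that idea using only the Java standard library. It is a toy stand-in, not the Lucene API: the class TermDeleteSketch, its IndexMode enum, and all of its methods are hypothetical names invented for illustration. It only assumes that a field added with Field.Index.NO is stored but contributes no terms to the index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of an index; hypothetical code, NOT the Lucene API.
class TermDeleteSketch {
    // Hypothetical counterparts of Field.Index.NO and Field.Index.NOT_ANALYZED.
    enum IndexMode { NO, NOT_ANALYZED }

    // Stored field values per document id; a null entry marks a deleted document.
    private final List<Map<String, String>> stored = new ArrayList<>();
    // Inverted index: "field:value" -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    int addDocument(String field, String value, IndexMode mode) {
        int docId = stored.size();
        Map<String, String> doc = new HashMap<>();
        doc.put(field, value);
        stored.add(doc); // the value is always stored (like Field.Store.YES)
        if (mode == IndexMode.NOT_ANALYZED) {
            // Only indexed fields contribute a term to the inverted index.
            postings.computeIfAbsent(field + ":" + value, k -> new HashSet<>()).add(docId);
        }
        return docId;
    }

    // Deletes every document listed under the term; returns how many were deleted.
    int deleteDocuments(String field, String value) {
        Set<Integer> hits = postings.remove(field + ":" + value);
        if (hits == null) {
            return 0; // the term was never indexed, so nothing can match
        }
        int deleted = 0;
        for (int id : hits) {
            if (stored.get(id) != null) {
                stored.set(id, null);
                deleted++;
            }
        }
        return deleted;
    }

    int numDocs() {
        int live = 0;
        for (Map<String, String> doc : stored) {
            if (doc != null) {
                live++;
            }
        }
        return live;
    }

    // Mirrors the before/after counting from the question for a given index mode.
    static int deletedCount(IndexMode mode) {
        TermDeleteSketch index = new TermDeleteSketch();
        index.addDocument("title", "mydoc", mode);
        index.addDocument("title", "mydoc2", mode);
        int before = index.numDocs();
        index.deleteDocuments("title", "mydoc");
        return before - index.numDocs();
    }

    public static void main(String[] args) {
        System.out.println("Index.NO: " + deletedCount(IndexMode.NO));                     // prints "Index.NO: 0"
        System.out.println("Index.NOT_ANALYZED: " + deletedCount(IndexMode.NOT_ANALYZED)); // prints "Index.NOT_ANALYZED: 1"
    }
}
```

In the toy model, the unindexed title never reaches the postings map, so the delete finds nothing, which matches the behavior described in the question. For an identifier-like field such as title, NOT_ANALYZED is usually the safer choice: with ANALYZED, the analyzer may lowercase or stem the text, and the Term passed to deleteDocuments has to match the indexed token exactly.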