Can't delete a document with Lucene IndexWriter.deleteDocuments(term)
Question
I have been struggling with this for two days now; I just can't delete the document with indexWriter.deleteDocuments(term).
Here is the code for a test. Hopefully someone can point out what I have done wrong. Things I have already tried:
- Updating the Lucene version from 2.x to 5.x
- Using indexWriter.deleteDocuments() instead of indexReader.deleteDocuments()
- Trying the indexOption configured as NONE or as DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
The code:
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

public class TestSearch {

    static SimpleAnalyzer analyzer = new SimpleAnalyzer();

    public static void main(String[] argvs) throws IOException, ParseException {
        generateIndex("5836962b0293a47b09d345f1");
        query("5836962b0293a47b09d345f1");
        delete("5836962b0293a47b09d345f1");
        query("5836962b0293a47b09d345f1");
    }

    public static void generateIndex(String id) throws IOException {
        Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter iwriter = new IndexWriter(directory, config);
        FieldType fieldType = new FieldType();
        fieldType.setStored(true);
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        Field idField = new Field("_id", id, fieldType);
        Document doc = new Document();
        doc.add(idField);
        iwriter.addDocument(doc);
        iwriter.close();
    }

    public static void query(String id) throws ParseException, IOException {
        Query query = new QueryParser("_id", analyzer).parse(id);
        Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
        IndexReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);
        ScoreDoc[] scoreDoc = isearcher.search(query, 100).scoreDocs;
        for (ScoreDoc scdoc : scoreDoc) {
            Document doc = isearcher.doc(scdoc.doc);
            System.out.println(doc.get("_id"));
        }
    }

    public static void delete(String id) {
        try {
            Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
            IndexWriterConfig config = new IndexWriterConfig(analyzer);
            IndexWriter iwriter = new IndexWriter(directory, config);
            Term term = new Term("_id", id);
            iwriter.deleteDocuments(term);
            iwriter.commit();
            iwriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
First, generateIndex() generates an index in /tmp/test/lucene, and query() shows that the id can be queried successfully. Then delete() should delete the document, but running query() again proves that the delete failed.
Here are the pom dependencies, in case someone needs them for a test:
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>5.5.4</version>
    <type>jar</type>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>5.5.4</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>5.5.4</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-smartcn</artifactId>
    <version>5.5.4</version>
</dependency>
Desperate for an answer.
Accepted answer
Your problem is in the analyzer. SimpleAnalyzer defines tokens as maximal strings of letters (StandardAnalyzer, or even WhitespaceAnalyzer, are more typical choices), so the value you are indexing gets split into the tokens "b", "a", "b", "d", "f". The delete method you've defined doesn't go through the analyzer, though; it just creates a raw term. You can see this in action if you try replacing your main with this:
generateIndex("5836962b0293a47b09d345f1");
query("5836962b0293a47b09d345f1");
delete("b");
query("5836962b0293a47b09d345f1");
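To see for yourself what SimpleAnalyzer does to the id, here is a small sketch that consumes a TokenStream and prints each token (the PrintTokens class and tokenize helper are mine, not part of the question's code; it only needs the lucene-analyzers-common dependency already in the pom):

```java
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PrintTokens {

    // Collect the tokens SimpleAnalyzer produces for a field value.
    static List<String> tokenize(String value) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (SimpleAnalyzer analyzer = new SimpleAnalyzer();
             TokenStream ts = analyzer.tokenStream("_id", value)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // mandatory before incrementToken()
            while (ts.incrementToken()) {
                tokens.add(term.toString());
            }
            ts.end();
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Only the runs of letters survive; digits act as separators.
        System.out.println(tokenize("5836962b0293a47b09d345f1"));
        // prints [b, a, b, d, f]
    }
}
```

Notice that none of the tokens is the full id string, which is why a raw Term("_id", "5836962b0293a47b09d345f1") matches nothing in the index.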
As a general rule, queries, terms, and the like do not analyze; QueryParser does.
For (what looks like) an identifier field, you probably don't really want to analyze this field at all. In that case, add this to the FieldType:
fieldType.setTokenized(false);
You will then have to change your query as well (again, QueryParser analyzes) and use a TermQuery instead:
Query query = new TermQuery(new Term("_id", id));
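Putting both fixes together, here is a minimal self-contained sketch. It uses Lucene's built-in StringField (stored, indexed, not tokenized) instead of hand-building a FieldType, and a RAMDirectory instead of the /tmp path so it can run anywhere; the TestSearchFixed class and its method names are mine, for illustration only:

```java
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

import java.io.IOException;

public class TestSearchFixed {

    // Index the id as a StringField: stored, indexed, NOT tokenized,
    // so the whole id is one term in the index.
    static void addDoc(Directory dir, String id) throws IOException {
        try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new SimpleAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("_id", id, Field.Store.YES));
            w.addDocument(doc);
        } // close() commits the pending changes
    }

    // Count hits for the raw term -- TermQuery never analyzes its input.
    static int countHits(Directory dir, String id) throws IOException {
        try (IndexReader r = DirectoryReader.open(dir)) {
            IndexSearcher s = new IndexSearcher(r);
            return s.search(new TermQuery(new Term("_id", id)), 100).scoreDocs.length;
        }
    }

    // Delete by the same raw term; this now matches because the
    // indexed term is the full, untokenized id.
    static void delete(Directory dir, String id) throws IOException {
        try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new SimpleAnalyzer()))) {
            w.deleteDocuments(new Term("_id", id));
        }
    }

    public static void main(String[] args) throws IOException {
        Directory dir = new RAMDirectory();
        addDoc(dir, "5836962b0293a47b09d345f1");
        System.out.println(countHits(dir, "5836962b0293a47b09d345f1")); // 1
        delete(dir, "5836962b0293a47b09d345f1");
        System.out.println(countHits(dir, "5836962b0293a47b09d345f1")); // 0
    }
}
```

The key point is symmetry: the term you index and the term you delete by must be byte-for-byte identical, which untokenized fields plus TermQuery guarantee.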