Can't delete document with Lucene IndexWriter.deleteDocuments(term)

Question

I have been struggling with this for two days now: I just can't delete a document with indexWriter.deleteDocuments(term).

Below is the code I use to test this; hopefully someone can point out what I have done wrong. Things I have already tried:

  1. Updating the Lucene version from 2.x to 5.x
  2. Using indexWriter.deleteDocuments() instead of indexReader.deleteDocuments()
  3. Trying the indexOption configured as NONE or DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

Here is the code:

import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

public class TestSearch {
    static SimpleAnalyzer analyzer = new SimpleAnalyzer();

    public static void main(String[] argvs) throws IOException, ParseException {
        generateIndex("5836962b0293a47b09d345f1");
        query("5836962b0293a47b09d345f1");
        delete("5836962b0293a47b09d345f1");
        query("5836962b0293a47b09d345f1");

    }

    public static void generateIndex(String id) throws IOException {
        Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter iwriter = new IndexWriter(directory, config);
        FieldType fieldType = new FieldType();
        fieldType.setStored(true);
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        Field idField = new Field("_id", id, fieldType);
        Document doc = new Document();
        doc.add(idField);
        iwriter.addDocument(doc);
        iwriter.close();

    }

    public static void query(String id) throws ParseException, IOException {
        Query query = new QueryParser("_id", analyzer).parse(id);
        Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
        IndexReader ireader  = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);
        ScoreDoc[] scoreDoc = isearcher.search(query, 100).scoreDocs;
        for(ScoreDoc scdoc: scoreDoc){
            Document doc = isearcher.doc(scdoc.doc);
            System.out.println(doc.get("_id"));
        }
    }

    public static void delete(String id){
        try {
            Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
            IndexWriterConfig config = new IndexWriterConfig(analyzer);
            IndexWriter iwriter = new IndexWriter(directory, config);
            Term term = new Term("_id", id);
            iwriter.deleteDocuments(term);
            iwriter.commit();
            iwriter.close();
        }catch (IOException e){
            e.printStackTrace();
        }
    }
}

First, generateIndex() generates an index in /tmp/test/lucene, and query() shows that the id can be queried successfully. Then delete() is supposed to delete the document, but running query() again shows that the deletion failed.

Here are the pom dependencies, in case someone needs them to run the test:

    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>5.5.4</version>
        <type>jar</type>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
        <version>5.5.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-queryparser</artifactId>
        <version>5.5.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-smartcn</artifactId>
        <version>5.5.4</version>
    </dependency>

Eagerly awaiting an answer.

Recommended answer

Your problem is in the analyzer. SimpleAnalyzer defines tokens as maximal strings of letters (StandardAnalyzer, or even WhitespaceAnalyzer, are more typical choices), so the value you are indexing gets split into the tokens "b", "a", "b", "d", "f". The delete method you've defined doesn't pass through the analyzer, though; it just creates a raw term from the full id, and that term matches none of the indexed tokens. You can see this in action if you replace your main with this:

generateIndex("5836962b0293a47b09d345f1");
query("5836962b0293a47b09d345f1");
delete("b");
query("5836962b0293a47b09d345f1");

As a general rule, Query and Term objects that you construct yourself are not analyzed; QueryParser does analyze its input.
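To make the mismatch concrete, here is a minimal sketch that uses only the Java standard library. It does not call Lucene; the class LetterTokenSketch and its tokenize method are hypothetical names that only approximate what SimpleAnalyzer does (maximal runs of letters, lowercased), which is an assumption based on the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SimpleAnalyzer; this is NOT Lucene code.
class LetterTokenSketch {

    // Emit maximal runs of letters, lowercased; every non-letter ends a token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String id = "5836962b0293a47b09d345f1";
        // Terms that would end up in the index for the analyzed field.
        List<String> indexedTerms = tokenize(id);
        System.out.println(indexedTerms);               // [b, a, b, d, f]
        // A raw Term holding the full id matches no indexed term...
        System.out.println(indexedTerms.contains(id));  // false
        // ...while a raw Term "b" does, which is why delete("b") works.
        System.out.println(indexedTerms.contains("b")); // true
    }
}
```

The query succeeds because QueryParser runs the same analysis on the search string, while the delete fails because new Term("_id", id) is compared against the indexed terms verbatim.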

For (what looks like) an identifier field, you probably don't want to analyze this field at all. In that case, add this to the FieldType:

fieldType.setTokenized(false);

You will then have to change your query as well (again, QueryParser analyzes) and use a TermQuery instead:

Query query = new TermQuery(new Term("_id", id));
