Can't delete a document with Lucene IndexWriter.deleteDocuments(term)


Problem description

I have been struggling with this for two days now and just can't delete a document with indexWriter.deleteDocuments(term).

Below is the code for a test; hopefully someone can point out what I have done wrong. Things I have already tried:


  1. Updating the Lucene version from 2.x to 5.x
  2. Using indexWriter.deleteDocuments() instead of indexReader.deleteDocuments()
  3. Trying the indexOption configured as NONE or DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

Here is the code:

import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

public class TestSearch {
    static SimpleAnalyzer analyzer = new SimpleAnalyzer();

    public static void main(String[] argvs) throws IOException, ParseException {
        generateIndex("5836962b0293a47b09d345f1");
        query("5836962b0293a47b09d345f1");
        delete("5836962b0293a47b09d345f1");
        query("5836962b0293a47b09d345f1");

    }

    public static void generateIndex(String id) throws IOException {
        Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter iwriter = new IndexWriter(directory, config);
        FieldType fieldType = new FieldType();
        fieldType.setStored(true);
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        Field idField = new Field("_id", id, fieldType);
        Document doc = new Document();
        doc.add(idField);
        iwriter.addDocument(doc);
        iwriter.close();

    }

    public static void query(String id) throws ParseException, IOException {
        Query query = new QueryParser("_id", analyzer).parse(id);
        Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
        IndexReader ireader  = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);
        ScoreDoc[] scoreDoc = isearcher.search(query, 100).scoreDocs;
        for(ScoreDoc scdoc: scoreDoc){
            Document doc = isearcher.doc(scdoc.doc);
            System.out.println(doc.get("_id"));
        }
    }

    public static void delete(String id){
        try {
            Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
            IndexWriterConfig config = new IndexWriterConfig(analyzer);
            IndexWriter iwriter = new IndexWriter(directory, config);
            Term term = new Term("_id", id);
            iwriter.deleteDocuments(term);
            iwriter.commit();
            iwriter.close();
        }catch (IOException e){
            e.printStackTrace();
        }
    }
}

First, generateIndex() generates an index in /tmp/test/lucene, and query() shows that the id can be queried successfully. Then delete() is supposed to delete the document, but the second query() shows that the deletion failed.

Here are the pom dependencies in case someone wants to test this:

    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>5.5.4</version>
        <type>jar</type>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
        <version>5.5.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-queryparser</artifactId>
        <version>5.5.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-smartcn</artifactId>
        <version>5.5.4</version>
    </dependency>

Desperate for an answer.

Recommended answer

Your problem is the analyzer. SimpleAnalyzer defines tokens as maximal strings of letters (StandardAnalyzer, or even WhitespaceAnalyzer, are more typical choices), so the value you are indexing gets split into the tokens "b", "a", "b", "d", "f". The delete method you've defined doesn't pass the id through the analyzer, though; it just creates a raw term, which matches none of those tokens. You can see this in action by replacing the body of your main with this:

generateIndex("5836962b0293a47b09d345f1");
query("5836962b0293a47b09d345f1");
delete("b");
query("5836962b0293a47b09d345f1");
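
To make the tokenization concrete, here is a minimal plain-Java sketch that approximates what SimpleAnalyzer does to the id (split on non-letters, lowercase the rest). The class `LetterTokens` and its `tokenize` method are hypothetical helpers written for illustration; they are not Lucene API, and Lucene's real implementation works on a token stream rather than a list.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterTokens {
    // Approximation of SimpleAnalyzer: emit maximal runs of letters, lowercased.
    // Hypothetical helper for illustration only, not part of Lucene.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [b, a, b, d, f]
        System.out.println(tokenize("5836962b0293a47b09d345f1"));
    }
}
```

None of these tokens equals the full id, which is why a raw Term built from the full id deletes nothing.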

As a general rule, queries, terms, and the like are not analyzed; QueryParser, on the other hand, does analyze its input.

For what looks like an identifier field, you probably don't want to analyze the field at all. In that case, add this to the FieldType:

fieldType.setTokenized(false);

You will then have to change your query as well (again, QueryParser analyzes its input) and use a TermQuery (org.apache.lucene.search.TermQuery) instead:

Query query = new TermQuery(new Term("_id", id));
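
The difference between the two setups can be sketched with a toy inverted index in plain Java. This is only a model of the idea, not Lucene: `ToyIndex`, `add`, and `deleteByTerm` are hypothetical names, and the tokenizer handles ASCII letters only. It shows that a delete by raw term only hits documents whose indexed terms contain that exact string.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ToyIndex {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> liveDocs = new HashSet<>();

    // Index a value either as letter-run tokens (like an analyzed field)
    // or as one single term (like a field with setTokenized(false)).
    public void add(int docId, String value, boolean tokenized) {
        liveDocs.add(docId);
        if (tokenized) {
            for (String token : value.toLowerCase().split("[^a-z]+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
                }
            }
        } else {
            postings.computeIfAbsent(value, k -> new HashSet<>()).add(docId);
        }
    }

    // Delete by raw term: no analysis, the term must match exactly.
    // Returns the number of documents deleted.
    public int deleteByTerm(String term) {
        int deleted = 0;
        for (Integer docId : postings.getOrDefault(term, Collections.<Integer>emptySet())) {
            if (liveDocs.remove(docId)) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        String id = "5836962b0293a47b09d345f1";

        ToyIndex analyzed = new ToyIndex();
        analyzed.add(1, id, true);
        System.out.println(analyzed.deleteByTerm(id));  // prints 0: the full id is not an indexed term
        System.out.println(analyzed.deleteByTerm("b")); // prints 1: "b" is one of the tokens

        ToyIndex untokenized = new ToyIndex();
        untokenized.add(1, id, false);
        System.out.println(untokenized.deleteByTerm(id)); // prints 1: the id is indexed as one term
    }
}
```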
