Tika in Action book examples: Lucene StandardAnalyzer does not work


Question

First of all, I am a total noob when it comes to Tika and Lucene. I am working through the Tika in Action book and trying out the examples. Chapter 5 gives this example:

package tikatest01;

import java.io.File;
import org.apache.tika.Tika;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;

public class LuceneIndexer {

    private final Tika tika;
    private final IndexWriter writer;

    public LuceneIndexer(Tika tika, IndexWriter writer) {
        this.tika = tika;
        this.writer = writer;
    }

    public void indexDocument(File file) throws Exception {
        Document document = new Document();
        document.add(new Field(
            "filename", file.getName(),
            Store.YES, Index.ANALYZED));
        document.add(new Field(
            "fulltext", tika.parseToString(file),
            Store.NO, Index.ANALYZED));
        writer.addDocument(document);
    }
}

And this main method:

package tikatest01;

import java.io.File;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.tika.Tika;

public class TikaTest01 {

    public static void main(String[] args) throws Exception {

        String filename = "C:\\testdoc.pdf";
        File file = new File(filename);

        IndexWriter writer = new IndexWriter(
            new SimpleFSDirectory(file),
            new StandardAnalyzer(Version.LUCENE_30), 
            MaxFieldLength.UNLIMITED);
        try {
            LuceneIndexer indexer = new LuceneIndexer(new Tika(), writer);
            indexer.indexDocument(file);
            } 
        finally {
            writer.close();
            }
    }
}

I've added the libraries tika-app-1.5.jar, lucene-core-4.7.0.jar, and lucene-analyzers-common-4.7.0.jar to the project.

Questions:

With the current version of Lucene, Field.Index is deprecated. What should I use instead?

MaxFieldLength is not found. Am I missing an import?

Recommended answer

For Lucene 4.7, use this code for the indexer. TextField takes the place of a Field created with the deprecated Index.ANALYZED option:

package tikatest01;

import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.tika.Tika;

public class LuceneIndexer {

    private final Tika tika;
    private final IndexWriter writer;

    public LuceneIndexer(Tika tika, IndexWriter writer) {
        this.tika = tika;
        this.writer = writer;
    }

    public void indexDocument(File file) throws Exception {
        Document document = new Document();
        document.add(new TextField(
                "filename", file.getName(), Store.YES));
        document.add(new TextField(
                "fulltext", tika.parseToString(file), Store.NO));
        writer.addDocument(document);
    }
}
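The indexer above only needs analyzed text, but the other Field.Index options have counterparts too. The following is a minimal sketch of that mapping, assuming Lucene 4.7 on the classpath. The class name FieldMappingSketch and the "path" and "size" fields are hypothetical and not part of the book example.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

// Hypothetical helper, for illustration only.
public class FieldMappingSketch {

    public static Document build(String path, String text, long sizeInBytes) {
        Document document = new Document();
        // Formerly Index.ANALYZED: run through the analyzer, suited to full text.
        document.add(new TextField("fulltext", text, Store.NO));
        // Formerly Index.NOT_ANALYZED: indexed as a single exact token.
        document.add(new StringField("path", path, Store.YES));
        // Formerly Index.NO with Store.YES: stored for retrieval, not searchable.
        document.add(new StoredField("size", sizeInBytes));
        return document;
    }

    public static void main(String[] args) {
        Document document = build("C:\\testdoc.pdf", "Hello Lucene world", 1024L);
        System.out.println(document.getField("fulltext").fieldType().tokenized());
        System.out.println(document.getField("path").fieldType().tokenized());
        System.out.println(document.getField("size").fieldType().indexed());
    }
}
```

Picking the field class up front replaces the Store/Index flag combinations of the 3.x API, so the intent of each field is visible in its type.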

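And this code for the main class. In Lucene 4.x the IndexWriter constructor takes an IndexWriterConfig, and MaxFieldLength no longer exists: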

package tikatest01;

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
import org.apache.tika.Tika;

public class TikaTest01 {

    public static void main(String[] args) throws Exception {

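        // Directory that will hold the Lucene index (not the document itself).
        String dirname = "C:\\MyTestDir\\";
        File dir = new File(dirname);

        // Document to index; this is the file from the question.
        File file = new File("C:\\testdoc.pdf");

        IndexWriter writer = new IndexWriter(
            new SimpleFSDirectory(dir),
            new IndexWriterConfig(
                Version.LUCENE_47,
                new StandardAnalyzer(Version.LUCENE_47)));
        try {
            LuceneIndexer indexer = new LuceneIndexer(new Tika(), writer);
            indexer.indexDocument(file);
        } finally {
            writer.close();
        }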
    }
}
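To check that the writer setup and the analyzed field behave as expected, one option is a small in-memory round trip. This is only a sketch, assuming Lucene 4.7: the class name IndexRoundTripSketch is hypothetical, it uses RAMDirectory instead of SimpleFSDirectory so nothing touches the disk, and it indexes a plain string rather than Tika output.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical helper, for illustration only.
public class IndexRoundTripSketch {

    // Indexes one document in memory and counts the hits for a single term.
    public static int countHits(String text, String term) throws IOException {
        Directory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(
            directory,
            new IndexWriterConfig(
                Version.LUCENE_47,
                new StandardAnalyzer(Version.LUCENE_47)));
        try {
            Document document = new Document();
            document.add(new TextField("fulltext", text, Store.NO));
            writer.addDocument(document);
        } finally {
            writer.close(); // closing commits the pending document
        }

        DirectoryReader reader = DirectoryReader.open(directory);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            // TermQuery is not analyzed, so the term must already be in the
            // lowercased form that StandardAnalyzer wrote to the index.
            TopDocs hits = searcher.search(
                new TermQuery(new Term("fulltext", term)), 10);
            return hits.totalHits;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countHits("Hello Lucene world", "lucene"));
    }
}
```

Searching for "Lucene" with a capital L would find nothing here, because StandardAnalyzer lowercases tokens at index time while TermQuery matches the stored token exactly.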
