Using Lucene Analyzer Without Indexing - Is My Approach Reasonable?

Views: 30 Published: 2021/5/30 21:42:38 Tags: java lucene

Problem description

My objective is to leverage some of Lucene's many tokenizers and filters to transform input text, but without creating any indexes.

For example, given this (contrived) input string...

Someone's - [texté] goes here, foo .

...and a Lucene analyzer like this...

Analyzer analyzer = CustomAnalyzer.builder()
        .withTokenizer("icu")
        .addTokenFilter("lowercase")
        .addTokenFilter("icuFolding")
        .build();

I want to get the following output:

someone's texte goes here foo

The Java method below does what I want.

But is there a better (i.e. more typical and/or more concise) way I should be doing this?

I am thinking specifically about the way I have used TokenStream and CharTermAttribute, since I have never used them like this before. It feels clunky.

Here is the code:

Lucene 8.3.0 imports:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
// JDK imports also needed by the method below
import java.io.IOException;
import java.io.StringReader;

My method:

private String transform(String input) throws IOException {

    Analyzer analyzer = CustomAnalyzer.builder()
            .withTokenizer("icu")
            .addTokenFilter("lowercase")
            .addTokenFilter("icuFolding")
            .build();

    TokenStream ts = analyzer.tokenStream("myField", new StringReader(input));
    CharTermAttribute charTermAtt = ts.addAttribute(CharTermAttribute.class);

    StringBuilder sb = new StringBuilder();
    try {
        ts.reset();
        while (ts.incrementToken()) {
            sb.append(charTermAtt.toString()).append(" ");
        }
        ts.end();
    } finally {
        ts.close();
    }
    return sb.toString().trim();
}
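
For comparison, the same three steps (split into words, lowercase, fold accents) can be roughly approximated with JDK classes alone. This is only a sketch to illustrate what the analyzer chain is doing, not a substitute for it: the class name PlainJdkFolding and its helper fold are hypothetical, java.text.BreakIterator stands in for the ICU tokenizer, and NFD normalization plus removal of combining marks stands in for icuFolding. Neither stand-in is guaranteed to match the ICU behavior in every case (for example around apostrophes, or for characters that icuFolding maps in ways beyond stripping diacritics).

```java
import java.text.BreakIterator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical JDK-only approximation of: tokenize -> lowercase -> fold accents.
class PlainJdkFolding {

    // Splits the input at word boundaries, keeps only segments that contain
    // at least one letter or digit, folds each one, and joins with spaces.
    static String transform(String input) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(input);

        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = input.substring(start, end);
            // Whitespace and punctuation come back as their own segments; skip them.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(fold(segment));
            }
        }
        return String.join(" ", tokens);
    }

    // Lowercases, decomposes accented characters (NFD), then drops the
    // combining marks so that "é" becomes "e". An assumption standing in
    // for icuFolding, which covers many more cases.
    static String fold(String token) {
        String decomposed = Normalizer.normalize(token.toLowerCase(Locale.ROOT), Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }
}
```

Calling PlainJdkFolding.transform with a bracketed, accented, mixed-case string returns the lowercased, accent-stripped words joined by single spaces. The Lucene version above remains the better choice when the output has to match what an index built with the same analyzer would contain.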
