How do I use ASCIIFoldingFilter in my Lucene app?


Question

I have a standard Lucene app which searches an index. My index contains a lot of French terms and I'd like to use the ASCIIFoldingFilter.

I've done a lot of searching and I have no idea how to use it. The constructor takes a TokenStream object. Do I call the method on the analyzer that returns a TokenStream when you pass it a field? Then what do I do? Can someone point me to an example where a TokenFilter is used? Thanks.

Recommended answer

Token filters such as ASCIIFoldingFilter are themselves TokenStreams, so they are what an Analyzer returns, mainly through the following method:

public abstract TokenStream tokenStream(String fieldName, Reader reader);

As you noticed, filters take a TokenStream as input. They act as wrappers or, more precisely, as decorators of their input: each filter extends the behavior of the TokenStream it wraps, so the resulting stream performs both the wrapped stream's operation and the filter's own.
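To make the decorator idea concrete, here is a minimal sketch in plain Java that does not use Lucene at all. SimpleTokenStream, WhitespaceSource, LowerCaseDecorator, AccentFoldingDecorator and DecoratorDemo are hypothetical names invented for this illustration, and the folding step uses java.text.Normalizer as a rough stand-in: it only strips combining accents, whereas the real ASCIIFoldingFilter covers many more characters.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for Lucene's TokenStream: returns one token per call, null at the end.
interface SimpleTokenStream {
    String next();
}

// Plays the role of a Tokenizer: the source of the chain, splitting text on whitespace.
final class WhitespaceSource implements SimpleTokenStream {
    private final Iterator<String> tokens;

    WhitespaceSource(String text) {
        String trimmed = text.trim();
        this.tokens = trimmed.isEmpty()
                ? Collections.<String>emptyIterator()
                : Arrays.asList(trimmed.split("\\s+")).iterator();
    }

    public String next() {
        return tokens.hasNext() ? tokens.next() : null;
    }
}

// Plays the role of a TokenFilter: wraps another stream and lowercases each token.
final class LowerCaseDecorator implements SimpleTokenStream {
    private final SimpleTokenStream input;

    LowerCaseDecorator(SimpleTokenStream input) {
        this.input = input;
    }

    public String next() {
        String token = input.next();
        return token == null ? null : token.toLowerCase(Locale.ROOT);
    }
}

// Another filter: decomposes each token (NFD) and drops the combining accent marks.
final class AccentFoldingDecorator implements SimpleTokenStream {
    private final SimpleTokenStream input;

    AccentFoldingDecorator(SimpleTokenStream input) {
        this.input = input;
    }

    public String next() {
        String token = input.next();
        if (token == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}

final class DecoratorDemo {
    // Builds the chain the same way an analyzer does: source first, then one wrapper per filter.
    static List<String> analyze(String text) {
        SimpleTokenStream result = new WhitespaceSource(text);
        result = new LowerCaseDecorator(result);
        result = new AccentFoldingDecorator(result);

        List<String> out = new ArrayList<>();
        for (String token = result.next(); token != null; token = result.next()) {
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        // The input is "Creme Brulee" written with its accents, using Unicode escapes.
        System.out.println(analyze("Cr\u00e8me Br\u00fbl\u00e9e")); // prints [creme, brulee]
    }
}
```

Each wrapper only knows about the stream it was given, which is why filters can be stacked in any order that makes sense for your text.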

You can find an explanation here. It does not refer directly to ASCIIFoldingFilter, but the same principle applies. Basically, you create a custom Analyzer containing something like this (stripped-down example):

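public class CustomAnalyzer extends Analyzer {
  // other content omitted
  // ...
  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    // The tokenizer is the source of the chain: it splits the text into tokens.
    TokenStream result = new StandardTokenizer(reader);
    // Each filter wraps the previous stream and post-processes its tokens.
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    // etc etc ...
    result = new StopFilter(result, yourSetOfStopWords);
    // Fold accented characters to their ASCII equivalents (e.g. "é" becomes "e").
    result = new ASCIIFoldingFilter(result);
    return result;
  }
  // ...
}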

Both the TokenFilter and the Tokenizer are subclasses of TokenStream.

Also remember to use the same custom analyzer for both indexing and searching; otherwise the terms produced at query time may not match the indexed terms, and your queries can return incorrect results.
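The effect of mixing analyzers can be sketched without Lucene. In the toy example below, AnalyzerMismatchDemo, foldingAnalyze, plainAnalyze, buildIndex and search are all hypothetical names, the "index" is just a map from term to document ids, and accent stripping via java.text.Normalizer is an assumed stand-in for ASCIIFoldingFilter. Documents are indexed with folding, so a query that is analyzed without folding looks up a term that was never stored.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AnalyzerMismatchDemo {

    // Stand-in for an analyzer WITHOUT folding: lowercases and splits on whitespace.
    static List<String> plainAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Stand-in for the custom analyzer WITH folding: strips accents, then does the same.
    static List<String> foldingAnalyze(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return plainAnalyze(decomposed.replaceAll("\\p{M}+", ""));
    }

    // Toy inverted index built with the folding analyzer: term -> ids of documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : foldingAnalyze(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Returns the ids of documents that contain every query term.
    static Set<Integer> search(Map<String, Set<Integer>> index, List<String> queryTerms) {
        Set<Integer> hits = null;
        for (String term : queryTerms) {
            Set<Integer> postings = index.getOrDefault(term, new TreeSet<>());
            if (hits == null) {
                hits = new TreeSet<>(postings);
            } else {
                hits.retainAll(postings);
            }
        }
        return hits == null ? new TreeSet<>() : hits;
    }

    public static void main(String[] args) {
        // Two short French documents, with accents written as Unicode escapes.
        List<String> docs = Arrays.asList("Le caf\u00e9 est ferm\u00e9", "Un th\u00e9 vert");
        Map<String, Set<Integer>> index = buildIndex(docs);

        // Same analyzer on both sides: the accented query is folded and matches.
        System.out.println(search(index, foldingAnalyze("Caf\u00e9"))); // prints [0]
        // Different analyzer at query time: the accented term is not in the index.
        System.out.println(search(index, plainAnalyze("Caf\u00e9")));   // prints []
    }
}
```

In a real Lucene app the same reasoning applies to the analyzer given to the index writer and the one given to the query parser.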
