How do I use ASCIIFoldingFilter in my Lucene app?

Question

I have a standard Lucene app which searches from an index. My index contains a lot of French terms and I'd like to use the ASCIIFoldingFilter.

I've done a lot of searching and I have no idea how to use it. The constructor takes a TokenStream object. Do I call the method on the analyzer that returns a TokenStream when you pass it a field? Then what do I do? Can someone point me to an example where a TokenFilter is used? Thanks.

Recommended answer

Token filters, like ASCIIFoldingFilter, are at their base TokenStreams, so they are what an Analyzer returns, mainly through the following method:

```java
public abstract TokenStream tokenStream(String fieldName, Reader reader);
```
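As for "then what do I do?": a TokenStream is consumed by calling reset(), then incrementToken() in a loop while reading its attributes, then end() and close(). The minimal sketch below assumes a modern Lucene release (8.x or 9.x), where Analyzer also offers tokenStream(String, String); TokenDump and terms are hypothetical helper names, not Lucene API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper: runs text through any Analyzer and collects the resulting terms.
class TokenDump {
  static List<String> terms(Analyzer analyzer, String field, String text) throws IOException {
    List<String> out = new ArrayList<>();
    // tokenStream(String, String) is the modern counterpart of tokenStream(String, Reader).
    try (TokenStream ts = analyzer.tokenStream(field, text)) {
      // The term text of the current token is exposed through this attribute.
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                 // mandatory before the first incrementToken()
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();                   // finishes the stream (e.g. records the final offset)
    }                             // close() is called by try-with-resources
    return out;
  }
}
```

With a plain StandardAnalyzer, TokenDump.terms(new StandardAnalyzer(), "body", "Élève Café") should give élève and café: lowercased, but with the accents still there. Removing those accents is exactly the job ASCIIFoldingFilter adds to the chain.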

As you have noticed, filters take a TokenStream as input. They act like wrappers or, more precisely, like decorators of their input: each filter enhances the behavior of the TokenStream it contains, performing its own operation on top of the operation of that input.
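Because a filter is just a decorator, you can also build a chain by hand, without an Analyzer, which is handy for experimenting. This sketch assumes Lucene 8.x or 9.x, with ASCIIFoldingFilter coming from the analysis-common module (lucene-analyzers-common in 8.x, lucene-analysis-common in 9.x); FoldingChain and fold are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example: tokenize, lowercase, then fold accents by wrapping each stream in the next.
class FoldingChain {
  static List<String> fold(String text) throws IOException {
    Tokenizer source = new StandardTokenizer();
    source.setReader(new StringReader(text)); // the Tokenizer needs its input before reset()
    List<String> out = new ArrayList<>();
    // Consume the outermost filter; closing it also closes the streams it wraps.
    try (TokenStream chain = new ASCIIFoldingFilter(new LowerCaseFilter(source))) {
      // Attributes are shared along the chain, so this sees the final, folded term.
      CharTermAttribute term = chain.addAttribute(CharTermAttribute.class);
      chain.reset();
      while (chain.incrementToken()) {
        out.add(term.toString());
      }
      chain.end();
    }
    return out;
  }
}
```

FoldingChain.fold("Élève Café") should return eleve and cafe: the tokenizer splits, LowerCaseFilter lowercases, and ASCIIFoldingFilter replaces the accented characters with their ASCII equivalents.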

You can find an explanation here. It does not refer directly to ASCIIFoldingFilter, but the same principle applies. Basically, you create a custom Analyzer containing something like this (a stripped-down example):

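```java
import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.ASCIIFoldingFilter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Old (Lucene 2.9-era) API; several of these constructors changed in later releases.
public class CustomAnalyzer extends Analyzer {
  // other content omitted
  private final Set<?> yourSetOfStopWords; // your stop words, supplied by the caller

  public CustomAnalyzer(Set<?> stopWords) { this.yourSetOfStopWords = stopWords; }

  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader); // splits the text into tokens
    result = new StandardFilter(result);                // each filter wraps the previous stream
    result = new LowerCaseFilter(result);
    // etc etc ...
    result = new StopFilter(result, yourSetOfStopWords);
    result = new ASCIIFoldingFilter(result);            // folds accents, e.g. "élève" to "eleve"
    return result;
  }
  // ...
}
```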
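Overriding tokenStream(String, Reader) belongs to older Lucene versions. In current releases (8.x or 9.x, an assumption about your setup), an Analyzer instead overrides createComponents and returns the Tokenizer together with the outermost filter; StandardFilter no longer exists there. A sketch of the same chain under that API, with FoldingAnalyzer as a hypothetical name:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Hypothetical analyzer: same idea as CustomAnalyzer, written for the createComponents API.
class FoldingAnalyzer extends Analyzer {
  private final CharArraySet stopWords;

  FoldingAnalyzer(CharArraySet stopWords) {
    this.stopWords = stopWords;
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new StandardTokenizer();
    TokenStream result = new LowerCaseFilter(source);
    result = new StopFilter(result, stopWords);
    result = new ASCIIFoldingFilter(result);
    // Lucene needs the source (to feed it new Readers) and the end of the chain (to consume it).
    return new TokenStreamComponents(source, result);
  }

  @Override
  protected TokenStream normalize(String fieldName, TokenStream in) {
    // Applied by query parsers to wildcard/prefix/fuzzy terms, which skip the full chain.
    return new ASCIIFoldingFilter(new LowerCaseFilter(in));
  }
}
```

One design point: StopFilter runs before ASCIIFoldingFilter here (as in the original), so the stop set has to contain the accented forms, such as "à" or "été". Placing the folding filter before StopFilter would let you list unaccented stop words instead.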

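Both TokenFilter and Tokenizer are subclasses of TokenStream: a Tokenizer produces tokens from the input text, while a TokenFilter consumes the tokens of another TokenStream.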
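Since a TokenFilter is a TokenStream that wraps another one, writing your own filter follows the same decorator shape as ASCIIFoldingFilter: pull a token from input, then adjust the shared attributes. TruncatingFilter below is a hypothetical filter (it cuts terms to a maximum length), written against the Lucene 8.x/9.x API; it only illustrates the pattern and is not part of the original answer.

```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical filter: truncates every term to at most maxLength characters.
final class TruncatingFilter extends TokenFilter {
  private final int maxLength;
  // Attributes are shared with the wrapped stream, so this is the same term buffer it fills.
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  TruncatingFilter(TokenStream input, int maxLength) {
    super(input); // TokenFilter keeps the wrapped stream in its protected field `input`
    this.maxLength = maxLength;
  }

  @Override
  public boolean incrementToken() throws IOException {
    // Let the wrapped stream produce the next token first...
    if (!input.incrementToken()) {
      return false;
    }
    // ...then apply this filter's own change to it.
    if (termAtt.length() > maxLength) {
      termAtt.setLength(maxLength);
    }
    return true;
  }
}
```

Inside an analyzer it slots in like any other filter, e.g. result = new TruncatingFilter(result, 8);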

Also remember that you must use the same custom analyzer for both indexing and searching; otherwise your queries may return incorrect results, because query terms would not be transformed (for example, folded) the same way as the indexed terms.
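To see why this matters, the sketch below indexes one French document with a folding analyzer and queries it through either the same analyzer or a plain StandardAnalyzer. It assumes Lucene 8.x or 9.x with the queryparser and analysis-common modules on the classpath; SameAnalyzerDemo, foldingAnalyzer and hits are hypothetical names, and the in-memory ByteBuffersDirectory stands in for a real index.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

class SameAnalyzerDemo {
  // Hypothetical folding analyzer, defined inline so this block is self-contained.
  static Analyzer foldingAnalyzer() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        TokenStream result = new ASCIIFoldingFilter(new LowerCaseFilter(source));
        return new TokenStreamComponents(source, result);
      }
    };
  }

  // Indexes docText with indexAnalyzer, then counts hits for query parsed with queryAnalyzer.
  static int hits(Analyzer indexAnalyzer, Analyzer queryAnalyzer, String docText, String query)
      throws IOException, ParseException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(indexAnalyzer))) {
        Document doc = new Document();
        doc.add(new TextField("body", docText, Field.Store.NO));
        writer.addDocument(doc);
      } // closing the writer commits the document
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        QueryParser parser = new QueryParser("body", queryAnalyzer);
        return searcher.count(parser.parse(query));
      }
    }
  }
}
```

With the folding analyzer on both sides, a query for fenetre should match a document containing fenêtre, and the reverse also holds. If the index was folded but the query goes through StandardAnalyzer, the query term stays fenêtre, which no longer exists in the index, so the count drops to zero.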
