How to implement a phonetic search using Lucene?

Question

I want to implement a phonetic search using Lucene 6.1.0, using Soundex or any algorithm suitable for Portuguese. I found many incomplete examples on the internet that show how to implement a custom tokenizer or analyzer, but the abstract classes used in those examples seem to differ from the ones in version 6.1.0. Can anyone point me to good documentation on Lucene, beyond the bare Javadocs, that explains how to put these pieces together?

Thanks.

Recommended answer

The Analyzer documentation shows how to create your analyzer.

For phonetic analysis, look at the org.apache.lucene.analysis.phonetic package. You'll need to add "lucene-analyzers-phonetic-6.1.0.jar" to your build path, as well as Apache's "commons-codec-1.10.jar".

Then you can set up your analyzer like this, for instance:

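import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.phonetic.DoubleMetaphoneFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Split the text into word tokens.
        Tokenizer tokenizer = new StandardTokenizer();
        // Replace each token with its Double Metaphone code (max length 6);
        // inject=false means the original token is not kept alongside the code.
        TokenStream stream = new DoubleMetaphoneFilter(tokenizer, 6, false);
        return new TokenStreamComponents(tokenizer, stream);
    }
};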
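
To see why a phonetic filter lets differently spelled words match, here is a minimal, self-contained sketch of the underlying idea in plain Java. It does not use Lucene and is not what DoubleMetaphoneFilter does internally: it uses a simplified American Soundex instead, and the class PhoneticSketch and its methods are hypothetical names made up for this illustration. It also assumes plain ASCII letters and simply skips accented characters, so treat it as a rough fit for Portuguese at best.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PhoneticSketch {
    // Soundex digit for each letter A..Z; '0' marks letters that carry no code
    // (vowels plus H, W and Y).
    private static final String CODES = "01230120022455012623010202";

    /** Simplified American Soundex: first letter followed by three digits. */
    public static String soundex(String word) {
        StringBuilder out = new StringBuilder();
        char lastDigit = '0';
        for (char raw : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (raw < 'A' || raw > 'Z') {
                continue; // assumption: anything outside A..Z is ignored
            }
            char digit = CODES.charAt(raw - 'A');
            if (out.length() == 0) {
                out.append(raw); // keep the first letter as-is
            } else if (digit != '0' && digit != lastDigit) {
                out.append(digit); // skip uncoded letters and adjacent repeats
            }
            // H and W do not separate letters with the same code; vowels do.
            if (raw != 'H' && raw != 'W') {
                lastDigit = digit;
            }
            if (out.length() == 4) {
                break;
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    /** Index each name under its phonetic code instead of its exact spelling. */
    public static Map<String, List<String>> index(List<String> names) {
        Map<String, List<String>> byCode = new HashMap<>();
        for (String name : names) {
            byCode.computeIfAbsent(soundex(name), k -> new ArrayList<>()).add(name);
        }
        return byCode;
    }

    /** Encode the query the same way, then look up the shared code. */
    public static List<String> search(Map<String, List<String>> byCode, String query) {
        return byCode.getOrDefault(soundex(query), new ArrayList<>());
    }

    public static void main(String[] args) {
        Map<String, List<String>> byCode = index(Arrays.asList("Sousa", "Silva", "Robert"));
        System.out.println(search(byCode, "Souza"));  // prints [Sousa]
        System.out.println(search(byCode, "Rupert")); // prints [Robert]
    }
}
```

The same principle applies to the analyzer above: as long as the same phonetic analyzer is used both when indexing and when parsing queries, both sides are reduced to the same codes and spelling variants can match.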
