Lucene NGram tokenizer with QueryParser
Problem description
I've created a custom trigram analyzer for fuzzy matching in my project (NGramTokenizer(Version.LUCENE_44, reader, 3, 3)), specifying a minimum token size of 3 and a maximum of 3.
At index time I get the proper trigram tokens, but when I use the same analyzer at query time (through QueryParser), it skips tokens shorter than 3 characters.
Example
Indexed document - Hi Rushik
Indexed trigrams - hi_, i_r, rus, ush, shi, hik (checked using the Luke index reader)
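To see where those index-time trigrams come from, here is a minimal sliding-window sketch (plain Java, not actual Lucene code): a tokenizer with minGram = maxGram = 3 slides a fixed 3-character window over the whole field value, whitespace included, which is why grams such as "i r" that straddle the space appear in the index. The class and method names are illustrative, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

public class TrigramSketch {
    // Simplified model of an NGramTokenizer with min=3, max=3:
    // slide a 3-char window over the entire (lowercased) input,
    // including the whitespace between words.
    static List<String> trigrams(String text) {
        String s = text.toLowerCase();
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Emits grams over "hi rushik", including ones that cross the space
        System.out.println(trigrams("Hi Rushik"));
    }
}
```

The exact gram set a real Lucene tokenizer emits can differ slightly by version, but the key point is that at index time the window runs over the whole field value, not over individual words.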
Query - Hi Rushik AB XYZ.
Parsed query (QueryParser result) - (name_data:rus name_data:ush name_data:shi name_data:hik) name_data:xyz
As you can see, the query parser removed tokens shorter than 3 characters. I understand I specified (3, 3) when tokenizing, but in that case shouldn't indexing also have skipped tokens shorter than 3 characters?
I think I'm missing something here - any help?
Answer
Found the answer.
Lucene's QueryParser first tokenizes the input on whitespace and only then runs the analyzer on each individual term/token. Since my analyzer is NGram(3, 3), it cannot generate any token from a 2-character term.
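The behavior can be sketched in plain Java (this is an illustration of the split-then-analyze order, not Lucene code): the query is split on whitespace first, then each term is 3-grammed on its own, so 2-character terms like "Hi" and "AB" produce no clauses at all.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryParserSketch {
    // Model of the QueryParser behavior described above:
    // split the query on whitespace FIRST, then run the 3-gram
    // analysis on each term separately. Terms shorter than 3
    // characters contribute nothing to the parsed query.
    static List<String> parseTerms(String query) {
        List<String> clauses = new ArrayList<>();
        for (String term : query.toLowerCase().split("\\s+")) {
            for (int i = 0; i + 3 <= term.length(); i++) {
                clauses.add(term.substring(i, i + 3)); // no output for 1- or 2-char terms
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // "hi" and "ab" vanish; only "rushik" and "xyz" yield clauses,
        // matching the parsed query shown in the question.
        System.out.println(parseTerms("Hi Rushik AB XYZ"));
    }
}
```

This contrasts with index time, where the tokenizer sees the entire field value as one stream and can emit trigrams that span the whitespace between words.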