Lucene.Net/SpellChecker - multi-word/phrase based auto-suggest

Problem description

I've implemented Lucene.Net on my site and use it to index my products, which are theatre shows, tours and attractions around London.

I want to implement a "Did you mean?" feature for when users misspell product names, one that takes whole product titles into account and not just single words.

For example, if the user typed:

Lodnon Eye

I would like to auto-suggest:

London London Eye

I assume I need to have the analyzer index each title as a single entity, so that SpellChecker can nearest-match on the whole phrase as well as on the individual words.

How would I do this?

Solution

I recently implemented a phrase auto-suggest system in Lucene.Net.

The Java version of Lucene has a ShingleFilter in one of its contrib folders, which breaks a sentence down into all of its possible phrases (runs of adjacent words). Unfortunately, Lucene.Net's contrib filters aren't quite there yet, so there is no shingle filter on the .NET side.
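To make "all possible phrase combinations" concrete, here is a minimal plain-Java sketch of word shingling. It does not use Lucene's ShingleFilter; the class name `ShingleSketch`, the method `shingles` and its size parameters are hypothetical names chosen for illustration, and a real filter works on an analyzer's token stream instead of a whitespace split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a stand-in for the idea behind a shingle filter,
// not Lucene's ShingleFilter and not the author's Java app.
class ShingleSketch {

    // Returns every run of minSize..maxSize adjacent words in the title,
    // ordered by starting word, then by length.
    static List<String> shingles(String title, int minSize, int maxSize) {
        List<String> result = new ArrayList<>();
        String trimmed = title.trim();
        if (trimmed.isEmpty()) {
            return result; // nothing to shingle
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            for (int size = minSize; size <= maxSize && start + size <= words.length; size++) {
                result.add(String.join(" ", Arrays.copyOfRange(words, start, start + size)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the two- and three-word phrases of a sample title, one per line.
        for (String phrase : shingles("London Eye River Cruise", 2, 3)) {
            System.out.println(phrase);
        }
    }
}
```

Each phrase produced this way is a candidate entry for the spelling index, alongside the single words.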

However, a Lucene index written in Java can be read by Lucene.Net as long as the versions are the same. So what I did was the following:

First, I created a spell index in Lucene.Net using the spellcheck.IndexDictionary method, as laid out in the "Did you mean" section of Jake Scott's link. Note that this only creates a spelling index of single words, not phrases.

I then created a Java app that uses the shingle filter to build phrases from the text I'm searching and saves them in a temporary index.

I then wrote another method in .NET that opens this temporary index and adds each phrase as a document to my spelling index, which already contains the single words. The trick is to make sure the documents you add have the same form as the rest of the spell documents, so I pulled the relevant methods out of the spellchecker code in the Lucene.Net project and edited them.
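As a rough illustration of what "the same form" can mean, the sketch below builds a field layout for one entry out of character n-grams. The field names ("word", "startN", "gramN", "endN") are an assumption based on my reading of the Lucene spellchecker source and should be checked against the version you use; `SpellDocSketch` and `fieldsFor` are hypothetical names, and the map stands in for a real Lucene document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: models a spell-index entry as field name -> values.
// Field names are assumptions; verify them against your spellchecker source.
class SpellDocSketch {

    // Builds the fields for one entry (a single word or a whole phrase),
    // using character n-grams of every length from minGram to maxGram.
    static Map<String, List<String>> fieldsFor(String text, int minGram, int maxGram) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("word", List.of(text)); // the full entry, returned as the suggestion
        for (int n = minGram; n <= maxGram && n <= text.length(); n++) {
            List<String> grams = new ArrayList<>();
            for (int i = 0; i + n <= text.length(); i++) {
                grams.add(text.substring(i, i + n));
            }
            fields.put("start" + n, List.of(grams.get(0)));              // leading gram
            fields.put("gram" + n, grams);                               // all grams
            fields.put("end" + n, List.of(grams.get(grams.size() - 1))); // trailing gram
        }
        return fields;
    }

    public static void main(String[] args) {
        // A phrase is treated exactly like a word; the space is just another character.
        fieldsFor("london eye", 3, 4).forEach((name, values) ->
                System.out.println(name + " = " + values));
    }
}
```

The point of the sketch is that a phrase gets exactly the same fields as a single word, so the existing lookup code can match it without changes.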

Once you've done that, you can call the spellcheck.SuggestSimilar method, pass it a misspelled phrase, and it will return a valid suggestion.
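The effect of having phrases in the spelling index can be sketched without Lucene at all. The example below is a brute-force stand-in, not the SpellChecker implementation: it compares the input against every entry using Levenshtein edit distance and returns the closest one. `PhraseSuggestSketch`, `editDistance` and `suggest` are hypothetical names, and the entry list is made-up sample data.

```java
import java.util.List;

// Illustration only: brute-force nearest match over words and phrases.
// A real spell index narrows the candidates first instead of scanning all entries.
class PhraseSuggestSketch {

    // Classic two-row Levenshtein distance (insert, delete, substitute cost 1 each).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the entry closest to the input (case-insensitive), or null if there are none.
    static String suggest(String input, List<String> entries) {
        String query = input.toLowerCase();
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String entry : entries) {
            int d = editDistance(query, entry.toLowerCase());
            if (d < bestDistance) {
                bestDistance = d;
                best = entry;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sample entries: single words plus phrases, as in the combined spelling index.
        List<String> entries = List.of("London", "Eye", "London Eye", "London Dungeon");
        System.out.println(suggest("Lodnon Eye", entries));
    }
}
```

With only single words indexed, the closest entry to "Lodnon Eye" would be a lone word; once the phrase itself is an entry, the whole title becomes the nearest match.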
