Lucene Standard Analyzer vs Snowball

Question

Just getting started with Lucene.Net. I indexed 100,000 rows using the standard analyzer, ran some test queries, and noticed that plural queries don't return results if the original term was singular. I understand the Snowball analyzer adds stemming support, which sounds nice. However, I'm wondering if there are any drawbacks to going with Snowball over Standard. Am I losing anything by going with it? Are there any other analyzers out there to consider?

Answer

Yes, by using a stemmer such as Snowball, you are losing information about the original form of your text. Sometimes this will be useful, sometimes not.

For example, Snowball will stem "organization" into "organ", so a search for "organization" will return results with "organ", without any scoring penalty.
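
To make the trade-off concrete, here is a minimal Python sketch of a toy inverted index queried through two analyzers: a lowercase-and-split tokenizer standing in for the standard analyzer, and the same tokenizer followed by a stemmer. The helper names (`tokenize`, `toy_stem`, `build_index`, `search`) and the sample documents are hypothetical, and `toy_stem` is a deliberately crude suffix stripper, not the Snowball algorithm; it only reproduces the kind of conflation described above. None of this uses the Lucene.Net API.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


# Hypothetical, deliberately aggressive suffix list. This is NOT Snowball;
# it only imitates the kind of conflation a real stemmer can produce.
TOY_SUFFIXES = ("ization", "ation", "ing", "s")


def toy_stem(token):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in TOY_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def standard(text):
    """Stand-in for the standard analyzer: tokenize and lowercase only."""
    return tokenize(text)


def stemming(text):
    """Stand-in for a stemming analyzer: tokenize, then stem every token."""
    return [toy_stem(t) for t in tokenize(text)]


def build_index(docs, analyze):
    """Map each analyzed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query, analyze):
    """Return ids of documents containing every analyzed query term.

    The query goes through the same analyzer used at index time. There is
    no scoring here: a stemmed match counts exactly like an exact match.
    """
    terms = analyze(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


docs = {
    1: "The dog chased a ball",
    2: "A pipe organ in the old church",
    3: "A nonprofit organization for animal welfare",
}

std_index = build_index(docs, standard)
stem_index = build_index(docs, stemming)

print(sorted(search(std_index, "dogs", standard)))             # []
print(sorted(search(stem_index, "dogs", stemming)))            # [1]
print(sorted(search(std_index, "organization", standard)))     # [3]
print(sorted(search(stem_index, "organization", stemming)))    # [2, 3]
print("organization" in stem_index)                            # False
```

With the plain tokenizer, "dogs" misses the document that only says "dog", which is the behavior described in the question. With stemming, the plural now matches, but "organization" and "organ" collapse into a single indexed term, so the original form is gone from the index and both documents come back for either query.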

Whether or not this is appropriate for you depends on your content and on the type of queries you support (for example, whether the searches are very basic, or whether users are sophisticated and rely on your search to filter results down precisely). You may also want to look into less aggressive stemmers, such as KStem.
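
For contrast, a less aggressive approach can fold only plural forms. The Python sketch below is a plural-only stemmer modelled on the rules of Harman's "S-stemmer". It is a swapped-in illustration, not KStem (Krovetz's dictionary-based stemmer) and not any Lucene.Net class, and the function name `s_stem` is hypothetical.

```python
def s_stem(word):
    """Plural-only stemming modelled on Harman's S-stemmer rules.

    Only the first matching rule fires, and only plural endings are
    touched, so derivational suffixes such as "-ization" survive.
    """
    if word.endswith("ies") and not word.endswith(("eies", "aies")):
        return word[:-3] + "y"    # "ies" -> "y", e.g. queries -> query
    if word.endswith("es") and not word.endswith(("aes", "ees", "oes")):
        return word[:-1]          # "es" -> "e", e.g. horses -> horse
    if word.endswith("s") and not word.endswith(("us", "ss")):
        return word[:-1]          # drop final "s", e.g. dogs -> dog
    return word                   # e.g. status, class, organization


for w in ["dogs", "queries", "horses", "organizations", "organization", "organ", "status"]:
    print(w, "->", s_stem(w))
```

Plurals such as "dogs" and "organizations" now match their singular forms, which addresses the original problem, while "organization" and "organ" remain distinct terms. The cost is that related forms such as "organize" and "organization" are no longer conflated either.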
