How to sort SOLR spellCheck suggestions NOT by frequency?


Question

If you search for ahve on my staging index, the first spellcheck correction is "the", because "the" appears more often in the index than "have" (I have 500 documents indexed).
If you search for ahve on my local index, the first spellcheck correction is "have", because "have" appears more often than any other word in that index (I have 21 documents indexed).
This is a simple dump of what my staging index returns:

<lst name="ahve">
  <int name="numFound">5</int>
  <int name="startOffset">0</int>
  <int name="endOffset">4</int>
  <int name="origFreq">0</int>
  <arr name="suggestion">
    <lst>
      <str name="word">the</str>
      <int name="freq">112</int>
    </lst>
    <lst>
      <str name="word">are</str>
      <int name="freq">67</int>
    </lst>
    <lst>
      <str name="word">have</str>
      <int name="freq">44</int>
    </lst>
    <lst>
      <str name="word">acne</str>
      <int name="freq">10</int>
    </lst>
    <lst>
      <str name="word">ache</str>
      <int name="freq">3</int>
    </lst>
  </arr>
</lst>

And adding spellcheck.onlyMorePopular=true or spellcheck.onlyMorePopular=false does NOT change anything.
Is there a way not to sort the returned suggestions by frequency of appearance?

Recommended answer

By default, spellcheck suggestions are sorted by score (the string distance, Levenshtein by default) and then by frequency. The other built-in ordering sorts by frequency first and then by score.

You can specify your own ordering by writing a custom comparator class that implements Comparator<SuggestWord>. Then set the comparatorClass field in your solrconfig.xml to the fully qualified name of that class.

<lst name="spellchecker">
  <str name="name">freq</str>
  <str name="field">lowerfilt</str>
  <str name="spellcheckIndexDir">spellcheckerFreq</str>
  <!-- comparatorClass can be one of:
     1. score (default)
     2. freq (Frequency first, then score)
     3. A fully qualified class name
   -->
  <str name="comparatorClass">my.custom.ComparatorClass</str>
  <str name="buildOnCommit">true</str>
</lst>
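
As a rough illustration of such a comparator, here is a minimal sketch that ranks suggestions by score only and breaks ties alphabetically, so frequency never influences the order. Several things here are assumptions rather than part of the original answer: the nested SuggestWord class is a stand-in that mirrors the three public fields of Lucene's org.apache.lucene.search.spell.SuggestWord so the example runs without Solr on the classpath, the class names are hypothetical, and the scores in main are made up for demonstration (only the frequencies come from the dump in the question). It also assumes that a larger comparison result marks the better suggestion, which is how Lucene's built-in score comparator appears to behave.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SpellcheckOrderingSketch {

    /** Stand-in for Lucene's SuggestWord; in a real plugin, import the Lucene class instead. */
    static final class SuggestWord {
        public String string; // the suggested term
        public float score;   // similarity to the misspelled term, higher means closer
        public int freq;      // how often the suggested term occurs in the index

        SuggestWord(String string, float score, int freq) {
            this.string = string;
            this.score = score;
            this.freq = freq;
        }
    }

    /** Hypothetical comparator: score first, then alphabetical order; freq is ignored. */
    static final class ScoreThenAlphaComparator implements Comparator<SuggestWord> {
        @Override
        public int compare(SuggestWord first, SuggestWord second) {
            int byScore = Float.compare(first.score, second.score);
            if (byScore != 0) {
                return byScore; // assumption: "greater" means "better suggestion"
            }
            // Reversed on purpose so the alphabetically earlier term counts as better.
            return second.string.compareTo(first.string);
        }
    }

    /** Returns the suggested terms best-first, the way a caller would display them. */
    static List<String> rank(List<SuggestWord> words) {
        List<SuggestWord> sorted = new ArrayList<>(words);
        sorted.sort(new ScoreThenAlphaComparator().reversed());
        List<String> terms = new ArrayList<>();
        for (SuggestWord word : sorted) {
            terms.add(word.string);
        }
        return terms;
    }

    public static void main(String[] args) {
        List<SuggestWord> words = new ArrayList<>();
        words.add(new SuggestWord("the", 0.50f, 112));  // score is illustrative only
        words.add(new SuggestWord("have", 0.75f, 44));  // score is illustrative only
        words.add(new SuggestWord("ache", 0.50f, 3));   // score is illustrative only
        System.out.println(rank(words)); // prints [have, ache, the]
    }
}
```

In an actual deployment the stand-in class would be dropped, the comparator compiled against Lucene and placed on Solr's classpath, and its fully qualified name used as the comparatorClass value shown above.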

A few other suggestions:

  • The parameter spellcheck.onlyMorePopular doesn't affect sort ordering. It restricts the output to suggestions that would return more query results than the original term, so it can propose a replacement even when the original term is already spelled correctly. Use with caution.

  • Make sure to remove stopwords such as 'the', 'that', etc., by passing your data through the StopFilterFactory on both the index and query side of the analysis for the field that feeds the spellchecker.

See: http://wiki.apache.org/solr/SpellCheckComponent for more information.
