Solr Did you mean (Spell check component)
Question
I use Solr in my application and have integrated the spellcheck component, but I have some problems:
First: when I type a term that is split by a space, the spellchecker returns a correction for each part separately.
E.g. "wat ters" => "what term", but the correct suggestion is "watters".
Second: when I type a phrase in which only some terms are misspelled, the spellchecker changes all terms, including the ones that are already correct.
E.g. "Difreences in lankuage use conventions" => "Differences in language use conversions".
The correct suggestion is "Differences in language use conventions".
This is my configuration in solrconfig.xml:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">spell</str>
<str name="spellcheckIndexDir">spellchecker</str>
</lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
Schema.xml:
Field types:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.SnowballPorterFilterFactory" language="English"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.SnowballPorterFilterFactory" language="English"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
</analyzer>
<analyzer type="multiterm" >
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!--<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>-->
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
Fields:
<field name="title" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
Copy field:
<copyField source="title" dest="spell"/>
Thanks for your help.
Cheers
Recommended answer
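For your first problem, you could use WordBreakSpellChecker (exposed in Solr as solr.WordBreakSolrSpellChecker), which can suggest joining adjacent terms, for example "wat ters" into "watters".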
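As a rough sketch of what that could look like, the configuration from the question can be extended with a second spellchecker. This assumes Solr 4.0 or later, where solr.WordBreakSolrSpellChecker is available; the dictionary name "wordbreak" and the maxChanges value are illustrative choices, not something taken from the question.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <!-- Existing dictionary from the question, unchanged -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">spellchecker</str>
  </lst>
  <!-- Assumed addition: suggests combining or splitting adjacent terms -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- Consult both dictionaries on every request -->
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

Whether "watters" is actually suggested depends on that term being present in the spell field, since the word-break checker only proposes combinations that exist in the index.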
As for your second problem, you could change <str name="spellcheck.onlyMorePopular">true</str>
to <str name="spellcheck.onlyMorePopular">false</str>
and see whether this gives the expected result. With onlyMorePopular enabled, Solr may propose a more frequent term even when the term you typed is spelled correctly and present in the index.
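Applied to the /spell handler from the question, the change might look like the sketch below. Only the onlyMorePopular value differs from the original; the commented-out spellcheck.collate line is an optional extra that was not part of the question and is shown purely as an assumption about what may be useful for whole-phrase suggestions.

```xml
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- Changed from true: do not replace correctly spelled terms
         just because a more frequent term exists in the index -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
    <!-- Optional (assumption, not in the original config):
         return a rewritten version of the whole query
    <str name="spellcheck.collate">true</str>
    -->
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

After reloading the core, re-running the "Difreences in lankuage use conventions" query is a quick way to check whether "conventions" is now left alone.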