Non-English Language support via SolrNet


Problem description

I am using SolrNet to search Solr from a .NET application. Everything works fine when I search for English words. However, if I use Spanish words like español, I get no search results even though I have indexed them. When I debugged the query in Solr, I found that it was parsed as espaA+ol.

Do I have to do some UTF-8 encoding, or does SolrNet support searching only over ASCII characters?

Solution

This is not a SolrNet issue; it is related to how Solr handles characters outside the 7-bit ASCII range (code points above 127). The best recommendation is to add the ASCIIFoldingFilterFactory to the Solr field where you are storing the Spanish words.

As an example, suppose you are using the text_general fieldType as defined in the Solr example, which is set up as follows in the schema.xml file:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I would recommend modifying it as follows, adding the ASCIIFoldingFilterFactory to both the index and query analyzers:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

Also, please note that you will need to reindex your data after making this schema change, because documents that are already indexed keep the terms produced by the old analyzer.
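To make the effect of folding more concrete, here is a minimal Python sketch of the idea behind the lowercase and ASCII-folding steps. It is only an approximation: the real ASCIIFoldingFilterFactory uses Lucene's own character mapping table, whereas this sketch assumes that Unicode NFKD decomposition followed by dropping combining marks is close enough for a word like español. The helper names fold_to_ascii and analyze are hypothetical, and the whitespace split merely stands in for the StandardTokenizerFactory.

```python
import unicodedata


def fold_to_ascii(text):
    """Approximate ASCII folding: decompose characters, then drop accents.

    NFKD splits "ñ" into "n" plus a combining tilde; removing the
    combining mark leaves plain "n". This is NOT Lucene's mapping table.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Hypothetical stand-in for the analyzer chain: split, lowercase, fold."""
    return [fold_to_ascii(token.lower()) for token in text.split()]


# The same chain runs at index time and at query time, so both sides
# produce the same folded term and the query can match.
indexed_terms = analyze("Curso de Español")
query_terms = analyze("español")

print(indexed_terms)                    # ['curso', 'de', 'espanol']
print(query_terms)                      # ['espanol']
print(query_terms[0] in indexed_terms)  # True
```

Because the folding is applied on both sides, a query for espanol (without the tilde) would match the same documents. That is usually what you want for search, but it does mean accented and unaccented spellings can no longer be told apart in this field.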
