Remove diacritics at index time in Solr

Problem description

I am fine-tuning a Solr search. I'm using Solr 4.0.

Normally I work with the language analyzers and tokenizers for English, but this time I'm working with Portuguese and it doesn't give the results I expect.

For example: I search for the word 'proteses', but what is indexed is 'próteses', with a diacritic. So the query does not match and I get wrong results.

What I need to do is remove all diacritics both before indexing and before searching, so that the two forms match. However, I can't find how to do this.

Can anyone point me in the right direction?

Solution

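You have to use a char mapping filter (solr.MappingCharFilterFactory) on the fields that can contain diacritics. This char filter runs before the tokenizer and replaces accented characters with their unaccented equivalents. Because the analyzer in the example has no type attribute, it is applied at both index time and query time, so 'proteses' and 'próteses' end up as the same term.

For example: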

<fieldType name="text_with_diacritics" class="solr.TextField">     
    <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>     
</fieldType>

The mapping-ISOLatin1Accent.txt file, which ships with Solr, contains mappings for many diacritics.
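
To make the effect of that analyzer chain more concrete, here is a rough Python sketch of what happens to a piece of text. It is only an approximation and not Solr code: the ACCENT_MAP table is a small hypothetical subset written for illustration (the real mapping file covers far more characters), the regular expression is a crude stand-in for StandardTokenizerFactory, and the function name analyze is made up.

```python
import re

# Hypothetical subset of accent mappings, in the spirit of
# mapping-ISOLatin1Accent.txt (the real file is much larger).
ACCENT_MAP = str.maketrans({
    "\u00e1": "a", "\u00e0": "a", "\u00e2": "a", "\u00e3": "a",  # á à â ã
    "\u00e9": "e", "\u00ea": "e",                                # é ê
    "\u00ed": "i",                                               # í
    "\u00f3": "o", "\u00f4": "o", "\u00f5": "o",                 # ó ô õ
    "\u00fa": "u", "\u00fc": "u",                                # ú ü
    "\u00e7": "c",                                               # ç
    "\u00c1": "A", "\u00c9": "E", "\u00d3": "O", "\u00c7": "C",  # Á É Ó Ç
})


def analyze(text):
    """Mimic the chain: char mapping, then tokenizing, then lowercasing."""
    mapped = text.translate(ACCENT_MAP)     # char filter runs before the tokenizer
    tokens = re.findall(r"\w+", mapped)     # crude stand-in for the tokenizer
    return [token.lower() for token in tokens]  # lowercase filter


if __name__ == "__main__":
    indexed_terms = analyze("Pr\u00f3teses dent\u00e1rias")
    query_terms = analyze("proteses")
    print(indexed_terms)
    # The query term matches because both sides went through the same chain.
    print(query_terms[0] in indexed_terms)
```

The point of the sketch is that the same normalization is applied to the indexed text and to the query, which is why the unaccented query can match the accented document.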

You will have to reindex your documents after configuring this filter, since terms already in the index still contain the diacritics.
