Ignore special characters in Examine


Question

In Umbraco, I use Examine to search the website, but the content is in French. Everything works fine, except that searching for "Français" does not return the same results as searching for "Francais". Is there a way to ignore those French accented characters? I tried to find a FrenchAnalyzer for Lucene/Examine but did not find anything. I use fuzzy matching, so it returns results even when the words are not identical.

Here is my search code:

public static ISearchResults Search(string searchTerm)
{
    var provider = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
    var criteria = provider.CreateSearchCriteria(BooleanOperation.Or);

    // Exact matches on the boosted fields rank highest, then fuzzy matches;
    // pages hidden from navigation are excluded.
    var crawl = criteria.GroupedOr(BoostedSearchableFields, searchTerm.Boost(15))
        .Or().GroupedOr(BoostedSearchableFields, searchTerm.Fuzzy(Fuzziness))
        .Or().GroupedOr(SearchableFields, searchTerm.Fuzzy(Fuzziness))
        .Not().Field("umbracoNavHide", "1");

    return provider.Search(crawl.Compile());
}

Recommended answer

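We ended up using a custom analyzer based on SnowballAnalyzer, with an ISOLatin1AccentFilter applied to its token stream so that accented characters are replaced by their unaccented equivalents: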

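using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Snowball;

public class CustomAnalyzer : SnowballAnalyzer
{
    // "French" selects the French Snowball stemmer.
    public CustomAnalyzer() : base("French") { }

    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // Start from the tokenizing and stemming done by SnowballAnalyzer.
        TokenStream result = base.TokenStream(fieldName, reader);

        // Replace accented Latin-1 characters with unaccented ones (e.g. "ç" becomes "c").
        result = new ISOLatin1AccentFilter(result);

        return result;
    }
}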
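
The answer does not show how the analyzer is wired in. A minimal sketch, assuming an Umbraco 7-style ExamineSettings.config in which each provider accepts an analyzer attribute: the custom analyzer would be set on both the indexer and the searcher, so that indexed text and search terms are processed the same way. The namespace and assembly name (MySite.Search, MySite) are hypothetical placeholders, and the ExternalIndexer name is assumed to be the indexer paired with the ExternalSearcher used in the question.

```xml
<Examine>
  <ExamineIndexProviders>
    <providers>
      <!-- Hypothetical type name: replace with your own namespace and assembly -->
      <add name="ExternalIndexer"
           type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
           analyzer="MySite.Search.CustomAnalyzer, MySite" />
    </providers>
  </ExamineIndexProviders>
  <ExamineSearchProviders defaultProvider="ExternalSearcher">
    <providers>
      <!-- Use the same analyzer as the indexer so both sides fold accents identically -->
      <add name="ExternalSearcher"
           type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
           analyzer="MySite.Search.CustomAnalyzer, MySite" />
    </providers>
  </ExamineSearchProviders>
</Examine>
```

After changing the analyzer, the index would most likely need to be rebuilt, since existing entries were written with the previous analyzer.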
