Ignore Special Characters (tittles) in Examine search

Question

Using Umbraco v6 and Examine search (not full-blown Lucene queries). This is a Latin/South American website. I've asked my colleagues how they type tittles (accent marks over letters) for search/URLs, and they all said that they don't; they just use "regular" characters (A-Z, a-z).

I know how to strip special characters OUT of the string when passing it to Examine, but I need it the other way around: Examine should remove the special characters from the indexed properties so that they match the query. I have numerous "nodes" that have tittles in their names (the name is one of the properties I am searching on).

Posts I've looked at:

I've tried writing the Lucene query (or so I think), but I'm not getting any hits.

// q is my query from QueryString
var searcher = ExamineManager.Instance.SearchProviderCollection["CustomSearchSearcher"];

//var query = searcher.CreateSearchCriteria().Field("nodeName", q).Or().Field("description", q).Compile();
//var searchResults = searcher.Search(query).OrderByDescending(x => x.Score).TakeWhile(x => x.Score > 0.05f);

var searchResults = searcher.Search(Global.RemoveSpecialCharacters(q), true).OrderByDescending(x => x.Score).TakeWhile(x => x.Score > 0.05f);

Global class:

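using System.Text;

public static class Global
{
    public static string RemoveSpecialCharacters(string str)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in str)
        {
            // Keep digits, ASCII letters, '.', '_', spaces (so multi-word queries
            // stay separate) and a few Spanish accented letters.
            // A single 'A'..'z' range would also let through [ \ ] ^ and `,
            // so upper- and lower-case letters are checked separately.
            if ((c >= '0' && c <= '9')
                || (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || c == '.' || c == '_' || c == ' '
                || c == 'á' || c == 'é' || c == 'í' || c == 'ñ' || c == 'ó' || c == 'ú')
            {
                sb.Append(c);
            }
        }

        return sb.ToString();
    }
}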

As stated above, I need the special characters (tittles) removed on the Lucene side, not from the query passed in.

From: https://our.umbraco.org/documentation/reference/searching/examine/overview-explanation

I've also read about "Analyzers", but I have never worked with them before, nor do I know which one(s) to get/install/add to Visual Studio, etc. Is that the better way to go about this?

Answer

A custom analyzer is the answer.

This is answered on the umbraco forum here: https://our.umbraco.org/forum/developers/extending-umbraco/16396-Examine-and-accents-for-portuguese-language

Make an analyzer that lowercases tokens and folds accented characters to their plain ASCII equivalents:

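using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

// Case-insensitive, accent-insensitive analyzer (Lucene.Net 2.9 API).
public class CIAIAnalyser : Analyzer
{
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // Split the text into tokens using the standard grammar.
        StandardTokenizer tokenizer = new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, reader);
        tokenizer.SetMaxTokenLength(255);

        // Normalize tokens, lowercase them, then fold accented characters
        // (for example "á" -> "a", "ñ" -> "n") to ASCII.
        TokenStream stream = new StandardFilter(tokenizer);
        stream = new LowerCaseFilter(stream);
        return new ASCIIFoldingFilter(stream);
    }
}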

Then you do the same for the search input.

using System;
using System.Globalization;
using System.Text;

public class CleanAccent
{
    public static string RemoveDiacritics(string input)
    {
        if (String.IsNullOrEmpty(input)) return input;

        // FormD = full canonical decomposition: "á" becomes "a" followed by
        // a combining acute accent (a non-spacing mark).
        string inputInFormD = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder();

        foreach (char c in inputInFormD)
        {
            // Drop the combining accent marks, keep the base characters.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }

        // Recompose what is left back to FormC.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}

Then reference the analyzer in ExamineSettings.config.
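A minimal sketch of what that reference could look like. Only `CustomSearchSearcher` comes from the question; the indexer name `CustomSearchIndexer`, the index set `CustomSearchIndexSet` (defined in ExamineIndex.config), and the assembly-qualified type name `MyProject.Search.CIAIAnalyser, MyProject` are hypothetical placeholders for your own names. Setting the same analyzer on both the indexer and the searcher means indexed text and queries are folded the same way. Rebuild the index afterwards, because content that is already indexed was tokenized with the old analyzer.

```xml
<!-- Inside <ExamineIndexProviders><providers> (hypothetical indexer name) -->
<add name="CustomSearchIndexer"
     type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
     indexSet="CustomSearchIndexSet"
     analyzer="MyProject.Search.CIAIAnalyser, MyProject" />

<!-- Inside <ExamineSearchProviders><providers> (searcher name from the question) -->
<add name="CustomSearchSearcher"
     type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
     indexSet="CustomSearchIndexSet"
     analyzer="MyProject.Search.CIAIAnalyser, MyProject" />
```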
