Elasticsearch in ASP.NET: using the ampersand sign

Problem description

I'm new to Elasticsearch in ASP.NET, and I have a problem that I have so far been unable to resolve.

From the documentation, I've seen that the & sign is not listed as a special character. Yet when I submit a search, the ampersand is completely ignored. For example, if I search for procter & gamble, the & sign is dropped. This causes quite a few problems for me, because I have companies with names like M&S. When the & sign is ignored, I get basically everything that contains M or S. If I try an exact search (M&S), I have the same problem.

My code is:

void Connect()
{            
    node = new Uri(ConfigurationManager.AppSettings["Url"]);
    settings = new ConnectionSettings(node);
    settings.DefaultIndex(ConfigurationManager.AppSettings["defaultIndex"]);
    settings.ThrowExceptions(true);
    client = new ElasticClient(settings);                        
}

private string escapeChars(string inStr) {
    var temp = inStr;
    temp = temp
        .Replace(@"\", @"\\")
        .Replace(@">",string.Empty)
        .Replace(@"<",string.Empty)
        .Replace(@"{",string.Empty)
        .Replace(@"}",string.Empty)
        .Replace(@"[",string.Empty)
        .Replace(@"]",string.Empty)
        .Replace(@"*",string.Empty)
        .Replace(@"?",string.Empty)
        .Replace(@":",string.Empty)
        .Replace(@"/",string.Empty);
    return temp;
}

And then, in one of my functions:

Connect();    
ISearchResponse<ElasticSearch_Result> search_result;            
var QString = escapeChars(searchString);                  
search_result = client.Search<ElasticSearch_Result>(s => s
    .From(0)
    .Size(101)
    .Query(q => 
        q.QueryString(b => 
            b.Query(QString)
            //.Analyzer("whitespace")
            .Fields(fs => fs.Field(f => f.CompanyName))                                
        )
    )
    .Highlight(h => h
        .Order("score")
        .TagsSchema("styled")
        .Fields(fs => fs
            .Field(f => f.CompanyName)
        )
    )
);

I've tried including analyzers, but then found that they change the way the tokenizer splits words. I haven't been able to implement changes to the tokenizer.

I would like to support the following scenario:

Search: M&S Company Foo Bar

Tokens: M&S Company Foo Bar (as a bonus, it would be nice to also get M and S as tokens)

I'm using Elasticsearch 5.0.

Any help is more than welcome, including better documentation than what is found here: https://www.elastic.co/guide/en/elasticsearch/client/net-api/5.x/writing-queries.html.

Recommended answer

By default, a text field is analyzed with the standard analyzer, which applies the standard tokenizer followed by the lowercase token filter. So when you index a value into that field, the standard analyzer is applied to the value and the resulting tokens are indexed for the field.

Let's look at an example. For the field companyName (type text), assume the value M&S Company Foo Bar is passed when indexing a document. After the standard analyzer is applied, the resulting tokens are:

m
s
company
foo
bar

Notice that not only whitespace but also & acts as a delimiter when the text is split into tokens.
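One way to confirm this against your own cluster is the analyze API. The following is a minimal sketch, assuming the NEST 5.x client from the question; the class and method names (AnalyzeCheck, PrintTokens) are made up for illustration, and the exact descriptor methods may differ slightly between NEST versions.

```csharp
using System;
using System.Linq;
using Nest;

public static class AnalyzeCheck
{
    // Prints the tokens that the named analyzer produces for the given text.
    // `client` is an already configured ElasticClient, e.g. the one built in Connect().
    public static void PrintTokens(IElasticClient client, string analyzer, string text)
    {
        var response = client.Analyze(a => a
            .Analyzer(analyzer)
            .Text(text));

        // Each AnalyzeToken carries the token text plus offsets and position.
        Console.WriteLine(string.Join(", ", response.Tokens.Select(t => t.Token)));
    }
}
```

Calling AnalyzeCheck.PrintTokens(client, "standard", "M&S Company Foo Bar") should list the five tokens shown above, with no trace of the ampersand.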

When you query this field without specifying an analyzer in the search query, the analyzer used for indexing the field is also applied at search time by default. So if you search for M&S, it is tokenized into m and s, and the actual query searches for those two tokens instead of M&S.

To solve this, you need to change the analyzer for the field companyName. Instead of the standard analyzer, you can create a custom analyzer that uses the whitespace tokenizer and the lowercase filter (to keep the search case-insensitive). For this you need to change the settings and mapping as below:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_lowercase": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "companyName": {
          "type": "text",
          "analyzer": "whitespace_lowercase",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
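Since the question uses NEST rather than raw JSON, the same settings and mapping can be sketched with the fluent API. This assumes NEST 5.x; the index name is whatever you pass in, IndexSetup and CreateCompanyIndex are hypothetical names, and the ElasticSearch_Result class below is only a stand-in for the type in the question. NEST infers the mapping type name from the class, so it will not be the _doc name used in the JSON above.

```csharp
using Nest;

// Stand-in for the document type from the question; omit if you already have it.
public class ElasticSearch_Result
{
    public string CompanyName { get; set; }
}

public static class IndexSetup
{
    // Creates an index whose companyName field uses a whitespace + lowercase analyzer.
    public static ICreateIndexResponse CreateCompanyIndex(IElasticClient client, string indexName)
    {
        return client.CreateIndex(indexName, c => c
            .Settings(s => s
                .Analysis(a => a
                    .Analyzers(an => an
                        // Custom analyzer: split on whitespace only, then lowercase.
                        .Custom("whitespace_lowercase", ca => ca
                            .Tokenizer("whitespace")
                            .Filters("lowercase")))))
            .Mappings(m => m
                .Map<ElasticSearch_Result>(mm => mm
                    .Properties(p => p
                        .Text(t => t
                            // Serialized as "companyName" by NEST's default camel-casing.
                            .Name(n => n.CompanyName)
                            .Analyzer("whitespace_lowercase")
                            .Fields(f => f
                                .Keyword(k => k.Name("keyword"))))))));
    }
}
```

The analyzer of an existing field cannot be changed in place, so this has to be a new index, and the documents need to be indexed again before the new tokens show up in searches.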

Now, for the input above, the generated tokens will be:

m&s
company
foo
bar

This ensures that & is not ignored when searching for M&S.
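To make the difference between the two analyzers concrete without a running cluster, here is a rough, self-contained approximation in plain C#. It is not the Lucene implementation: the standard tokenizer follows Unicode word-boundary rules that the regular expression below only imitates for simple input, and TokenizerSketch and its method names are invented for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizerSketch
{
    // Approximates the standard analyzer: keep runs of letters/digits, lowercase them.
    // Anything else (whitespace, '&', punctuation) acts as a delimiter.
    public static List<string> StandardLike(string text) =>
        Regex.Matches(text, @"[\p{L}\p{N}]+")
             .Cast<Match>()
             .Select(m => m.Value.ToLowerInvariant())
             .ToList();

    // Approximates the whitespace tokenizer plus lowercase filter:
    // split on whitespace only, so '&' stays inside the token.
    public static List<string> WhitespaceLowercase(string text) =>
        text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => t.ToLowerInvariant())
            .ToList();

    public static void Main()
    {
        const string input = "M&S Company Foo Bar";
        Console.WriteLine(string.Join(" | ", StandardLike(input)));        // m | s | company | foo | bar
        Console.WriteLine(string.Join(" | ", WhitespaceLowercase(input))); // m&s | company | foo | bar
    }
}
```

The same functions applied to the query text M&S give m and s in the first case and a single m&s token in the second, which is why only the whitespace-based analyzer can match the company name as typed.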
