Azure Search not returning correct results with . (dot) in search query


Problem description

We have stored documents in Azure Search. One of the documents has the following field value:

"Title": "statistics_query.compute_shader_invocations.secondary_inherited fails"

Following a recommendation from the MS Azure team, we defined a custom analyzer on this field to resolve an issue we were facing with the _ (underscore) character.

{
  "name": "myindex",
  "fields": [
        {
            "name": "id",
            "type": "Edm.String",
            "searchable": true,
            "filterable": true,
            "retrievable": true,
            "sortable": false,
            "facetable": false,
            "key": true,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": null
        },
        {
            "name": "Title",
            "type": "Edm.String",
            "searchable": true,
            "filterable": true,
            "retrievable": true,
            "sortable": true,
            "facetable": true,
            "key": false,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": "remove_underscore"
        }
],
  "analyzers": [
    {
      "name": "remove_underscore",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "charFilters": [
        "remove_underscore"
      ],
      "tokenizer": "standard_v2"
    }
  ],
  "charFilters": [
    {
      "name": "remove_underscore",
      "@odata.type": "#Microsoft.Azure.Search.MappingCharFilter",
      "mappings": [
        "_=>-"
      ]
    }
  ]
}
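To make the effect of the `remove_underscore` char filter concrete, here is a minimal Python sketch of what a mapping rule such as `_=>-` does to the text before the tokenizer sees it. The helper names `parse_mappings` and `apply_char_filter` are hypothetical, and plain string replacement only approximates the real MappingCharFilter (which is enough for a single one-character rule like this one).

```python
def parse_mappings(rules):
    """Turn rules written as "source=>target" into a {source: target} dict."""
    table = {}
    for rule in rules:
        source, target = rule.split("=>", 1)
        table[source] = target
    return table


def apply_char_filter(text, table):
    """Replace every occurrence of each source string with its target."""
    for source, target in table.items():
        text = text.replace(source, target)
    return text


# Same rule as in the index definition above.
mappings = parse_mappings(["_=>-"])
title = "statistics_query.compute_shader_invocations.secondary_inherited fails"
filtered = apply_char_filter(title, mappings)
print(filtered)
# statistics-query.compute-shader-invocations.secondary-inherited fails
```

After this step the text contains no underscores at all, so no token stored in the index for `Title` can contain one.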

However, when I search with the filters below on my Azure Search index (version 2016-09-01 Preview), I don't get any results.

$filter=search.ismatch('"compute_shader_invocations*"','Title', 'full', 'any')

$filter=search.ismatch('"compute_shader_invocations"','Title', 'full', 'any')

$filter=search.ismatch('"shader_invocations*"','Title', 'full', 'any')

However, if the search text includes the (.) dot character, the same kind of filter works.

$filter=search.ismatch('"query.compute_shader*"','Title', 'full', 'any')

Based on my tests, if the document has a dot (.) character right before or right after the search term used in the filter, the search doesn't return a result.

So the filters below won't work, because in the document there is a (.) dot character right before and right after the search terms used in the query. In our case, the Azure Search document has a dot before the word "compute" and after the word "invocations".

$filter=search.ismatch('"compute_shader_invocations*"','Title', 'full', 'any')

$filter=search.ismatch('"compute_shader"','Title', 'full', 'any')

$filter=search.ismatch('"shader_invocations*"','Title', 'full', 'any')

However, the filters below should work, as there is no dot character before the word "query" or after the word "shader" in the Azure Search document:

$filter=search.ismatch('"query.compute_shader*"','Title', 'full', 'any')

$filter=search.ismatch('"shader*"','Title', 'full', 'any')

This is driving me crazy. Any help would be highly appreciated.

Recommended answer

tl;dr Wildcard queries don't have custom analysis performed on them. Non-wildcard queries should return results, so please double-check those.

Detailed answer

The dot (.) actually doesn't have anything to do with the behavior you are observing. You are issuing two classes of search queries:

  1. Wildcard queries (those using *)
  2. Non-wildcard queries (e.g. "compute_shader")

In general, a non-wildcard query you issue undergoes the same analysis as the one defined by the custom analyzer on the field in your index. For wildcard queries, no analysis is performed.
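The difference between the two classes can be sketched roughly as follows. This is an illustration only, not the service's actual query pipeline: `preprocess_query_term` is a hypothetical helper, it models just the char filter stage of the `remove_underscore` analyzer (tokenization is left out), and it assumes that a term ending in `*` is passed through untouched.

```python
def preprocess_query_term(term):
    """Model only the char-filter step applied to a query term.

    Assumption: wildcard (prefix) terms skip custom analysis entirely,
    while other terms get the same "_=>-" mapping used at indexing time.
    """
    if term.endswith("*"):
        return term  # wildcard: left exactly as typed
    return term.replace("_", "-")  # non-wildcard: char filter applied


for term in ["compute_shader", "compute_shader_invocations*", "shader*"]:
    print(term, "->", preprocess_query_term(term))
# compute_shader -> compute-shader
# compute_shader_invocations* -> compute_shader_invocations*
# shader* -> shader*
```

Under these assumptions, a wildcard term that still contains an underscore is compared against indexed text in which every underscore was already replaced, so it cannot match as a prefix of any token.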

Now, taking your document text "statistics_query.compute_shader_invocations.secondary_inherited fails" as an example, the custom analyzer you defined will break it down into tokens. (FYI: you can use the Analyze API to see the breakdown.)
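As a rough sketch of how such an Analyze API call could be prepared from Python, the snippet below builds the request without sending it. The service name `my-service` and the API key are placeholders, `build_analyze_request` is a hypothetical helper, and using the question's `2016-09-01-Preview` as the `api-version` is an assumption; check which versions expose the analyze endpoint for your service.

```python
import json
import urllib.request


def build_analyze_request(service, index, api_key, api_version, text, analyzer):
    """Build (but do not send) a POST request to the index's analyze endpoint."""
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/analyze?api-version={api_version}"
    )
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_analyze_request(
    service="my-service",               # hypothetical service name
    index="myindex",                    # index name from the question
    api_key="<admin-api-key>",          # placeholder, not a real key
    api_version="2016-09-01-Preview",   # assumption: version from the question
    text="statistics_query.compute_shader_invocations.secondary_inherited fails",
    analyzer="remove_underscore",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request), then parse the JSON response.
```

The response lists the tokens the analyzer produces for the given text, which is the quickest way to check whether a particular query term can match.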

The following wildcard query succeeds:

$filter=search.ismatch('"shader*"','Title', 'full', 'any')

because, when you run the analysis on the source document, it produces tokens like "shader".

The following wildcard queries do not succeed:

$filter=search.ismatch('"compute_shader_invocations*"','Title', 'full', 'any')

$filter=search.ismatch('"shader_invocations*"','Title', 'full', 'any')

because there are no tokens like "compute_shader_invocations" or "shader_invocations" when the source document is analyzed with your custom analyzer.

This one shouldn't succeed either, but interestingly you say that it does:

$filter=search.ismatch('"query.compute_shader*"','Title', 'full', 'any')

Let's now focus on the queries without wildcards.

$filter=search.ismatch('"compute_shader_invocations"','Title', 'full', 'any')

$filter=search.ismatch('"compute_shader"','Title', 'full', 'any')

Technically, these should be tokenized correctly by the custom analyzer and should return matching results.

Could you please verify that the queries in the last 3 highlighted instances are correct as written in your original question? When I created a sample index and issued search requests based on your configuration, those were the 3 anomalies I noticed. I would appreciate some clarification on them.

Also, the documentation on how full text search works in Azure Search is a great place to get in-depth details about some of the things I mentioned.
