Azure Search: Searching for singular version of a word, but still include plural version in results

Question

I have a question about a peculiar behavior I noticed in my custom analyzer (as well as in the fr.microsoft analyzer). The Analyze API tests below use the "fr.microsoft" analyzer, but I see exactly the same behavior with my "text_contains_search_custom_analyzer" custom analyzer (which makes sense, since it is based on the fr.microsoft analyzer).

UAT reported that when they search for "femme" (singular) they expect documents with "femmes" (plural) to also be found. But when I tested with the Analyze API, it appears that the Azure Search service expands a plural into both plural and singular tokens, while a singular produces only the singular token. See below for examples.

Is there a way I can allow a user to search for the singular version of a word, but still include the plural version of that word in the search results? Or will I need to use synonyms to overcome this issue?

Request with "femme":

    {
      "analyzer": "fr.microsoft",
      "text": "femme"
    }

Response from "femme":

    {
      "@odata.context": "https://EXAMPLESEARCHINSTANCE.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult",
      "tokens": [
        { "token": "femme", "startOffset": 0, "endOffset": 5, "position": 0 }
      ]
    }

Request with "femmes":

    {
      "analyzer": "fr.microsoft",
      "text": "femmes"
    }

Response from "femmes":

    {
      "@odata.context": "https://EXAMPLESEARCHINSTANCE.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult",
      "tokens": [
        { "token": "femme", "startOffset": 0, "endOffset": 6, "position": 0 },
        { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 }
      ]
    }
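
For anyone who wants to repeat these tests from a script, here is a minimal Python sketch of how such an Analyze API call could be put together. It only builds the request and reads tokens out of a response body; it does not contact any service. The index name "my-index", the key placeholder, and the helper names are hypothetical, and the api-version value is an assumption inferred from the V2016_09_01 marker in the responses above.

```python
import json
import urllib.request


def build_analyze_request(service, index, api_key, text, analyzer,
                          api_version="2016-09-01"):
    """Build (but do not send) a POST request to an index's Analyze endpoint."""
    url = (f"https://{service}.search.windows.net/indexes/{index}/analyze"
           f"?api-version={api_version}")
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def tokens_from_response(response_json):
    """Return just the token strings from a parsed Analyze API response."""
    return [entry["token"] for entry in response_json["tokens"]]


# The "femmes" response quoted above, trimmed to its tokens array.
sample = {"tokens": [
    {"token": "femme", "startOffset": 0, "endOffset": 6, "position": 0},
    {"token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0},
]}

# "my-index" and the key are placeholders, not real values.
request = build_analyze_request("EXAMPLESEARCHINSTANCE", "my-index",
                                "<admin-api-key>", "femmes", "fr.microsoft")
print(request.full_url)
print(tokens_from_response(sample))
```

Passing the request to urllib.request.urlopen would send it, which requires a real service name, index, and admin key.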

Recommended answer

Just to add to yoape's response, the fr.microsoft analyzer reduces inflected words to their base form. In your case, the word femmes is reduced to its singular form femme. All cases that you described will work:

  1. Searching with the base form of a word if an inflected form was in the document.

    Let's say you're indexing a document with Vive with Femmes.
    The search engine will index the following terms: vif, vivre, vive, femme, femmes.
    If you search with any of these terms e.g., femme, the document will match.

  2. Searching with an inflected form of a word if the base form was in the document.

    Let's say you're indexing a document with the text Femme fatale.
    The search engine will index the following terms: femme, fatal, fatale.
    If you search with the term femmes, the analyzer will also produce its base form. Your query will become femmes OR femme. Documents with any of these terms will match.

  3. Searching with an inflected form if another inflected form of that word was in the document.

    If you have a document with allez, terms allez and aller will be indexed.
    If you search for alle, the query becomes alle OR aller. Since both inflected forms are reduced to the same base form the document will match.
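
The three cases above can be sketched with a toy model in Python. This is only an illustration under stated assumptions: the lookup table below is hypothetical and simply copies the token expansions from the examples in this answer, whereas the real fr.microsoft analyzer applies a full linguistic model. The names TOY_ANALYZER, analyze, and matches are made up for this sketch.

```python
# Hypothetical stand-in for the analyzer: word -> tokens it emits.
# Entries are taken from the examples in this answer, nothing more.
TOY_ANALYZER = {
    "vive": ["vif", "vivre", "vive"],
    "femme": ["femme"],
    "femmes": ["femme", "femmes"],
    "fatale": ["fatal", "fatale"],
    "allez": ["allez", "aller"],
    "alle": ["alle", "aller"],
}


def analyze(text):
    """Return the set of tokens produced for a piece of text."""
    tokens = set()
    for word in text.lower().split():
        # Words missing from the toy table pass through unchanged.
        tokens.update(TOY_ANALYZER.get(word, [word]))
    return tokens


def matches(document, query):
    """The same analysis runs on both sides; any shared token is a match (OR)."""
    return bool(analyze(document) & analyze(query))


# Case 1: base form in the query, inflected form in the document.
print(matches("Vive with Femmes", "femme"))
# Case 2: inflected form in the query, base form in the document.
print(matches("Femme fatale", "femmes"))
# Case 3: different inflected forms sharing the base form "aller".
print(matches("allez", "alle"))
```

In this model a query for femme finds a document containing femmes because the document side was already expanded to include femme at indexing time, even though the query itself yields a single token.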

The key takeaway here is that the analyzer processes not only the documents but also the query terms. Terms are normalized according to language-specific rules.

I hope that explains it.
