Azure Search - Accent insensitive analyzer not working when sorting

Question

I'm using Azure Search. I have a model with a property that has these attributes:

[IsRetrievable(true), IsSearchable, IsSortable, Analyzer("standardasciifolding.lucene")]
public string Title { get; set; }

I want the search to be accent-insensitive. It works when searching/filtering, but not when sorting the results: if I have words that start with an accented character and I sort alphabetically, those results appear at the end of the list.

Answer

I verified your use case by creating an index with an Id field and a Title field that uses the standardasciifolding.lucene analyzer. I then submitted 4 sample records via the REST API:

{
"value": [
    {
        "@search.action": "mergeOrUpload",
        "Id": "1",
        "Title" : "øks"
    },
    {
        "@search.action": "mergeOrUpload",
        "Id": "2",
        "Title": "aks"
    },      
    {
        "@search.action": "mergeOrUpload",
        "Id": "3",
        "Title": "áks"
    },
    {
        "@search.action": "mergeOrUpload",
        "Id": "4",
        "Title": "oks"
    }                   
]}

I then ran a query with $orderby specified. I used Postman with variables wrapped in double curly braces; replace them with the relevant values for your environment.

https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/indexes/{{INDEX_NAME}}/docs?search=*&$count=true&$select=Id,Title&searchMode=all&queryType=full&api-version={{API-VERSION}}&$orderby=Title asc
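If you send the same query from C# code instead of Postman, a minimal sketch for building the URL could look like the following. The class and method names are hypothetical; only the path and the query parameters come from the request above, and each variable value is percent-encoded with Uri.EscapeDataString.

```csharp
using System;

// Hypothetical helper: builds the same query URL as the Postman request above.
public static class SearchQuery
{
    public static string BuildUrl(string serviceUrl, string indexName,
                                  string apiVersion, string orderBy)
    {
        // Fixed parameters copied verbatim from the request above.
        const string fixedParams =
            "search=*&$count=true&$select=Id,Title&searchMode=all&queryType=full";

        // Variable parts are percent-encoded; e.g. the space in "Title asc" becomes %20.
        return $"{serviceUrl.TrimEnd('/')}/indexes/{Uri.EscapeDataString(indexName)}/docs?" +
               fixedParams +
               $"&api-version={Uri.EscapeDataString(apiVersion)}" +
               $"&$orderby={Uri.EscapeDataString(orderBy)}";
    }
}
```

For example, `SearchQuery.BuildUrl("https://my-service.search.windows.net", "my-index", "2020-06-30", "Title asc")` (placeholder service, index and version) produces a URL ending in `&$orderby=Title%20asc`. The request still needs the service's `api-key` header when it is sent.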

The results came back as:

{
    "@odata.context": "https://<my-search-service>.search.windows.net/indexes('dg-test-65224345')/$metadata#docs(*)",
    "@odata.count": 4,
    "value": [
        {
            "@search.score": 1.0,
            "Id": "2",
            "Title": "aks"
        },
        {
            "@search.score": 1.0,
            "Id": "4",
            "Title": "oks"
        },
        {
            "@search.score": 1.0,
            "Id": "3",
            "Title": "áks"
        },
        {
            "@search.score": 1.0,
            "Id": "1",
            "Title": "øks"
        }
    ]
}
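The observed order is what a plain ordinal (code-point) comparison of the raw strings produces: a is U+0061, o is U+006F, á is U+00E1 and ø is U+00F8, so both accented titles land after every unaccented one. The C# sketch below only reproduces that ordering locally to make the point; it is an illustration under that assumption, not a statement about how Azure Search implements sorting internally, and the class name is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical demo class: sorts strings by raw UTF-16 code units,
// with no accent folding and no case folding.
public static class OrdinalSortDemo
{
    public static string[] SortOrdinal(string[] titles) =>
        titles.OrderBy(t => t, StringComparer.Ordinal).ToArray();
}
```

`OrdinalSortDemo.SortOrdinal(new[] { "øks", "aks", "áks", "oks" })` returns `aks, oks, áks, øks`, the same order as the response above.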

So the sort order is indeed a, o, á, ø, which confirms what you found. If I change to $orderby=Title desc, the order is reversed. Thus, sorting appears to be done on the original value, not the normalized value. We can check how the analyzer works by posting a sample title to the analyzer with a POST request to

https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/indexes/{{INDEX_NAME}}/analyze?api-version={{API-VERSION}}

with this body:

{  "text": "øks",  "analyzer": "standardasciifolding.lucene" }

This produces the following tokens:

{
"@odata.context": "https://<my-search-service>.search.windows.net/$metadata#Microsoft.Azure.Search.V2020_06_30_Preview.AnalyzeResult",
"tokens": [
    {
        "token": "oks",
        "startOffset": 0,
        "endOffset": 3,
        "position": 0
    },
    {
        "token": "øks",
        "startOffset": 0,
        "endOffset": 3,
        "position": 0
    }
]

}
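To call the Analyze Text API from C# rather than Postman, a minimal sketch that only builds the request could look like this. The helper name and its parameters are hypothetical; the route, the JSON body shape and the `api-key` header follow the REST call above and the usual Azure Search REST conventions. The request is not sent here, so no response is assumed.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Hypothetical helper: builds (but does not send) an Analyze Text request
// equivalent to the Postman call above.
public static class AnalyzeRequest
{
    public static HttpRequestMessage Build(string serviceUrl, string indexName,
        string apiVersion, string apiKey, string text, string analyzer)
    {
        var uri = new Uri(
            $"{serviceUrl.TrimEnd('/')}/indexes/{Uri.EscapeDataString(indexName)}" +
            $"/analyze?api-version={Uri.EscapeDataString(apiVersion)}");

        // Same body shape as above: {"text": "...", "analyzer": "..."}
        string body = JsonSerializer.Serialize(new { text, analyzer });

        var request = new HttpRequestMessage(HttpMethod.Post, uri)
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        // The search service's API key; the value passed in is a placeholder.
        request.Headers.Add("api-key", apiKey);
        return request;
    }
}
```

Sending the result with `HttpClient.SendAsync(request)` should return a token list like the one above.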

You could try defining a custom analyzer that produces a normalized version, but I am not sure it will work. For example, sorting does not appear to support case-insensitive ordering, which is closely related to this use case: in both, several different characters should sort as if they were the same normalized character. E.g. a and A cannot be sorted as the same character, according to this UserVoice entry (feel free to vote for it).

WORKAROUND

The best workaround I can think of is to process the data yourself. Let Title contain the original title, and then create another field called TitleNormalized where you store the normalized version. In your application you would then query with $orderby on the TitleNormalized field.
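A minimal C# sketch of computing the TitleNormalized value before uploading a document is shown below. The helper name and the small replacement table are assumptions for illustration: the code lowercases the title, decomposes it with Unicode NFD, drops the resulting combining marks, and maps a few letters such as ø that have no Unicode decomposition (NFD alone leaves ø unchanged). Lucene's ASCII folding covers far more characters, so the table would need extending for real data.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper for the workaround: computes the value to store in
// TitleNormalized before the document is uploaded.
public static class TitleNormalizer
{
    // Letters with no Unicode decomposition, so NFD alone leaves them as-is.
    // This small table is an assumption; extend it for your data.
    private static readonly Dictionary<char, string> Undecomposable =
        new Dictionary<char, string>
        {
            ['ø'] = "o", ['æ'] = "ae", ['œ'] = "oe", ['ß'] = "ss", ['ł'] = "l", ['đ'] = "d",
        };

    public static string Normalize(string title)
    {
        if (string.IsNullOrEmpty(title)) return string.Empty;

        // Lowercase first so "A" and "a" sort together, then decompose
        // accented letters (e.g. "á" becomes "a" plus a combining acute accent).
        string decomposed = title.ToLowerInvariant().Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Drop the combining marks produced by the decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;
            sb.Append(Undecomposable.TryGetValue(c, out string replacement) ? replacement : c.ToString());
        }

        // Recompose whatever is left (e.g. scripts whose NFD form is not just base + marks).
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this helper, ordering the sample titles `øks, aks, áks, oks` by `TitleNormalizer.Normalize(t)` with an ordinal comparison gives `aks, áks, øks, oks`; ties keep their input order because LINQ's `OrderBy` is stable. In the index, TitleNormalized would be declared sortable, and queries would use `$orderby=TitleNormalized asc`.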