How to index a field with alphanumeric characters AND a dash for wildcard search

Question

Given a model that looks like this:

{
    [Key]
    public string Id { get; set; }

    [IsSearchable]
    [Analyzer(AnalyzerName.AsString.Keyword)]
    public string AccountId { get; set; }
}

And sample data for the AccountId that would look like this:

1-ABC123
1-333444555
1-A4KK498

The field can have any combination of letters/digits and a dash in the middle.

I need to be able to search on this field using queries like 1-ABC*. However, none of the basic analyzers seem to support the dash except Keyword, and with Keyword only exact, full-value matches work; wildcard queries return nothing. I've seen some other articles about custom analyzers, but I couldn't find enough information about how to build one to solve this issue.

I need to know whether I have to build a custom analyzer for this field, and whether I need a different search analyzer and index analyzer.

I'm using StandardLucene for other alphanumeric fields without dashes, and I have another field with dashes but it's all digits, and Keyword works just fine there. It seems the issue is with a mix of letters AND digits.

Recommended answer

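A custom analyzer is indeed the way to go here. Wildcard (prefix) query terms are not run through the analyzer; they are only lowercased, so 1-ABC* is looked up as 1-abc* and cannot match the token 1-ABC123 that the Keyword analyzer indexes with its original casing. That would also explain why the all-digits field works fine with Keyword. Basically, you could define a custom analyzer that uses a "keyword" tokenizer with a "lowercase" token filter.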

Add the custom analyzer to your Index class, and change the analyzer name in your model to match the custom analyzer name:

new Index()
{
    ...
    Analyzers = new[]
    {
        new CustomAnalyzer()
        {
            Name = "keyword_lowercase",
            Tokenizer = TokenizerName.Keyword,
            TokenFilters = new[] { TokenFilterName.Lowercase }
        }
    }
}

Model:

{
    [Key]
    public string Id { get; set; }

    [IsSearchable]
    [Analyzer("keyword_lowercase")]
    public string AccountId { get; set; }
}

In the REST API this would look something like:

{
    "fields": [{
        "name": "Id",
        "type": "Edm.String",
        "key": true
    },
    {
        "name": "AccountId",
        "type": "Edm.String",
        "searchable": true,
        "retrievable": true,
        "analyzer": "keyword_lowercase"
     }],
    "analyzers":[
        {
           "name":"keyword_lowercase",
           "@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
           "tokenizer":"keyword_v2",
           "tokenFilters":["lowercase"]
        }
     ]
}
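
One way to check what the custom analyzer actually produces is the Analyze Text operation of the REST API, which returns the tokens an analyzer emits for a given input. A minimal sketch of the request body, assuming it is POSTed to the index's /analyze endpoint (the index name and api-version are whatever your service uses; neither is given in the original):

```json
{
    "text": "1-ABC123",
    "analyzer": "keyword_lowercase"
}
```

If the analyzer is set up as intended, the response should list a single token, 1-abc123, with the dash intact, which is the form a lowercased prefix query such as 1-abc* can match.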
