Tokenizing a string for the Completion Suggester

Problem description

I want to build the autocomplete functionality of an e-commerce website using the Completion Suggester.

This is my index:

PUT dummy
{
    "mappings": {
        "_doc" : {
            "properties" : {
                "suggest" : {
                    "type" : "completion"
                },
                "title" : {
                    "type": "keyword"
                }, 
                "category" : { 
                    "type": "keyword"
                },
                "description" : { 
                    "type": "keyword"
                }
            }
        }
    }
}

Now, when uploading an advertisement, I want the title field to be used for autocomplete, so this is how I upload a document:

POST dummy/_doc
{
  "title": "Blue asics running shoes",
  "category": "sports",
  "description": "Nice blue running shoes, size 44 eu",
  "suggest": {
    "input": "Blue Asics running shoes" // <-- use title
  }
}

The problem is that, this way, Elasticsearch only matches the string from the beginning, i.e. "Blu" will find the result, but "Asic", "Run", or "Sho" won't return anything.
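For reference, a prefix lookup against the `suggest` field could look like the sketch below. The suggestion name `title-suggest` is an illustrative label and is not part of the original setup.

```json
POST dummy/_search
{
  "suggest": {
    "title-suggest": {
      "prefix": "Blu",
      "completion": {
        "field": "suggest"
      }
    }
  }
}
```

With the single input "Blue Asics running shoes", this request matches because "Blu" is a prefix of the whole input. Changing `prefix` to "Asic" finds nothing, since no stored input starts with that text.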

So what I need to do is tokenize my input like this:

POST dummy/_doc
{
  "title": "Blue asics running shoes",
  "category": "sports",
  "description": "Nice blue running shoes, size 44 eu",
  "suggest": {
    "input": ["Blue", "Asics", "running", "shoes"] // <-- tokenized title
  }
}

This would work fine, but how am I supposed to tokenize my field? I know I can split the string in C#, but is there any way to do this in Elasticsearch/NEST?

Recommended answer

The Completion Suggester is designed for fast search-as-you-type prefix queries. It uses the simple analyzer by default, not the standard analyzer that is the default for text datatypes.

If you need partial prefix matching on any token in the title, and not just from the beginning of the title, you may want to consider one of these approaches:

  1. Use the Analyze API with an analyzer that tokenizes the title into the tokens/terms you want to prefix match on, and index this collection as the input to the completion field. The Standard analyzer may be a good one to start with.

Bear in mind that the data structure for the Completion Suggester is held in memory while in use, so high term cardinality across documents will increase the memory demands of this data structure. Also consider that "scoring" of matching terms is simple, in that it is controlled by the weight applied to each input.
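As a rough sketch of this first approach, the title can be sent to the Analyze API and the returned tokens collected by the client (for example in C#) into the `input` array. The `weight` value below is an arbitrary illustration, not something taken from the question.

```json
POST _analyze
{
  "analyzer": "standard",
  "text": "Blue asics running shoes"
}

POST dummy/_doc
{
  "title": "Blue asics running shoes",
  "category": "sports",
  "description": "Nice blue running shoes, size 44 eu",
  "suggest": {
    "input": ["blue", "asics", "running", "shoes"],
    "weight": 10
  }
}
```

The Standard analyzer lowercases its output, so the first request returns the tokens `blue`, `asics`, `running`, and `shoes`. Each one becomes a separate completion input in the second request.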

  2. Don't use the Completion Suggester here. Instead, set up the title field as a text datatype with multi-fields that cover the different ways the title should be analyzed (or not analyzed, for example with a keyword sub-field).

Spend some time with the Analyze API to build an analyzer that allows partial prefix matching of terms anywhere in the title. As a start, something like the Standard tokenizer, Lowercase token filter, Edge n-gram token filter, and possibly Stop token filter would get you running. Also note that you would want a search analyzer that does the same as the index analyzer except for the Edge n-gram token filter, since tokens in the search input do not need to be n-grammed.
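A minimal sketch of this second approach might look like the following. The analyzer, filter, and sub-field names (`title_prefix_index`, `title_prefix_search`, `title_edge_ngram`, `title.prefix`) and the `min_gram`/`max_gram` values are assumptions chosen for illustration. The `_doc` mapping type follows the index definition in the question. The Stop token filter is left out to keep the example short.

```json
PUT dummy
{
  "settings": {
    "analysis": {
      "filter": {
        "title_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      },
      "analyzer": {
        "title_prefix_index": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "title_edge_ngram"]
        },
        "title_prefix_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "prefix": {
              "type": "text",
              "analyzer": "title_prefix_index",
              "search_analyzer": "title_prefix_search"
            },
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

GET dummy/_search
{
  "query": {
    "match": {
      "title.prefix": "asic"
    }
  }
}
```

At index time, "asics" is expanded into the edge n-grams `as`, `asi`, `asic`, and `asics`. The search analyzer only lowercases the input, so the term `asic` matches one of those stored n-grams directly. Search terms longer than `max_gram` would not match under these settings, so the limit should be tuned to the expected title vocabulary.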
