Phrase suggester with ngrams


Problem description

I use the option "output_unigrams_if_no_shingles": true in the "shingle_filter" filter so that the suggestion search shows only shingles in the results, but the suggestions display the ngrams instead.

    "shingle_filter": {
      "type": "shingle",
      "min_shingle_size": 2,
      "max_shingle_size": 3,
      "output_unigrams_if_no_shingles": true
    }

My mapping is below:

{
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "0",
      "analysis": {
        "filter": {
          "stemmer_plural_portugues": {
            "name": "minimal_portuguese",
            "stopwords": ["http", "https", "ftp", "www"],
            "type": "stemmer"
          },
          "ngram_filter": {
            "type": "ngram",
            "min_gram": 3,
            "max_gram": 3,
            "token_chars": [
              "letter",
              "digit"
            ]
          },
          "synonym_filter": {
            "type": "synonym",
            "lenient": true,
            "synonyms_path": "analysis/synonym.txt",
            "updateable": false
          },
          "shingle_filter": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 3,
            "output_unigrams_if_no_shingles": true
          }
        },
        "analyzer": {
          "analyzer_customizado": {
            "filter": [
              "lowercase",
              "stemmer_plural_portugues",
              "asciifolding",
              "synonym_filter",
              "ngram_filter",
              "shingle_filter"
            ],
            "tokenizer": "lowercase"
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "data": {
        "type": "date"
      },
      "quebrado": {
        "type": "byte"
      },
      "pgrk": {
        "type": "integer"
      },
      "url_length": {
        "type": "integer"
      },
      "title": {
        "analyzer": "analyzer_customizado",
        "type": "text",
        "fields": {
          "keyword": {
            "ignore_above": 256,
            "type": "keyword"
          }
        }
      },
      "description": {
        "analyzer": "analyzer_customizado",
        "type": "text",
        "fields": {
          "keyword": {
            "ignore_above": 256,
            "type": "keyword"
          }
        }
      },
      "url": {
        "analyzer": "analyzer_customizado",
        "type": "text",
        "fields": {
          "keyword": {
            "ignore_above": 256,
            "type": "keyword"
          }
        }
      }
    }
  }
}

I insert the document below:

{
    "title": "shopping",
    "description": "sex video",
    "url": "www.ohcs.com"
}

In my suggestion query below, I type "video" misspelled as "vidio":

{
  "suggest": {
    "text": "vidio",
    "simple_phrase": {
      "phrase": {
        "field": "description",
        "size": 1,
        "max_errors": 100,
        "direct_generator": [
          {
            "field": "description",
            "suggest_mode": "always",
            "min_word_length": 1
          }
        ],
        "collate": {
          "query": { 
            "source" : {
              "match": {
                "{{field_name}}": {
                  "query": "{{suggestion}}",
                  "operator": "and"
                }
              }
            }
          },
          "params": {"field_name" : "description"},
          "prune": true
        },
        "highlight": {
          "pre_tag": "<strong>",
          "post_tag": "</strong>"
        }
      }
    }
  }
}

In the result below, the suggestion search finds the correct word "video", but displays it as several ngram tokens instead of the entire word:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [

    ]
  },
  "suggest": {
    "simple_phrase": [
      {
        "text": "vidio",
        "offset": 0,
        "length": 5,
        "options": [
          {
            "text": "vid ide deo",
            "highlighted": "vid <strong>ide deo</strong>",
            "score": 0.2648209,
            "collate_match": true
          }
        ]
      }
    ]
  }
}

How do I get the suggestion results to display the entire word "video" instead of splitting it into several ngram tokens?

Recommended answer

The problem is your ngram filter. You set min_gram to 3 and max_gram to 3.

Hence you are getting only 3-letter tokens. You can change max_gram to the value you want. In your example, if you set it to 5, you can get "video" in your output.
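
To see why the suggester returns "vid ide deo", it can help to reproduce the trigram split outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch's actual implementation: char_ngrams is a hypothetical helper that emits every substring of a token within the configured length range, which only approximates what an ngram token filter does to a single token.

```python
from typing import List


def char_ngrams(token: str, min_gram: int, max_gram: int) -> List[str]:
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


# With min_gram = max_gram = 3, only trigrams are produced.
typed = char_ngrams("vidio", 3, 3)    # what the misspelled input becomes
indexed = char_ngrams("video", 3, 3)  # what the indexed word becomes

# Widening the range to 3..5 also keeps the whole five-letter word.
widened = char_ngrams("video", 3, 5)

print(typed)
print(indexed)
print(widened)
```

In this sketch "vidio" becomes vid, idi, dio and "video" becomes vid, ide, deo, which lines up with the highlighted option in the question: the first gram is unchanged and the other two are corrected. With the range 3..5, the full token "video" appears among the grams, together with the shorter ones.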

You have the following:

    "ngram_filter": {
      "type": "ngram",
      "min_gram": 3,
      "max_gram": 3,
      "token_chars": [
        "letter",
        "digit"
      ]
    },
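
If you raise max_gram to 5 as suggested, the changed part of the index settings might look like the sketch below, written as a Python dictionary so it can be serialized with json.dumps. One assumption here is not part of the original answer: Elasticsearch 7 and later limit the difference between max_gram and min_gram through the index.max_ngram_diff setting (default 1), so the sketch raises it to match; please check this against the documentation for your version. The names settings_patch, MIN_GRAM and MAX_GRAM are chosen only for illustration, and only the keys relevant to the change are shown.

```python
import json

MIN_GRAM = 3
MAX_GRAM = 5  # value suggested in the answer; long enough for "video"

settings_patch = {
    "index": {
        # Assumption: required when max_gram - min_gram exceeds the default limit of 1.
        "max_ngram_diff": MAX_GRAM - MIN_GRAM,
        "analysis": {
            "filter": {
                "ngram_filter": {
                    "type": "ngram",
                    "min_gram": MIN_GRAM,
                    "max_gram": MAX_GRAM,
                }
            }
        },
    }
}

# Print the fragment as JSON, ready to merge into the index creation body.
print(json.dumps(settings_patch, indent=2))
```

Analyzer settings are fixed when an index is created, so a change like this would normally be applied to a new index and the documents reindexed.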
