Elasticsearch phrase suggester returns suggestions that do not exist in my index


Problem description

I have an Elasticsearch index with some data in it. I implemented a did-you-mean feature so that when a user types something misspelled, they receive a suggestion with the correct words.

I used the phrase suggester because I need suggestions for short phrases, such as names. The problem is that some of the suggestions do not exist in the index.

Example:

document in the index: coding like a master
search: Codning like a boss
suggestion: <em>coding</em> like a boss
search result: not found

My problem is that no phrase in my index matches the suggestion, so the suggester recommends phrases that do not exist, and searching for them returns nothing.

What can I do about this? Shouldn't the phrase suggester only suggest phrases that actually exist in the index?

Here are the corresponding query, mappings and settings, in case you need them.

Settings and mappings

{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "search.slowlog.threshold.fetch.warn": "2s",
      "index.analysis.analyzer.default.filter.0": "standard",
      "index.analysis.analyzer.default.tokenizer": "standard",
      "index.analysis.analyzer.default.filter.1": "lowercase",
      "index.analysis.analyzer.default.filter.2": "asciifolding",
      "index.priority": 3,
      "analysis": {
        "analyzer": {
          "suggests_analyzer": {
            "tokenizer": "lowercase",
            "filter": [
              "lowercase",
              "asciifolding",
              "shingle_filter"
            ],
            "type": "custom"
          }
        },
        "filter": {
          "shingle_filter": {
            "min_shingle_size": 2,
            "max_shingle_size": 3,
            "type": "shingle"
          }
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "suggest_field": {
          "analyzer": "suggests_analyzer",
          "type": "string"
        }
      }
    }
  }
}

Query

{
  "DidYouMean": {
    "text": "Codning like a boss",
    "phrase": {
      "field": "suggest_field",
      "size": 1,
      "gram_size": 1,
      "confidence": 2.0
    }
  }
}

Thanks for your help.

Solution

This is actually expected. If you run your document through the analyze API, you will get a better picture of what is happening.

GET suggest_index/_analyze?text=coding like a master&analyzer=suggests_analyzer

This is the output

{
   "tokens": [
      {
         "token": "coding",
         "start_offset": 0,
         "end_offset": 6,
         "type": "word",
         "position": 1
      },
      {
         "token": "coding like",
         "start_offset": 0,
         "end_offset": 11,
         "type": "shingle",
         "position": 1
      },
      {
         "token": "coding like a",
         "start_offset": 0,
         "end_offset": 13,
         "type": "shingle",
         "position": 1
      },
      {
         "token": "like",
         "start_offset": 7,
         "end_offset": 11,
         "type": "word",
         "position": 2
      },
      {
         "token": "like a",
         "start_offset": 7,
         "end_offset": 13,
         "type": "shingle",
         "position": 2
      },
      {
         "token": "like a master",
         "start_offset": 7,
         "end_offset": 20,
         "type": "shingle",
         "position": 2
      },
      {
         "token": "a",
         "start_offset": 12,
         "end_offset": 13,
         "type": "word",
         "position": 3
      },
      {
         "token": "a master",
         "start_offset": 12,
         "end_offset": 20,
         "type": "shingle",
         "position": 3
      },
      {
         "token": "master",
         "start_offset": 14,
         "end_offset": 20,
         "type": "word",
         "position": 4
      }
   ]
}
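The token list above can be reproduced offline with a small sketch. This is not Elasticsearch code; `shingles` is a hypothetical helper that only approximates the lowercase tokenizer plus the shingle filter (whitespace splitting, unigrams kept), which is enough for this input.

```python
def shingles(text, min_size=2, max_size=3):
    """Return unigrams plus word shingles, roughly like the shingle filter.

    Assumption: splitting on whitespace stands in for the real tokenizer.
    """
    words = text.lower().split()
    tokens = []
    for start in range(len(words)):
        tokens.append(words[start])  # the unigram is emitted as well
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                tokens.append(" ".join(words[start:start + size]))
    return tokens


indexed_terms = set(shingles("coding like a master"))

# "coding" is an indexed term on its own, so correcting "codning" to
# "coding" is a valid term-level correction, whatever words follow it.
print("coding" in indexed_terms)
print("coding like a boss" in indexed_terms)
```

The first check succeeds and the second does not, which is the situation described in the question: the corrected term is in the index, while the full phrase is not.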

As you can see, the token "coding" is generated for the text, so it is in your index. The suggester is not suggesting something that is not in the index: it corrects individual terms, and every corrected term exists in the index even when the full phrase does not. If you strictly want whole-phrase suggestions, you might want to consider using the keyword tokenizer. For example, if you change your mapping to something like this:

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "suggests_analyzer": {
            "tokenizer": "lowercase",
            "filter": [
              "lowercase",
              "asciifolding",
              "shingle_filter"
            ],
            "type": "custom"
          },
          "raw_analyzer": {
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "asciifolding"
            ]
          }
        },
        "filter": {
          "shingle_filter": {
            "min_shingle_size": 2,
            "max_shingle_size": 3,
            "type": "shingle"
          }
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "suggest_field": {
          "analyzer": "suggests_analyzer",
          "type": "string",
          "fields": {
            "raw": {
              "analyzer": "raw_analyzer",
              "type": "string"
            }
          }
        }
      }
    }
  }
}

Then this query will give you the expected results:

{
  "DidYouMean": {
    "text": "codning lke a master",
    "phrase": {
      "field": "suggest_field.raw",
      "size": 1,
      "gram_size": 1
    }
  }
}

It will not return anything for "codning like a boss".
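Why the keyword-tokenized field behaves this way can be sketched with a plain edit-distance check. With the keyword tokenizer the whole phrase is a single term, so a candidate has to be close to the entire indexed string. The snippet below is only an approximation under stated assumptions: `suggest`, `INDEXED_TERM` and the `MAX_EDITS = 2` threshold are illustrative names and values, and the real suggester scores candidates differently.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance using two rolling rows."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


INDEXED_TERM = "coding like a master"  # the single keyword-tokenized term
MAX_EDITS = 2                          # assumed threshold for this sketch


def suggest(text):
    """Return the indexed term if the whole input is close enough to it."""
    if edit_distance(text.lower(), INDEXED_TERM) <= MAX_EDITS:
        return INDEXED_TERM
    return None


print(suggest("codning lke a master"))
print(suggest("Codning like a boss"))
```

The first input is two edits away from the indexed term and gets corrected; the second is too far from any whole indexed phrase, so nothing is suggested.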

EDIT 1

From your comments, and from running some phrase suggestions on my own dataset, I feel a much better approach would be to use the collate option that the phrase suggester provides. With it, every suggestion is checked against a query and is returned only if it would match at least one document in the index. I have also added stemmers to the mapping so that only root words are considered. I am using light_english because it is less aggressive.

The analyzer part of the mapping now looks like this:

 "analysis": {
     "analyzer": {
         "suggests_analyzer": {
             "tokenizer": "standard",
             "filter": [
                 "lowercase",
                 "english_possessive_stemmer",
                 "light_english_stemmer",
                 "asciifolding",
                 "shingle_filter"
             ],
             "type": "custom"
         }
     },
     "filter": {
         "light_english_stemmer": {
             "type": "stemmer",
             "language": "light_english"
         },
         "english_possessive_stemmer": {
             "type": "stemmer",
             "language": "possessive_english"
         },
         "shingle_filter": {
             "min_shingle_size": 2,
             "max_shingle_size": 4,
             "type": "shingle"
         }
     }
 }

Now this query will give you the desired results:

{
   "suggest" : {
     "text" : "appel on the tabel",
     "simple_phrase" : {
       "phrase" : {
         "field" :  "suggest_field",
         "size" :   5,
         "collate": {
           "query": { 
             "inline" : {
               "match_phrase": {
                   "{{field_name}}" : "{{suggestion}}" 
               }
             }
           },
           "params": {"field_name" : "suggest_field"}, 
           "prune": false
         }
       }
     }
   },
   "size": 0
 }

This will give you back "apple on the table". Here a match_phrase query is used, which runs every suggested phrase against the index. You can set "prune": true to see all the suggestions, regardless of whether they match; each one is then flagged with collate_match. You might also want to consider using a stop filter to remove stopwords.
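The effect of collate and prune can be sketched in a few lines. This is a simplified model, not the Elasticsearch implementation: `phrase_matches` and `collate` are hypothetical helpers, documents are plain strings, and stemming is ignored.

```python
def phrase_matches(phrase, document):
    """True when the phrase's words occur consecutively, in order, in the document."""
    words = phrase.lower().split()
    doc_words = document.lower().split()
    return any(doc_words[i:i + len(words)] == words
               for i in range(len(doc_words) - len(words) + 1))


def collate(suggestions, documents, prune=False):
    """Keep only suggestions that match a document, or flag all of them if prune is set."""
    results = []
    for suggestion in suggestions:
        matched = any(phrase_matches(suggestion, doc) for doc in documents)
        if prune:
            results.append({"text": suggestion, "collate_match": matched})
        elif matched:
            results.append({"text": suggestion})
    return results


documents = ["apple on the table", "coding like a master"]
candidates = ["apple on the table", "apple on the tablet"]

print(collate(candidates, documents))
print(collate(candidates, documents, prune=True))
```

With prune left at false, only the candidate that matches a document survives. With prune set to true, both candidates come back and the flag tells you which one would actually return results.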

Hope this helps!!
