Elasticsearch query issue with ngram

Views: 138  Published: 2017/8/7 4:43:12  Tags: search elasticsearch sense

Question

I have this data in my index:

https://gist.github.com/bitgandtter/6794d9b48ae914a3ac7c

As you can see in the mapping, I'm using ngram with a minimum length of 3 and a maximum length of 20.

When I execute this query:

  GET /my_index/user/_search?search_type=dfs_query_then_fetch
  {
    "query": {
      "filtered": {
        "query": {
          "multi_match": {
            "query": "F",
            "fields": ["username", "firstname", "middlename", "lastname"],
            "analyzer": "custom_search_analyzer"
          }
        }
      }
    }
  }

I should get the 8 documents I have indexed, but I get only 6. The two that are left out are the ones whose names are Franz and Francis. I expected those two as well, because the letter F is included in their data. For some reason it's not working.

When I execute:

  GET /my_index/user/_search?search_type=dfs_query_then_fetch
  {
    "query": {
      "filtered": {
        "query": {
          "multi_match": {
            "query": "Fran",
            "fields": ["username", "firstname", "middlename", "lastname"],
            "analyzer": "custom_search_analyzer"
          }
        }
      }
    }
  }

I get those two documents.

If I lower the ngram minimum so that it starts at 1, I get all the documents, but I think this will affect the performance of the query.

What am I missing here? Thanks in advance.

NOTE: all the examples are written for Sense.

Solution

This is expected. Since min_gram is specified as 3, the minimum length of any token produced by the custom analyzer is 3 code points.

Hence the first token for "Franz Silva" would be "Fra". No indexed token for this document is as short as "F", so the search term "F" does not match it.
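
The reasoning above can be checked with a small sketch. This is only a rough Python emulation of character n-gram generation, not Elasticsearch's implementation; the function name ngrams is hypothetical, and it assumes no lowercasing or other filters are applied (the actual analyzer in the gist may differ).

```python
def ngrams(text, min_gram=3, max_gram=20):
    """Return every substring of text with length in [min_gram, max_gram].

    Hypothetical helper that only approximates what an ngram tokenizer emits;
    it ignores lowercasing and any other filters the real analyzer may apply.
    """
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break  # longer sizes would run past the end of the text
            grams.append(text[start:end])
    return grams


tokens = ngrams("Franz Silva")
print("F" in tokens)                 # False: nothing shorter than 3 characters is emitted
print("Fra" in tokens)               # True
print("Fran" in tokens)              # True
print(min(len(t) for t in tokens))   # 3
```

With min_gram lowered to 1 in this sketch, "F" does appear among the tokens, which is consistent with the behaviour described in the question.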

You can inspect the tokens produced by the analyzer with:

  curl -XGET "http://<server>/index_name/_analyze?analyzer=custom_analyzer&text=Franz%20Silva"

Also note that since the "custom_analyzer" specified above does not specify "token_chars", the tokens can contain spaces.
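
To illustrate the point about spaces, here is a minimal sketch in Python. It is an approximation, not Elasticsearch's code: ngrams and ngrams_per_word are hypothetical helpers, and splitting on non-alphanumeric characters is only a stand-in for setting token_chars to letter and digit.

```python
import re


def ngrams(text, min_gram=3, max_gram=20):
    """Return every substring of text with length in [min_gram, max_gram]."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break
            grams.append(text[start:end])
    return grams


def ngrams_per_word(text, min_gram=3, max_gram=20):
    """Approximate token_chars = ["letter", "digit"]: split first, then n-gram.

    Assumption: runs of letters and digits are treated as words, and every
    other character (spaces, punctuation, underscores) acts as a separator.
    """
    words = re.findall(r"[^\W_]+", text)
    return [gram for word in words for gram in ngrams(word, min_gram, max_gram)]


whole = ngrams("Franz Silva")           # no token_chars: spaces are kept
split = ngrams_per_word("Franz Silva")  # with the token_chars approximation

print(any(" " in t for t in whole))  # True, for example "nz S"
print(any(" " in t for t in split))  # False
print(split)
```

In this sketch the second variant only produces grams taken from inside "Franz" or "Silva", so no token spans the word boundary.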
