How do I build an Elasticsearch query such that each token in a document field is matched?

Posted: 2017/8/7 2:27:39, tag: elasticsearch

Problem description

I need to make sure that each token of a field is matched by at least one token in a user's search.

This is a generalized example for the sake of simplification.

Let Store_Name = "Square Steakhouse"

It is simple to build a query that matches this document when the user searches for Square or Steakhouse. Furthermore, with a kstem filter attached to the default analyzer, Steakhouses is also likely to match.

{
  "size": 30,
  "query": {
    "match": {
      "Store_Name": {
        "query": "Square",
        "operator": "AND"
      }
    }
  }
}

Unfortunately, I need each token of the Store_Name field to be matched. I need the following behavior:

Query: Square Steakhouse    Result: Match
Query: Square Steakhouses   Result: Match
Query: Squared Steakhouse   Result: Match
Query: Square               Result: No Match
Query: Steakhouse           Result: No Match

In summary

It is not an option to use not_analyzed, as I do need to take advantage of analyzer features
I intend to use kstem, custom synonyms, a custom char_filter, a lowercase filter, as well as a standard tokenizer

However, I need to make sure that each token of a field is matched

Is this possible in Elasticsearch?

Solution

Here is a good method.

It is not perfect, but it is a good compromise in terms of simplicity, computation, and storage.

Index the token count of the field
Obtain the token count of the search text
Perform a filtered query that only keeps results whose indexed token count equals the token count of the search text
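
The three steps above can be simulated in plain Python without a cluster. This is only a sketch under stated assumptions: `toy_analyze` is a hypothetical stand-in for the real analyzer (lowercasing, whitespace splitting, and a crude suffix strip in place of kstem), and `field_matches` mimics the AND match combined with the token count check.

```python
def toy_analyze(text):
    """Hypothetical stand-in for an Elasticsearch analyzer (not kstem)."""
    tokens = []
    for word in text.lower().split():
        # Crude suffix stripping, only good enough for this example
        if word.endswith("s"):
            word = word[:-1]
        elif word.endswith("ed"):
            word = word[:-1]
        tokens.append(word)
    return tokens


def field_matches(field_text, search_text):
    field_tokens = toy_analyze(field_text)    # step 1: counted at index time
    search_tokens = toy_analyze(search_text)  # step 2: counted at query time
    # match with operator AND: every search token must occur in the field
    all_found = all(tok in field_tokens for tok in search_tokens)
    # step 3: keep the document only if both token counts are equal
    return all_found and len(field_tokens) == len(search_tokens)


store_name = "Square Steakhouse"
for query in ["Square Steakhouse", "Square Steakhouses",
              "Squared Steakhouse", "Square", "Steakhouse"]:
    result = "Match" if field_matches(store_name, query) else "No Match"
    print(f"Query: {query:<20} Result: {result}")
```

With this toy analyzer, the first three queries match and the last two do not, which is the behavior the question asks for.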

You will want to use the analyze API in order to get the token count. Make sure to use the same analyzer as the field in question. Here is a VB.NET function to obtain token count:

Imports Newtonsoft.Json.Linq
Imports PlainElastic.Net

Private Function GetTokenCount(ByVal RawString As String, Optional ByVal Analyzer As String = "default") As Integer
    ' Empty or whitespace-only input produces no tokens
    If Trim(RawString) = "" Then Return 0

    Dim client = New ElasticConnection()
    ' Submit the analyze request using the PlainElastic.NET API
    Dim result = client.Post("http://localhost:9200/myindex/_analyze?analyzer=" & Analyzer, RawString)
    ' Parse the response into a JSON.NET JObject
    Dim J = JObject.Parse(result.ToString())
    ' Each entry in the "tokens" array is one token emitted by the analyzer
    Return (From X In J("tokens")).Count()
End Function
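
For readers not using VB.NET, the counting part of the function can be sketched in Python. The analyze response is a JSON object with a `tokens` array, so the count is just the length of that array. The function name `count_tokens` and the sample response body are made up for illustration; the HTTP call itself is left out so the snippet runs without a cluster.

```python
import json


def count_tokens(analyze_response_body):
    """Return the number of tokens in an _analyze response body (JSON text)."""
    if not analyze_response_body.strip():
        return 0
    parsed = json.loads(analyze_response_body)
    # Each element of "tokens" is one token emitted by the analyzer
    return len(parsed.get("tokens", []))


# Hand-written sample of what the analyze API returns for "Square Steakhouse"
sample_body = json.dumps({
    "tokens": [
        {"token": "square", "start_offset": 0, "end_offset": 6,
         "type": "<ALPHANUM>", "position": 0},
        {"token": "steakhouse", "start_offset": 7, "end_offset": 17,
         "type": "<ALPHANUM>", "position": 1},
    ]
})

print(count_tokens(sample_body))
```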

You will want to use this at index time to store the token count of the field in question alongside each document. Make sure there is an entry in the mapping for TokenCount.
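
As an alternative to computing the count in client code, Elasticsearch has a `token_count` field type that analyzes a string at index time and indexes the number of tokens. This is a different technique from the one described in this answer, shown only as a sketch: the analyzer name `my_analyzer` is hypothetical, the `text` type assumes Elasticsearch 5 or later (older versions use `string`), and with this layout the count would be queried as `MyField.TokenCount` rather than `TokenCount`.

```python
import json

ANALYZER_NAME = "my_analyzer"  # hypothetical: use the field's real analyzer

# Mapping sketch: MyField is analyzed as usual, and the TokenCount sub-field
# stores how many tokens the same analyzer produced for the value.
mapping = {
    "properties": {
        "MyField": {
            "type": "text",
            "analyzer": ANALYZER_NAME,
            "fields": {
                "TokenCount": {
                    "type": "token_count",
                    "analyzer": ANALYZER_NAME,
                }
            },
        }
    }
}

print(json.dumps(mapping, indent=2))
```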

Here is an Elasticsearch query that makes use of this new token count information:

{
  "size": 30,
  "query": {
    "filtered": {
      "query": {
        "match": {
          "MyField": {
            "query": "[query]",
            "operator": "AND"
          }
        }
      },
      "filter": {
        "term": {
          "TokenCount": [tokencount]
        }
      }
    }
  }
}

Replace [query] with the search terms
Replace [tokencount] with the number of tokens in the search terms (using the GetTokenCount function above)

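Because the AND operator requires every search token to match MyField, and the filter requires MyField to contain exactly as many tokens as the search text, this makes sure that there are at least as many matches as tokens in MyField.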
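
Note that the `filtered` query was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions the same request is written as a `bool` query with a `must` clause and a `filter` clause. Below is a minimal sketch that builds such a request body in Python; the helper name `build_token_count_query` and its parameters are made up for illustration.

```python
import json


def build_token_count_query(search_text, token_count,
                            field="MyField", count_field="TokenCount",
                            size=30):
    """Build a bool query equivalent to the filtered query shown above."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored part: every search token must match the field
                "must": {
                    "match": {
                        field: {"query": search_text, "operator": "and"}
                    }
                },
                # Unscored part: the stored token count must be equal
                "filter": {"term": {count_field: token_count}},
            }
        },
    }


body = build_token_count_query("Square Steakhouse", 2)
print(json.dumps(body, indent=2))
```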

There are some drawbacks to the above. For example, if we are matching the field "blue red", and the user searches for "blue blue", the above will trigger a match. So, you may want to use a unique token filter. You may also wish to adjust the filter so that
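
The "blue blue" false positive, and the effect of removing duplicate tokens, can be reproduced with a small simulation. This is a sketch only: tokens come from lowercasing and whitespace splitting rather than a real analyzer, and the `unique` flag imitates a unique token filter applied both when the field's count is stored and when the search text is counted.

```python
def matches(field_text, search_text, unique=False):
    field_tokens = field_text.lower().split()
    search_tokens = search_text.lower().split()
    if unique:
        # Imitate a unique token filter: drop repeated tokens, keep order
        field_tokens = list(dict.fromkeys(field_tokens))
        search_tokens = list(dict.fromkeys(search_tokens))
    # AND operator: every search token must occur in the field
    all_found = all(tok in field_tokens for tok in search_tokens)
    # Filter: both sides must have the same number of tokens
    return all_found and len(field_tokens) == len(search_tokens)


print(matches("blue red", "blue blue"))               # False positive
print(matches("blue red", "blue blue", unique=True))  # Rejected
print(matches("blue red", "red blue", unique=True))   # Still matches
```

Without deduplication, the repeated search token satisfies both the AND match and the count check; with it, the search text counts as one token and the filter rejects the document.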

Reference

Clinton Gormley inspired the solution
