ElasticSearch multiple exact search on field returns no results

Problem description

I'm struggling with this, which I feel should work but maybe I'm doing something stupid. This search:

{
  "query":
  {
    "bool":
    {
      "must":[
        {"match":{"Element.sourceSystem.name":"Source1 Source2"}}
      ]
    }
  }
}

returns data for both Source1 and Source2. Adding a terms search, as shown underneath, I would expect to get a subset of the first search with just the Source1s returned. Instead nothing is returned, whether the terms clause is run together with the first query or on its own.

{
  "query":
  {
    "bool":
    {
      "must":[
        {"match":{"Element.sourceSystem.name":"Source1 Source2"}},
        {"terms":{"Element.sourceSystem.name":["Source1"]}}
      ]
    }
  }
}

I realise this is hard without seeing the documents, but suffice it to say that "Element.sourceSystem.name" exists and is available, as the first search works fine. All input gratefully received.

Solution

There are some things that are handled differently in match queries than in terms queries.

First of all, a detour to analyzers:

I'll assume you are using the standard analyzer of Elasticsearch, which consists of the standard tokenizer and some token filters. The standard tokenizer will tokenize (split your text into terms) on spaces, punctuation marks and some other special characters. Details can be found in the Elasticsearch documentation, so for now let's just say 'each word will be a term'.

The second, very important part of the analyzer is the lowercase filter. It will transform terms into lowercase. This means, later on, searching for Source1 and source1 should yield the same results.

So a short example:

The input "This is my input text in English." will be analyzed and end up as the following terms: "this", "is", "my", "input", "text", "in", "english".
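
As a rough illustration of that analysis step, here is a minimal Python sketch. It is not Elasticsearch's actual implementation: `analyze` is a hypothetical stand-in that only approximates the standard analyzer by splitting on non-word characters and lowercasing each piece.

```python
import re

def analyze(text):
    """Toy stand-in for the standard analyzer (an assumption, not the real one):
    split on anything that is not a letter, digit or underscore, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

print(analyze("This is my input text in English."))
# ['this', 'is', 'my', 'input', 'text', 'in', 'english']
print(analyze("Source1 Source2"))
# ['source1', 'source2']
```

The real analyzer follows Unicode text segmentation rules, so edge cases (apostrophes, underscores, non-Latin scripts) can differ from this sketch.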

All of this happens, for example, when you index a document into a text field. I assume Element.sourceSystem.name is of this type, since your normal match query seems to work.

Now, when you issue a match query with "Source1 Source2", the analysis will also happen and transform it into tokens source1 and source2. Internally it will then create 2 term queries in a boolean OR. So either source1 or source2 must match to be a result of your query.

By the way, the match query supports a minimum_should_match property. With it you can specify how many terms of your match query need to match.
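
To make the OR behaviour and minimum_should_match concrete, here is a small Python sketch that only builds the request bodies as dictionaries. The helper name `match_query` is hypothetical; the field name is taken from the question, and sending the body to a cluster is left out.

```python
import json

FIELD = "Element.sourceSystem.name"

def match_query(text, minimum_should_match=None):
    """Build a match query body; the long form allows extra options."""
    options = {"query": text}
    if minimum_should_match is not None:
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {FIELD: options}}}

# Default: OR semantics, a document containing either term matches.
either = match_query("Source1 Source2")
# Require both analyzed terms (source1 and source2) to be present.
both = match_query("Source1 Source2", minimum_should_match=2)

print(json.dumps(both, indent=2))
```

With minimum_should_match set to 2, a document whose field only contains Source1 would no longer be returned.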

Here is the catch with the terms query: it does not analyze the text you provide. It is usually meant to be used on fields of type keyword. Keyword fields are also not analyzed (for further information, please read the documentation on mapping types; it is actually pretty important). So what does this mean?

  • If I take my example from above, my index would contain "this", "is", "my", "input", "text", "in", "english".
  • A match query with English will match, because it will be analyzed to english.
  • A term or terms query with English will never match, because there is no term English in my index. It is case sensitive.
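
The three points above can be reproduced with a toy model in Python. This is only a sketch under simplifying assumptions: the index is a plain set of terms, and `analyze`, `match_hits` and `terms_hits` are hypothetical helpers, not Elasticsearch APIs.

```python
import re

def analyze(text):
    # Toy analyzer (assumption): split on non-word characters, lowercase.
    return [t.lower() for t in re.findall(r"\w+", text)]

# Terms stored in the index for one document's text field.
indexed_terms = set(analyze("This is my input text in English."))

def match_hits(query_text):
    # match: the query text is analyzed, then any analyzed term may match (OR).
    return any(term in indexed_terms for term in analyze(query_text))

def terms_hits(values):
    # terms: the values are compared as-is, without analysis.
    return any(value in indexed_terms for value in values)

print(match_hits("English"))    # True
print(terms_hits(["English"]))  # False
print(terms_hits(["english"]))  # True
```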

I am fairly sure that if you used source1 (lowercase) in your terms query, it would match something. However, I highly doubt that this query is the way to go for your use case. Use normal match queries when querying text fields, and (in general, not always applicable) only use terms queries on keyword fields.
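
Two possible shapes for the corrected request, written as Python dictionaries. Both are sketches rather than a verified fix for your data. The first only lowercases the value in the terms clause. The second assumes the mapping has a keyword sub-field called Element.sourceSystem.name.keyword, which is not confirmed anywhere in the question.

```python
import json

FIELD = "Element.sourceSystem.name"

# Option 1: keep the terms clause on the text field, but pass the
# lowercased token that the analyzer would have put into the index.
lowercase_terms = {
    "query": {
        "bool": {
            "must": [
                {"match": {FIELD: "Source1 Source2"}},
                {"terms": {FIELD: ["source1"]}},
            ]
        }
    }
}

# Option 2 (assumption: a keyword sub-field named "<field>.keyword" exists
# in the mapping): compare against the exact, unanalyzed value instead.
keyword_terms = {
    "query": {
        "bool": {
            "filter": [{"terms": {FIELD + ".keyword": ["Source1"]}}]
        }
    }
}

print(json.dumps(lowercase_terms, indent=2))
print(json.dumps(keyword_terms, indent=2))
```

Option 2 only matches documents whose whole field value is exactly Source1, so it fits best when the field holds a single system name.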
