Simple term query not working with Elastic while match works

Question

I have a JSON object like the one below in Elastic.

{
    "_source" : {
      "version" : 1,
      "object_id" : "f1dcae27-7a6f-4fea-b540-901c09b60a15",
      "object_name" : "testFileName_for_TestSweepAndPrune",
      "object_type" : "",
      "object_status" : "OBJ_DELETED",
      "u_attributes" : ""
    }

}

A term query like this doesn't work.

{
            "query": {
                "term": {
                    "object_status": "OBJ_DELETED"
                }
            },
            "size": 10000

}

A match query works fine with the same conditions.

{
            "query": {
                "match": {
                    "object_status": "OBJ_DELETED"
                }
            },
            "size": 10000

}

What could be happening here? How can I make the term query work for this condition?

Recommended answer

To understand why the term query is not working as you expect, we need to look at how Elasticsearch processes and stores data, and at how match and term queries differ.

Normally, when you save text into Elasticsearch, it is analyzed first and then stored. Analysis is done by an analyzer. There are many analyzers, but if you don't specify one, the default is used. The analyzer processes the text, converts it into a list of tokens, and stores those tokens. The rules for splitting text into tokens differ from analyzer to analyzer.
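To make the idea of "different rules per analyzer" concrete, here is a toy sketch in Python. The three functions are illustrative stand-ins that only loosely imitate the whitespace, standard, and keyword analyzers; their names and the sample text are assumptions, and the real analyzers follow more detailed rules.

```python
import re


def whitespace_like(text):
    # Split on whitespace only; case and punctuation are kept as-is.
    return text.split()


def standard_like(text):
    # Lowercase, then keep runs of letters, digits, and underscores.
    # This is only a rough approximation of the default analyzer.
    return re.findall(r"\w+", text.lower())


def keyword_like(text):
    # Treat the whole input as one token, unchanged.
    return [text]


sample = "OBJ_DELETED by User-42"
for analyzer in (whitespace_like, standard_like, keyword_like):
    print(analyzer.__name__, analyzer(sample))
```

The same input produces three different token lists, which is why the choice of analyzer decides what a later lookup can find.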

Once the text is processed and stored, you can query it. There are many ways to query, but in your case the main difference is that match is a full-text query and term is a term-level query. In a full-text query, the query string is analyzed in the same way as the field you are querying was analyzed. In a term-level query, the query string is not analyzed. This distinction is what matters here.

Now let's see how Elasticsearch analyzes "OBJ_DELETED". To do that, we can add a simple document like this:

curl -X PUT 'localhost:9200/testdata/object/1' -H 'Content-Type: application/json' -d '{ "object_status": "OBJ_DELETED"  }'

Then check that everything is there:

curl -X POST 'localhost:9200/testdata/_search?pretty'

It should produce the following:

...
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
  {
    "_index" : "testdata",
    "_type" : "object",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "object_status" : "OBJ_DELETED"
    }
  }
]

}

Now we can check how "OBJ_DELETED" is analyzed:

curl -X POST 'localhost:9200/testdata/_analyze?pretty' -H 'Content-Type: application/json' -d '{ "text": "OBJ_DELETED"  }'

And the output:

{
  "tokens" : [
    {
      "token" : "obj_deleted",
      "start_offset" : 0,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

As you can see, the analyzer only converted the text to lowercase and saved it as a single token. This is what the default analyzer does. Now back to your queries. The match query works because the query value "OBJ_DELETED" is also converted to lowercase under the hood, so Elasticsearch can find it. For the term query, the query string is not processed, so you are actually comparing OBJ_DELETED with obj_deleted. The comparison is exact and case-sensitive, so you get no results.
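The comparison described above can be mimicked with a small in-memory model. This is a toy sketch, not how Elasticsearch is implemented: `ToyIndex`, `analyze`, and the document id are hypothetical names, and `analyze` only approximates the default analyzer by lowercasing and splitting on non-word characters.

```python
import re
from collections import defaultdict


def analyze(text):
    # Rough stand-in for the default analyzer: lowercase, then split
    # on anything that is not a letter, digit, or underscore.
    return re.findall(r"\w+", text.lower())


class ToyIndex:
    def __init__(self):
        # token -> ids of the documents that contain it
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Field values are analyzed before they are stored.
        for token in analyze(text):
            self.postings[token].add(doc_id)

    def term(self, value):
        # Term-level lookup: the value is used exactly as given.
        return set(self.postings.get(value, set()))

    def match(self, value):
        # Full-text lookup: the query is analyzed first, then each
        # resulting token is looked up.
        hits = set()
        for token in analyze(value):
            hits |= self.postings.get(token, set())
        return hits


index = ToyIndex()
index.add(1, "OBJ_DELETED")

print(index.term("OBJ_DELETED"))   # set()
print(index.match("OBJ_DELETED"))  # {1}
print(index.term("obj_deleted"))   # {1}
```

The last line shows that, in this model, a term lookup does succeed once the value matches the stored lowercase token exactly.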

And the last question: why does object_status.keyword work for a term query?

By default, Elasticsearch creates an additional mapping for each text field. It is a kind of metadata that you can use, and it lets you process the same value in different ways. So, with the default mapping, each text field has an additional sub-field named keyword, which has the type keyword. keyword fields are not analyzed (they can only be normalized, if needed). This means that with the default mapping the sub-field stores the exact value that you pass to Elasticsearch (OBJ_DELETED in your case).
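Putting that to use, a term query aimed at the keyword sub-field would look like the body built below. This is a minimal sketch that assumes the index uses the default mapping, so that object_status.keyword exists; the helper name `term_query` is hypothetical, and the snippet only prints the request body rather than sending it.

```python
import json


def term_query(field, value, size=10000):
    # Build a term-level query body; the value is matched exactly,
    # so it must equal what is stored in the targeted field.
    return {"query": {"term": {field: value}}, "size": size}


# Target the un-analyzed keyword sub-field instead of the text field.
body = term_query("object_status.keyword", "OBJ_DELETED")
print(json.dumps(body, indent=2))
```

The printed JSON can be posted to the index's _search endpoint in the same way as the curl commands shown earlier.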
