Simple term query not working with Elastic while match works
Question
I have a JSON object like the one below in Elastic.
{
"_source" : {
"version" : 1,
"object_id" : "f1dcae27-7a6f-4fea-b540-901c09b60a15",
"object_name" : "testFileName_for_TestSweepAndPrune",
"object_type" : "",
"object_status" : "OBJ_DELETED",
"u_attributes" : ""
}
}
A term query like this doesn't work:
{
"query": {
"term": {
"object_status": "OBJ_DELETED"
}
},
"size": 10000
}
While a match query works fine with the same conditions:
{
"query": {
"match": {
"object_status": "OBJ_DELETED"
}
},
"size": 10000
}
Wondering what could be happening here? How can I make the term query work for this condition?
Recommended answer
To understand why the term query is not working as you expect, we need to look at how Elasticsearch processes and stores data, and how match and term queries differ.
Normally, when you save some text into Elasticsearch, it is analyzed first and then saved. Analysis is done by an analyzer. There are many analyzers, but if you don't specify one, the default (the standard analyzer) is used. An analyzer processes the text, converts it into an array of tokens, and saves the list of tokens. The rules for how text is split into tokens differ from analyzer to analyzer.
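To make the idea of analysis concrete, here is a minimal Python sketch of a toy analyzer. It is only an approximation for simple ASCII input, not Elasticsearch's actual implementation; the name `toy_analyze` and the regex-based tokenization are assumptions made for illustration.

```python
import re


def toy_analyze(text: str) -> list[str]:
    """Rough stand-in for a default-style analyzer: lowercase, then tokenize.

    Assumption: real tokenization rules are more involved; this only mimics
    them for plain ASCII words.
    """
    # \w+ matches runs of letters, digits and underscores, so an underscore
    # does not split a token in this toy version.
    return re.findall(r"\w+", text.lower())


print(toy_analyze("Quick Brown Fox"))  # ['quick', 'brown', 'fox']
print(toy_analyze("Hello, World"))     # ['hello', 'world']
```

The point to take away is that what gets stored is the list of tokens, not the original string.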
Once text is processed and saved, you can query it. There are many ways to query, but in your case the main difference between match and term is that match is a full-text query and term is a term-level query. In a full-text search, your query string is analyzed in the same way as the field you are querying was analyzed. In term-level queries, the query string is not analyzed. This is important to note.
Now let's see how "OBJ_DELETED" is analyzed by Elasticsearch. For that we can add a simple document like this:
curl -X PUT 'localhost:9200/testdata/object/1' -H 'Content-Type: application/json' -d '{ "object_status": "OBJ_DELETED" }'
Then check that everything is there:
curl -X POST 'localhost:9200/testdata/_search?pretty'
This should produce the following:
...
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "testdata",
"_type" : "object",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"object_status" : "OBJ_DELETED"
}
}
]
}
Now we can check how "OBJ_DELETED" is analyzed:
curl -X POST 'localhost:9200/testdata/_analyze?pretty' -H 'Content-Type: application/json' -d '{ "text": "OBJ_DELETED" }'
And the output:
{
"tokens" : [
{
"token" : "obj_deleted",
"start_offset" : 0,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}
As you can see, it only converted the text to lowercase and saved it as a single token. This is how the default analyzer handles it. Now back to your queries. The match query works because the query value "OBJ_DELETED" is also converted to lowercase under the hood, so Elasticsearch can find it. For the term query, the query string is not processed, so you are actually comparing OBJ_DELETED with obj_deleted, and you get no results.
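The difference can be reproduced with a small toy model in Python. This is a sketch under simplifying assumptions, not how Elasticsearch is implemented: `ToyIndex`, `toy_analyze`, and the regex tokenizer are hypothetical names and stand-ins chosen for illustration.

```python
import re
from collections import defaultdict


def toy_analyze(text):
    # Assumption: lowercase + \w+ tokenization roughly mimics the default
    # analyzer for simple ASCII input.
    return re.findall(r"\w+", text.lower())


class ToyIndex:
    """Tiny inverted index: token -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Field values are analyzed before being stored.
        for token in toy_analyze(text):
            self.postings[token].add(doc_id)

    def term(self, value):
        # Term-level lookup: the query value is used exactly as given.
        return sorted(self.postings.get(value, set()))

    def match(self, value):
        # Full-text lookup: the query value is analyzed first, then each
        # resulting token is looked up.
        hits = set()
        for token in toy_analyze(value):
            hits |= self.postings.get(token, set())
        return sorted(hits)


index = ToyIndex()
index.add(1, "OBJ_DELETED")

print(index.term("OBJ_DELETED"))   # [] because only "obj_deleted" is stored
print(index.match("OBJ_DELETED"))  # [1]
print(index.term("obj_deleted"))   # [1]
```

The last call shows that a term lookup does succeed when the query value is exactly the stored token, which is the lowercase form here.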
And the last question: why does object_status.keyword work for a term query?
By default, Elasticsearch creates an additional mapping for each text field. It is a kind of metadata that you can use, and it lets you process the same value in different ways. So by default each text field has an additional mapping named keyword, which has type keyword. keyword fields are not analyzed (they can only be normalized if needed). This means that with the default mapping it stores the exact value that you pass to Elasticsearch (OBJ_DELETED in your case).
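For reference, a term query against the keyword sub-field could be built as follows. This is a sketch that assumes the default mapping created `object_status.keyword` (worth confirming through the index's `_mapping` endpoint); the helper name `term_query_on_keyword` is hypothetical, and the snippet only builds and prints the request body rather than sending it.

```python
import json


def term_query_on_keyword(field, value, size=10000):
    """Build a _search body with a term query on `<field>.keyword`.

    Assumption: the index has a `keyword` sub-field for `field`, as the
    default mapping for text fields provides.
    """
    return {
        "query": {"term": {f"{field}.keyword": value}},
        "size": size,
    }


body = term_query_on_keyword("object_status", "OBJ_DELETED")
print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the term query in the question, with only the field name changed to object_status.keyword.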