Elasticsearch nested filter returns empty result
Question
I have this mapping:
"post": {
  "model": "Post",
  "properties": {
    "id": { "type": "integer" },
    "title": { "type": "string", "analyzer": "custom_analyzer", "boost": 5 },
    "description": { "type": "string", "analyzer": "custom_analyzer", "boost": 4 },
    "condition": { "type": "integer", "index": "not_analyzed" },
    "categories": { "type": "string", "index": "not_analyzed" },
    "seller": {
      "type": "nested",
      "properties": {
        "id": { "type": "integer", "index": "not_analyzed" },
        "username": { "type": "string", "analyzer": "custom_analyzer", "boost": 1 },
        "firstName": { "type": "string", "analyzer": "custom_analyzer", "boost": 3 },
        "lastName": { "type": "string", "analyzer": "custom_analyzer", "boost": 2 }
      }
    },
    "marketPrice": { "type": "float", "index": "not_analyzed" },
    "currentPrice": { "type": "float", "index": "not_analyzed" },
    "discount": { "type": "float", "index": "not_analyzed" },
    "commentsCount": { "type": "integer", "index": "not_analyzed" },
    "likesCount": { "type": "integer", "index": "not_analyzed" },
    "featured": { "type": "boolean", "index": "not_analyzed" },
    "bumped": { "type": "boolean", "index": "not_analyzed" },
    "created": { "type": "date", "index": "not_analyzed" },
    "modified": { "type": "date", "index": "not_analyzed" }
  }
}
And this query:
GET /develop/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            { "match": { "title": "post" } }
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "featured": 0 } },
            {
              "nested": {
                "path": "seller",
                "filter": {
                  "bool": {
                    "must": [
                      { "term": { "seller.firstName": "Test 3" } }
                    ]
                  }
                },
                "_cache": true
              }
            }
          ]
        }
      }
    }
  },
  "sort": [
    { "_score": { "order": "desc" } },
    { "created": { "order": "desc" } }
  ],
  "track_scores": true
}
I expect 25 results because I have 25 posts indexed, but I get an empty set. If I remove the nested filter, everything works fine. I want to be able to filter on the nested object.
In my settings I have:
"analyzer": {
  "custom_analyzer": {
    "type": "custom",
    "tokenizer": "nGram",
    "filter": [
      "stopwords",
      "asciifolding",
      "lowercase",
      "snowball",
      "english_stemmer",
      "english_possessive_stemmer",
      "worddelimiter"
    ]
  },
  "custom_search_analyzer": {
    "type": "custom",
    "tokenizer": "standard",
    "filter": [
      "stopwords",
      "asciifolding",
      "lowercase",
      "snowball",
      "english_stemmer",
      "english_possessive_stemmer",
      "worddelimiter"
    ]
  }
}
What am I missing here?
Thanks.
Recommended answer
Short version: try this (after updating the endpoint and index name):
curl -XPOST "http://localhost:9200/my_index/_search?search_type=dfs_query_then_fetch" -d'
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            { "match": { "title": "post" } }
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "seller",
                "filter": {
                  "bool": {
                    "must": [
                      {
                        "terms": {
                          "seller.firstName": [ "test", "3" ],
                          "execution": "and"
                        }
                      }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}'
It worked for me with a simplified version of your setup. I'll post an edit with a longer explanation in a little while.
EDIT: Long version:
The problem with your query is the analyzer combined with the term filter. Your analyzer breaks the text of the firstName field into tokens at index time, so "Test 3" becomes the tokens "test" and "3". A term filter does not analyze its input, so when you use { "term": { "seller.firstName": "Test 3" } } you are asking for a document where one of the tokens for "seller.firstName" is exactly "Test 3". There aren't any documents for which that is true (in fact, there can't be, given the way your analyzer is set up). You could use "index": "not_analyzed" on that field and then your query would work, or you can use a terms filter like I showed above. Here's how I got there:
I started with the index definition you linked to in your comment and simplified it a little to make it more readable while keeping the essential issue:
curl -XDELETE "http://localhost:9200/my_index"
curl -XPUT "http://localhost:9200/my_index" -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "snowball": { "type": "snowball", "language": "English" },
        "english_stemmer": { "type": "stemmer", "language": "english" },
        "english_possessive_stemmer": { "type": "stemmer", "language": "possessive_english" },
        "stopwords": { "type": "stop", "stopwords": [ "_english_" ] },
        "worddelimiter": { "type": "word_delimiter" }
      },
      "tokenizer": {
        "nGram": { "type": "nGram", "min_gram": 3, "max_gram": 20 }
      },
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "nGram",
          "filter": [
            "stopwords",
            "asciifolding",
            "lowercase",
            "snowball",
            "english_stemmer",
            "english_possessive_stemmer",
            "worddelimiter"
          ]
        },
        "custom_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "stopwords",
            "asciifolding",
            "lowercase",
            "snowball",
            "english_stemmer",
            "english_possessive_stemmer",
            "worddelimiter"
          ]
        }
      }
    }
  },
  "mappings": {
    "posts": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "custom_analyzer",
          "boost": 5
        },
        "seller": {
          "type": "nested",
          "properties": {
            "firstName": {
              "type": "string",
              "analyzer": "custom_analyzer",
              "boost": 3
            }
          }
        }
      }
    }
  }
}'
Then I added a few test docs:
curl -XPUT "http://localhost:9200/my_index/posts/1" -d'
{"title": "post", "seller": {"firstName":"Test 1"}}'
curl -XPUT "http://localhost:9200/my_index/posts/2" -d'
{"title": "post", "seller": {"firstName":"Test 2"}}'
curl -XPUT "http://localhost:9200/my_index/posts/3" -d'
{"title": "post", "seller": {"firstName":"Test 3"}}'
Then I ran a simplified version of your query with the basic structure still intact, but with a terms filter instead of a term filter:
curl -XPOST "http://localhost:9200/my_index/_search?search_type=dfs_query_then_fetch" -d'
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            { "match": { "title": "post" } }
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "seller",
                "filter": {
                  "bool": {
                    "must": [
                      {
                        "terms": {
                          "seller.firstName": [ "test", "3" ],
                          "execution": "and"
                        }
                      }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}'
...
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 6.085842,
    "hits": [
      {
        "_index": "my_index",
        "_type": "posts",
        "_id": "3",
        "_score": 6.085842,
        "_source": {
          "title": "post",
          "seller": {
            "firstName": "Test 3"
          }
        }
      }
    ]
  }
}
This seems to return what you want.
Here is the code I used:
http://sense.qbox.io/gist/041dd929106d27ea606f48ce1f86076c52faec91
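For the other option mentioned in the answer, a not_analyzed field, one possible shape is sketched below as Python dicts that serialize to the JSON request bodies. This assumes an Elasticsearch 1.x-style mapping like the one in the question. The `raw` sub-field name is hypothetical (it is not in the original mapping), and the index would need to be recreated and the documents reindexed before the sub-field is populated.

```python
import json

# Assumption: "raw" is an illustrative sub-field name, not part of the
# original mapping. It keeps the analyzed firstName for full-text search
# and adds an exact, un-analyzed copy for filtering.
seller_mapping = {
    "type": "nested",
    "properties": {
        "firstName": {
            "type": "string",
            "analyzer": "custom_analyzer",
            "boost": 3,
            "fields": {
                "raw": {"type": "string", "index": "not_analyzed"}
            },
        }
    },
}

# With an un-analyzed copy, the whole value is stored as a single token,
# so a term filter on the exact string can match.
nested_filter = {
    "nested": {
        "path": "seller",
        "filter": {"term": {"seller.firstName.raw": "Test 3"}},
    }
}

# Print the JSON bodies that would go into the mapping and the query.
print(json.dumps({"seller": seller_mapping}, indent=2))
print(json.dumps(nested_filter, indent=2))
```

The trade-off is that a filter on the un-analyzed copy is case-sensitive and matches only the full value, whereas the terms filter shown earlier works against the analyzed tokens.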