Elasticsearch nested filter returns empty result

Problem description

I have this mapping:

  "post": {
    "model": "Post",
    "properties": {
      "id": {
        "type": "integer"
      },
      "title": {
        "type": "string",
        "analyzer": "custom_analyzer",
        "boost": 5
      },
      "description": {
        "type": "string",
        "analyzer": "custom_analyzer",
        "boost": 4
      },
      "condition": {
        "type": "integer",
        "index": "not_analyzed"
      },
      "categories": {
        "type": "string",
        "index": "not_analyzed"
      },
      "seller": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "integer",
            "index": "not_analyzed"
          },
          "username": {
            "type": "string",
            "analyzer": "custom_analyzer",
            "boost": 1
          },
          "firstName": {
            "type": "string",
            "analyzer": "custom_analyzer",
            "boost": 3
          },
          "lastName": {
            "type": "string",
            "analyzer": "custom_analyzer",
            "boost": 2
          }
        }
      },
      "marketPrice": {
        "type": "float",
        "index": "not_analyzed"
      },
      "currentPrice": {
        "type": "float",
        "index": "not_analyzed"
      },
      "discount": {
        "type": "float",
        "index": "not_analyzed"
      },
      "commentsCount": {
        "type": "integer",
        "index": "not_analyzed"
      },
      "likesCount": {
        "type": "integer",
        "index": "not_analyzed"
      },
      "featured": {
        "type": "boolean",
        "index": "not_analyzed"
      },
      "bumped": {
        "type": "boolean",
        "index": "not_analyzed"
      },
      "created": {
        "type": "date",
        "index": "not_analyzed"
      },
      "modified": {
        "type": "date",
        "index": "not_analyzed"
      }
    }
  }

And this query:

GET /develop/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "filtered" : {
        "query": {
          "bool": {
            "must": [
              { "match": { "title": "post" }}
            ]
          }
        },
        "filter": {
          "bool": { 
            "must": [
              {"term": {
                "featured": 0
              }},
              { 
              "nested": {
                "path": "seller",
                "filter": {
                  "bool": {
                    "must": [
                      { "term": { "seller.firstName": "Test 3" } }
                    ]
                  }
                },
                "_cache" : true
              }}
            ]
          } 
        }
    }
  },
  "sort": [
    {
      "_score":{
        "order": "desc"
      }
    },{
      "created": {
        "order": "desc"
      }
    }
  ],
  "track_scores": true
}

I expect 25 results because I have 25 posts indexed, but I get an empty set. If I remove the nested filter, everything works fine. I want to be able to filter on the nested object.

In my settings I have:

    "analyzer": {
      "custom_analyzer": {
        "type": "custom",
        "tokenizer": "nGram",
        "filter": [
          "stopwords",
          "asciifolding",
          "lowercase",
          "snowball",
          "english_stemmer",
          "english_possessive_stemmer",
          "worddelimiter"
        ]
      },
      "custom_search_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "stopwords",
          "asciifolding",
          "lowercase",
          "snowball",
          "english_stemmer",
          "english_possessive_stemmer",
          "worddelimiter"
        ]
      }
    }

What am I missing here?

Thanks

Recommended answer

Short version: try this (after updating the endpoint and index name):

curl -XPOST "http://localhost:9200/my_index/_search?search_type=dfs_query_then_fetch" -d'
{
   "query": {
      "filtered": {
         "query": {
            "bool": {
               "must": [
                  {
                     "match": {
                        "title": "post"
                     }
                  }
               ]
            }
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "nested": {
                        "path": "seller",
                        "filter": {
                           "bool": {
                              "must": [
                                 {
                                    "terms": {
                                       "seller.firstName": [
                                          "test",
                                          "3"
                                       ],
                                       "execution": "and"
                                    }
                                 }
                              ]
                           }
                        }
                     }
                  }
               ]
            }
         }
      }
   }
}'

It worked for me with a simplified version of your setup. I'll post an edit with a longer explanation in a little while.

Edit: long version:

The problem with your query is the analyzer combined with the term filter. Your analyzer breaks the text of the firstName field into lowercased tokens at index time, so "Test 3" is stored as tokens such as "test" and "3", never as the literal string "Test 3". A term filter does not analyze its input, so { "term": { "seller.firstName": "Test 3" } } means: find a document where one of the tokens for "seller.firstName" is exactly "Test 3". No document satisfies that (in fact, none can, given the way your analyzer is set up). You could set "index": "not_analyzed" on that field and then your query would work, or you can use a terms filter as I showed above.
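To make the token mismatch concrete, here is a small Python sketch. It is only a toy model, not Elasticsearch code: `toy_analyze` is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, which is far simpler than the nGram-based `custom_analyzer`, and the two matching functions merely imitate how a term filter and a terms filter with "execution": "and" compare against indexed tokens.

```python
import re


def toy_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return {tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok}


def term_matches(indexed_tokens, term):
    """Imitates a term filter: the input is compared as-is, without analysis."""
    return term in indexed_tokens


def terms_and_matches(indexed_tokens, terms):
    """Imitates a terms filter with execution "and": every term must be present."""
    return all(t in indexed_tokens for t in terms)


analyzed = toy_analyze("Test 3")   # tokens stored for an analyzed field
not_analyzed = {"Test 3"}          # a not_analyzed field keeps the whole value

print(term_matches(analyzed, "Test 3"))            # False
print(terms_and_matches(analyzed, ["test", "3"]))  # True
print(term_matches(not_analyzed, "Test 3"))        # True
```

The first line of output corresponds to the empty result from the original query; the other two correspond to the two fixes mentioned above.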

To reproduce the problem, I started with the index definition you linked to in your comment and simplified it a little to make it more readable while keeping the essential issue:

curl -XDELETE "http://localhost:9200/my_index"

curl -XPUT "http://localhost:9200/my_index" -d'
{
   "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "analysis": {
         "filter": {
            "snowball": { "type": "snowball", "language": "English" },
            "english_stemmer": { "type": "stemmer", "language": "english" },
            "english_possessive_stemmer": { "type": "stemmer", "language": "possessive_english" },
            "stopwords": { "type": "stop",  "stopwords": [ "_english_" ] },
            "worddelimiter": { "type": "word_delimiter" }
         },
         "tokenizer": {
            "nGram": { "type": "nGram", "min_gram": 3, "max_gram": 20 }
         },
         "analyzer": {
            "custom_analyzer": {
               "type": "custom",
               "tokenizer": "nGram",
               "filter": [
                  "stopwords",
                  "asciifolding",
                  "lowercase",
                  "snowball",
                  "english_stemmer",
                  "english_possessive_stemmer",
                  "worddelimiter"
               ]
            },
            "custom_search_analyzer": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "stopwords",
                  "asciifolding",
                  "lowercase",
                  "snowball",
                  "english_stemmer",
                  "english_possessive_stemmer",
                  "worddelimiter"
               ]
            }
         }
      }
   },
   "mappings": {
      "posts": {
         "properties": {
            "title": {
               "type": "string",
               "analyzer": "custom_analyzer",
               "boost": 5
            },
            "seller": {
               "type": "nested",
               "properties": {
                  "firstName": {
                     "type": "string",
                     "analyzer": "custom_analyzer",
                     "boost": 3
                  }
               }
            }
         }
      }
   }
}'

Then I added a few test docs:

curl -XPUT "http://localhost:9200/my_index/posts/1" -d'
{"title": "post", "seller": {"firstName":"Test 1"}}'
curl -XPUT "http://localhost:9200/my_index/posts/2" -d'
{"title": "post", "seller": {"firstName":"Test 2"}}'
curl -XPUT "http://localhost:9200/my_index/posts/3" -d'
{"title": "post", "seller": {"firstName":"Test 3"}}'

Then I ran a simplified version of your query, keeping the basic structure intact but using a terms filter instead of a term filter:

curl -XPOST "http://localhost:9200/my_index/_search?search_type=dfs_query_then_fetch" -d'
{
   "query": {
      "filtered": {
         "query": {
            "bool": {
               "must": [
                  {
                     "match": {
                        "title": "post"
                     }
                  }
               ]
            }
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "nested": {
                        "path": "seller",
                        "filter": {
                           "bool": {
                              "must": [
                                 {
                                    "terms": {
                                       "seller.firstName": [
                                          "test",
                                          "3"
                                       ],
                                       "execution": "and"
                                    }
                                 }
                              ]
                           }
                        }
                     }
                  }
               ]
            }
         }
      }
   }
}'
...
{
   "took": 5,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 6.085842,
      "hits": [
         {
            "_index": "my_index",
            "_type": "posts",
            "_id": "3",
            "_score": 6.085842,
            "_source": {
               "title": "post",
               "seller": {
                  "firstName": "Test 3"
               }
            }
         }
      ]
   }
}

That seems to return what you want.

Here is the code I used:

http://sense.qbox.io/gist/041dd929106d27ea606f48ce1f86076c52faec91
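If the search value comes from user input, the terms list has to be derived from it before the request is sent. The following Python sketch builds the same request body as the query above. `nested_terms_filter` is a hypothetical helper, and lowercasing plus whitespace splitting is only an assumption that approximates the tokens the index analyzer produces; it is not guaranteed to match them for every input.

```python
import json


def nested_terms_filter(path, field, raw_value):
    """Build a nested filter that requires every lowercased word of raw_value.

    Hypothetical helper: lowercasing and whitespace splitting only
    approximate what the index analyzer stores.
    """
    tokens = raw_value.lower().split()
    return {
        "nested": {
            "path": path,
            "filter": {
                "bool": {
                    "must": [
                        {
                            "terms": {
                                f"{path}.{field}": tokens,
                                "execution": "and",
                            }
                        }
                    ]
                }
            },
        }
    }


# Same overall structure as the filtered query shown in the answer.
body = {
    "query": {
        "filtered": {
            "query": {"bool": {"must": [{"match": {"title": "post"}}]}},
            "filter": {
                "bool": {
                    "must": [nested_terms_filter("seller", "firstName", "Test 3")]
                }
            },
        }
    }
}

print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the request body of the `_search` call, for example via the `-d` option of the curl command used earlier.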
