How to filter by the size of an array in a nested type?

Problem description

Suppose I have the following mapping:

{
    "2019-11-04": {
        "mappings": {
            "_doc": {
                "properties": {
                    "labels": {
                        "type": "nested",
                        "properties": {
                            "confidence": {
                                "type": "float"
                            },
                            "created_at": {
                                "type": "date",
                                "format": "strict_date_optional_time||date_time||epoch_millis"
                            },
                            "label": {
                                "type": "keyword"
                            },
                            "updated_at": {
                                "type": "date",
                                "format": "strict_date_optional_time||date_time||epoch_millis"
                            },
                            "value": {
                                "type": "keyword",
                                "fields": {
                                    "numeric": {
                                        "type": "float",
                                        "ignore_malformed": true
                                    }
                                }
                            }
                        }
                    },
                    "params": {
                        "type": "object"
                    },
                    "type": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

I want to filter by the size/length of the labels array. I've tried the following (as the official docs suggest):

{
    "query": {
        "bool": {
            "filter": {
                "script": {
                    "script": {
                        "source": "doc['labels'].size > 10"
                    }
                }
            }
        }
    }
}

But I keep getting:

{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:81)",
          "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:39)",
          "doc['labels'].size > 10",
          "    ^---- HERE"
        ],
        "script": "doc['labels'].size > 10",
        "lang": "painless"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "2019-11-04",
        "node": "kk5MNRPoR4SYeQpLk2By3A",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:81)",
            "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:39)",
            "doc['labels'].size > 10",
            "    ^---- HERE"
          ],
          "script": "doc['labels'].size > 10",
          "lang": "painless",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "No field found for [labels] in mapping with types []"
          }
        }
      }
    ]
  },
  "status": 500
}

Recommended answer

I'm afraid this is not directly possible, because labels itself is not a field that Elasticsearch stores values for or builds an inverted index on.

doc['fieldname'] works only on fields that are indexed with doc values, and Elasticsearch's Query DSL likewise operates only on indexed fields. Unfortunately, a nested type is just a container: each nested object is indexed as a separate hidden document, so there is no labels field to look up. That is what the error "No field found for [labels] in mapping" is reporting.

That said, here are two ways of doing this.

For the sake of simplicity, I've created a sample mapping, sample documents, and two possible solutions that may help you.

PUT my_sample_index
{
  "mappings": {
    "properties": {
      "myfield": {
        "type": "nested",
        "properties": {
          "label": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Sample documents:

// single field inside 'myfield'
POST my_sample_index/_doc/1
{
  "myfield": {                              
    "label": ["New York", "LA", "Austin"]   
  }
}


// two fields inside 'myfield' 
POST my_sample_index/_doc/2
{                                          
  "myfield": {                             
    "label": ["London", "Leicester", "Newcastle", "Liverpool"],
    "country": "England"
  }
}

Solution 1: Using Script Fields (Managing at Application Level)

This is a workaround rather than exactly what you asked for, but it lets you filter in your service layer or application.

POST my_sample_index/_search
{
  "_source": "*", 
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "script_fields": {
    "label_size": {
        "script": {
            "lang": "painless",
            "source": "params['_source']['myfield'].size() > 1"
        }
    }
  }
}

You will notice that the response contains a separate field, label_size, with a true or false value.

A sample response looks like this:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_sample_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "myfield" : {
            "label" : [
              "New York",
              "LA",
              "Austin"
            ]
          }
        },
        "fields" : {
          "label_size" : [              <---- Scripted Field
            false
          ]
        }
      },
      {
        "_index" : "my_sample_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "myfield" : {
            "country" : "England",
            "label" : [
              "London",
              "Leicester",
              "Newcastle",
              "Liverpool"
            ]
          }
        },
        "fields" : {                  <---- Scripted Field
          "label_size" : [
            true                      <---- True because 'myfield' has two fields, 'label' and 'country'
          ]
        }
      }
    ]
  }
}

Note that label_size is true only for the second document, because its myfield object has two fields, country and label. As the response shows, the size reflects the number of fields inside myfield, not the number of entries in the label array. If you only want the documents whose label_size is true, that filtering has to be done in your application layer.
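The application-level filtering mentioned above can be sketched in a few lines of Python. This is only an illustration, assuming the search response has already been parsed into a dict; the function name hits_with_flag and the trimmed sample_response are hypothetical and not part of the original answer.

```python
from typing import Any


def hits_with_flag(response: dict[str, Any], field: str = "label_size") -> list[dict[str, Any]]:
    """Return the hits whose script field `field` evaluated to true."""
    kept = []
    for hit in response.get("hits", {}).get("hits", []):
        # Script fields are returned under "fields" as a list of values.
        values = hit.get("fields", {}).get(field, [])
        if values and values[0] is True:
            kept.append(hit)
    return kept


# Trimmed copy of the sample response (only the keys the function reads).
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "fields": {"label_size": [False]}},
            {"_id": "2", "fields": {"label_size": [True]}},
        ]
    }
}

matching_ids = [hit["_id"] for hit in hits_with_flag(sample_response)]
print(matching_ids)  # ['2']
```

Keep in mind that this approach still transfers every hit to the client before discarding the ones that do not match, so it is best suited to small result sets.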

Solution 2: Reindex with an ingest pipeline that stores the size

Create a new index as follows:

PUT my_sample_index_temp
{
  "mappings": {
    "properties": {
      "myfield": {
        "type": "nested",
        "properties": {
          "label": {
            "type": "keyword"
          }
        }
      },
      "labels_size":{             <---- New Field where we'd store the size
        "type": "integer"
      }
    }
  }
}

Create the following pipeline:

PUT _ingest/pipeline/set_labels_size
{
  "description": "sets the value of labels size",
  "processors": [
      {
        "script": {
          "source": """
            ctx.labels_size = ctx.myfield.size();
          """
        }
      }
    ]
}

Use the Reindex API to reindex from the my_sample_index index:

POST _reindex
{
  "source": {
    "index": "my_sample_index"
  },
  "dest": {
    "index": "my_sample_index_temp",
    "pipeline": "set_labels_size"
  }
}

Verify the documents in my_sample_index_temp using GET my_sample_index_temp/_search:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_sample_index_temp",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "labels_size" : 1,           <---- New Field Created 
          "myfield" : {
            "label" : [
              "New York",
              "LA",
              "Austin"
            ]
          }
        }
      },
      {
        "_index" : "my_sample_index_temp",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "labels_size" : 2,           <----- New Field Created
          "myfield" : {
            "country" : "England",
            "label" : [
              "London",
              "Leicester",
              "Newcastle",
              "Liverpool"
            ]
          }
        }
      }
    ]
  }
}
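The values 1 and 2 stored in labels_size deserve a second look. In the sample documents myfield is a single object, so ctx.myfield.size() counts its keys rather than the entries of the label array. The Python analogue below is a rough sketch of that behaviour, assuming size() on a map or list behaves like len() on a dict or list; size_like_painless is a hypothetical helper used only for illustration.

```python
def size_like_painless(value) -> int:
    """Mimic size(): number of keys for an object, number of items for an array."""
    if isinstance(value, (dict, list)):
        return len(value)
    raise TypeError("expected an object or an array")


# The two sample documents from my_sample_index.
doc1 = {"myfield": {"label": ["New York", "LA", "Austin"]}}
doc2 = {
    "myfield": {
        "label": ["London", "Leicester", "Newcastle", "Liverpool"],
        "country": "England",
    }
}

# What the pipeline stores: the number of keys in the myfield object.
key_counts = [size_like_painless(d["myfield"]) for d in (doc1, doc2)]
# What you would get by measuring the label array instead.
label_counts = [size_like_painless(d["myfield"]["label"]) for d in (doc1, doc2)]

print(key_counts)    # [1, 2]
print(label_counts)  # [3, 4]
```

If myfield is indexed as an array of nested objects, as labels appears to be in the question, the same size() call would return the number of nested objects, which is the count the question asks for. To store the length of the inner array instead, the pipeline script would presumably need to measure ctx.myfield.label rather than ctx.myfield.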

Now you can simply use the labels_size field in your query, which is much easier, not to mention more efficient.
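As a sketch of what such a query might look like, the snippet below builds a search body with a range filter on labels_size. It only constructs and prints the JSON body, which could then be sent to POST my_sample_index_temp/_search; the helper name labels_size_query and the threshold are illustrative assumptions.

```python
import json


def labels_size_query(min_exclusive: int) -> dict:
    """Build a search body that keeps documents with labels_size > min_exclusive."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # labels_size is a plain integer field, so a range query applies.
                    {"range": {"labels_size": {"gt": min_exclusive}}}
                ]
            }
        }
    }


body = labels_size_query(1)
print(json.dumps(body, indent=2))
```

Because labels_size is an ordinary indexed integer, this filter needs no script at search time, which is where the efficiency gain comes from.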

Hope this helps!
