How to count a number of unique documents by a nested field in Elasticsearch?

Views: 369  Published: 2020/6/7 19:19:27  Tags: java elasticsearch elasticsearch-aggregation elasticsearch-query cardinality

Question

I'm trying to count documents with a unique nested field value (and, next, to fetch the documents themselves). Getting the unique documents seems to work, but when I execute a count request, I get the following error:

Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/package/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"parsing_exception","reason":"request does not support [collapse]","line":1,"col":216}],"type":"parsing_exception","reason":"request does not support [collapse]","line":1,"col":216},"status":400}

Code:

        // Nested query on "attachment"; the name filter is added only when a template name is given.
        BoolQueryBuilder innerTemplNestedBuilder = QueryBuilders.boolQuery();
        NestedQueryBuilder templatesNestedQuery = QueryBuilders.nestedQuery("attachment", innerTemplNestedBuilder, ScoreMode.None);
        BoolQueryBuilder mainQueryBuilder = QueryBuilders.boolQuery().must(templatesNestedQuery);
        if (!isEmpty(templateName)) {
            innerTemplNestedBuilder.filter(QueryBuilders.termQuery("attachment.name", templateName));
        }
        // Collapse the search hits on attachment.uuid.
        SearchSourceBuilder searchSourceBuilder = SearchSourceBuilder.searchSource()
                .collapse(new CollapseBuilder("attachment.uuid"))
                .query(mainQueryBuilder);
        // THE NEXT LINE CAUSES THE ERROR: the _count request does not support [collapse]
        long count = client.count(new CountRequest("package").source(searchSourceBuilder), RequestOptions.DEFAULT).getCount();
        // THIS WORKS
        SearchResponse searchResponse = client.search(
                new SearchRequest(
                        new String[] {"package"},
                        searchSourceBuilder.timeout(new TimeValue(20, TimeUnit.SECONDS)).from(offset).size(limit)
                ).indices("package").searchType(SearchType.DFS_QUERY_THEN_FETCH),
                RequestOptions.DEFAULT
        );
        return ....;

The overall intention is to fetch a portion (one page) of the documents together with the total number of such documents. Maybe another approach for this need already exists. When I try to get the count using aggregations and cardinality, I get zero, so it looks like that doesn't work on nested fields.

Count request:

{
    "query": {
        "bool": {
            "must": [
                {
                    "nested": {
                        "query": {
                            "bool": {
                                "adjust_pure_negative": true,
                                "boost": 1.0
                            }
                        },
                        "path": "attachment",
                        "ignore_unmapped": false,
                        "score_mode": "none",
                        "boost": 1.0
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
        }
    },
    "collapse": {
        "field": "attachment.uuid"
    }
}

How the mapping is created:

curl -X DELETE "localhost:9200/package?pretty"
curl -X PUT    "localhost:9200/package?include_type_name=true&pretty" -H 'Content-Type: application/json' -d '{
    "settings" :  {
        "number_of_shards" : 1,
        "number_of_replicas" : 1
    }}'
curl -X PUT    "localhost:9200/package/_mappings?pretty" -H 'Content-Type: application/json' -d'
{
      "dynamic": false,
      "properties" : {
        "attachment": {
            "type": "nested",
            "properties": {
                "uuid" : { "type" : "keyword" },
                "name" : { "type" : "text" }
            }
        },
        "uuid" : {
          "type" : "keyword"
        }
      }
}
'

The resulting query generated by the code should look something like this:

curl -X POST "localhost:9200/package/_count?pretty" -H 'Content-Type: application/json' -d' { "query" :
    {
        "bool": {
            "must": [
                {
                    "nested": {
                        "query": {
                            "bool": {
                                "adjust_pure_negative": true,
                                "boost": 1.0
                            }
                        },
                        "path": "attachment",
                        "ignore_unmapped": false,
                        "score_mode": "none",
                        "boost": 1.0
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
        }
    },
    "collapse": {
        "field": "attachment.uuid"
    }
}'

Recommended answer

To get the parent doc count of unique nested fields, we have to get slightly more clever:

GET package/_search
{
  "size": 0,
  "aggs": {
    "nested_uniques": {
      "nested": {
        "path": "attachment"
      },
      "aggs": {
        "scripted_uniques": {
          "scripted_metric": {
            "init_script": "state.my_map = [:];",
            "map_script": """
              if (doc.containsKey('attachment.uuid')) {
                state.my_map[doc['attachment.uuid'].value.toString()] = 1;
              }
            """,
            "combine_script": """
              def sum = 0;
              for (c in state.my_map.entrySet()) {
                sum += 1
              }
              return sum
            """,
            "reduce_script": """
              def sum = 0;
              for (agg in states) {
                sum += agg;
              }
              return sum;
            """
          }
        }
      }
    }
  }
}

...
{
  "aggregations":{
    "nested_uniques":{
      "doc_count":3,
      "scripted_uniques":{
        "value":2
      }
    }
  }
}

This scripted_uniques value of 2 is exactly what you're after.

Note: I solved this use case with a nested scripted metric aggregation, but if you know of a cleaner approach, I'd be very happy to learn it!
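
To make the four scripted_metric phases more concrete, here is a minimal plain-Java sketch of what the scripts in the query compute. It does not call Elasticsearch at all; the class name ScriptedMetricSketch, its method names, and the uuid values "a" and "b" are assumptions made up for illustration. Each inner list stands for the nested attachment.uuid values that one shard sees.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScriptedMetricSketch {

    // Mirrors init_script + map_script + combine_script for ONE shard:
    // every uuid becomes a map key, then the number of keys is returned.
    public static int combineShard(List<String> nestedUuidsOnShard) {
        Map<String, Integer> myMap = new HashMap<>();   // init_script: state.my_map = [:]
        for (String uuid : nestedUuidsOnShard) {        // map_script: runs once per nested doc
            myMap.put(uuid, 1);
        }
        return myMap.size();                            // combine_script: count the entries
    }

    // Mirrors reduce_script: add up the per-shard counts.
    public static int reduce(List<List<String>> shards) {
        int sum = 0;
        for (List<String> shard : shards) {
            sum += combineShard(shard);
        }
        return sum;
    }

    public static void main(String[] args) {
        // One shard, three nested docs, two distinct uuids (hypothetical values).
        System.out.println(reduce(List.of(List.of("a", "b", "a"))));          // prints 2
        // The same three docs split over two shards: "a" is counted on both.
        System.out.println(reduce(List.of(List.of("a", "b"), List.of("a")))); // prints 3
    }
}
```

With a single shard, as in the index settings shown in the question (number_of_shards: 1), the sketch yields 2 for three nested documents, which matches the doc_count of 3 and value of 2 in the response. The second call yields 3 because the reduce step simply adds per-shard counts, so a uuid present on two shards would be counted twice. On a multi-shard index it may be safer to return the key set from the combine step and merge the sets in the reduce step.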
