Elasticsearch nested aggregation ratios using nested and root level data

Question

I have what feels like a simple aggregation. My documents record code timings like so:

{
  "task_start": "2020-06-03T21:19:07.908821Z",
  "task_end": "2020-06-03T21:27:00.323790Z",
  "sub_tasks": [
    {
      "key": "sub-task1-time-milliseconds",
      "value": 3310
    },
    {
      "key": "sub-task2-time-milliseconds",
      "value": 2410
    },
    ...
  ]
}

where sub_tasks is nested. What I'd like to get is the median ratio of the time spent in each sub task to the entire task time. The entire task time is just task_end - task_start. I know how to aggregate the median sub task time and the total task time individually, but I'd like to aggregate the ratio per document.
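To make the target quantity concrete, here is a minimal client-side sketch in plain Python (not Elasticsearch code) of the per-document ratio and its median. The helper names `task_millis` and `sub_task_ratios` are made up for illustration, and the sample document contains only the two sub tasks shown above.

```python
from datetime import datetime
from statistics import median


def task_millis(doc):
    """Total task duration in milliseconds (task_end - task_start)."""
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on older Python versions.
    start = datetime.fromisoformat(doc["task_start"].replace("Z", "+00:00"))
    end = datetime.fromisoformat(doc["task_end"].replace("Z", "+00:00"))
    return (end - start).total_seconds() * 1000.0


def sub_task_ratios(doc, key):
    """Ratios of the matching sub task times to the whole task time."""
    total = task_millis(doc)
    if total <= 0:
        return []  # avoid dividing by zero
    return [s["value"] / total for s in doc["sub_tasks"] if s["key"] == key]


doc = {
    "task_start": "2020-06-03T21:19:07.908821Z",
    "task_end": "2020-06-03T21:27:00.323790Z",
    "sub_tasks": [
        {"key": "sub-task1-time-milliseconds", "value": 3310},
        {"key": "sub-task2-time-milliseconds", "value": 2410},
    ],
}

docs = [doc]  # in practice: every document matched by the query
ratios = [r for d in docs
          for r in sub_task_ratios(d, "sub-task1-time-milliseconds")]
print(median(ratios))  # roughly 0.007 for the sample document
```

The aggregation should produce this same number, computed inside Elasticsearch instead of on the client.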

The issue is that within a nested aggregation I can only access the nested data, and within a reverse_nested aggregation I can only access the root-level data, but never both together. I understand I could use copy_to to get the task times into the nested path, but I don't have the ability to modify the index structure, and I wouldn't want the extra storage either.

Here's what I've tried. For a nested aggregation:

{
  "aggs": {
    "task_metrics": {
      "nested": {
        "path": "sub_tasks"
      },
      "aggs": {
        "sub_task_metrics": {
          "filter": {
            "term": {
              "sub_tasks.key": "sub-task1-time-milliseconds"
            }
          },
          "aggs": {
            "median_time": {
              "percentiles": {
                "script": {
                  "lang": "painless",
                  "source": """
                            double task_time = (doc['task_end'].value.millis - doc['task_start'].value.millis);
                            return doc['sub_tasks.value'].value / task_time; 
                            """
                },
                "percents": 50
              }
            }
          }
        }
      }
    }
  }
}

But in that aggregation doc['task_start'] and doc['task_end'] just return zero because I don't have access to them. To get access, I also tried a reverse_nested that adds another sub aggregation. This gets me access to doc['task_start'] and doc['task_end'], but then doc['sub_tasks.value'].value just returns 0.

It just feels like this should be possible, but when I read over pipeline aggregations and other script aggregations, I don't believe any of those do what I want. Greatly appreciate any help, thank you!

Recommended answer

This one's tricky -- already discussed here.

I think you'll have to resort to a scripted_metric aggregation and some method mocking, because the exposed Painless API is somewhat limited:

{
  "size": 0, 
  "aggs": {
    "task_metrics_median": {
      "scripted_metric": {
        "init_script": "state.ratios = new ArrayList();",

        "map_script": """
          // read the original document, including the nested sub_tasks
          def d = params._source;

          // dates arrive as strings in _source, so parse them once per document
          def millis_end = ZonedDateTime.parse(d.task_end).toInstant().toEpochMilli();
          def millis_start = ZonedDateTime.parse(d.task_start).toInstant().toEpochMilli();
          double task_time = millis_end - millis_start;

          // skip the whole document to prevent division by zero
          if (task_time > 0) {
            for (def subtask : d.sub_tasks) {
              // mimic a `term` filter: skip (rather than stop at) non-matching keys
              if (subtask.key != 'sub-task1-time-milliseconds') continue;

              state.ratios.add(subtask.value / task_time);
            }
          }
        """,

        "combine_script": "return state.ratios;",

        "reduce_script": """
          // merge the per-shard lists so the median covers all documents
          def ratios = new ArrayList();
          for (def shard_ratios : states) {
            if (shard_ratios != null) ratios.addAll(shard_ratios);
          }
          if (ratios.isEmpty()) return null;
          Collections.sort(ratios);

          // trivial median calc
          int n = ratios.size();
          if (n % 2 == 0) {
            return ((double) ratios[n / 2] + (double) ratios[n / 2 - 1]) / 2;
          }
          return (double) ratios[n / 2];
        """
      }
    }
  }
}
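One thing to keep in mind with scripted_metric: combine_script runs once per shard, while reduce_script runs once on the coordinating node over all shard results. A median of per-shard medians is generally not the overall median, so the final value has to be computed from the merged ratios. The plain Python sketch below (not Elasticsearch code) illustrates the difference; the per-shard lists are made-up numbers.

```python
from statistics import median

# Hypothetical ratios collected on three shards (illustrative values only).
shard_states = [[0.1, 0.2, 0.9], [0.3], [0.4, 0.5]]

# Taking a median on each shard and then a median of those results...
per_shard_medians = [median(s) for s in shard_states]
median_of_medians = median(per_shard_medians)

# ...differs from merging every ratio first and taking one median.
merged = sorted(r for s in shard_states for r in s)
overall_median = median(merged)

print(median_of_medians, overall_median)
```

With these numbers the two results differ, which is why the sorting and median calculation belong in the reduce step on an index with more than one shard.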
