Query documents by sum of nested field values in Elasticsearch


Problem description

Rank beginner at Elasticsearch here.

I have a list of customers, with their orders as a nested field. Assume a document structure like:

[
  { customerId: 123,
    birthday: 1980-01-01,
    orders: [
      {
        orderValue: 1500,
        orderDate: 2018-12-18T12:18:12Z
      },
      [...]
    ]
  },
  [...]
]

What I'd like to query is the list of users whose orders between two dates add up to at least a certain amount. I'd also like to be able to combine that with a range query on, for example, birthday.

I've gotten to the point where I can get the sum ordered between two dates per customer using aggregations:

{
  "size": 0,
  "aggs": {
    "foo": {
      "nested": {
        "path": "orders"
      },
      "aggs": {
        "grouped_by_customerId": {
          "terms": {
            "field": "orders.customerId.keyword"
          },
          "aggs": {
            "filtered_by_date": {
              "filter": {
                "range": {
                  "orders.orderDate": {
                    "from": "2018-01-28",
                    "to": null,
                    "include_lower": false,
                    "include_upper": true,
                    "format": "yyyy-MM-dd",
                    "boost": 1
                  }
                }
              },
              "aggs": {
                "sum": {
                  "sum": {
                    "field": "orders.orderValue"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

However, I'd like to limit the results I get back in the query part, so that it mixes better with all our other filters.

My first thought was to use a script filter and pass the bounding dates and minimum value in as parameters, but then I'd have to iterate over a document's nested documents, and that doesn't seem to work.

Is that last idea possible, and if so, how?

Thanks!

Recommended answer

Use a function_score query with a Painless script_score function and a min_score.

The actual scores this produces are debugging output to test the operator, but a min_score of 1 means any of them match. Reading from _source is quite slow.

Without a query in the function_score it works, but takes 20 seconds or so to work through 3 million records. With the query, you only look at customers with orders that actually match the date range.
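A minimal sketch of such a request, built as a Python dict so the Painless source stays readable. This is not the original query. The helper names `to_millis` and `build_query`, the parameter names `fromMillis`, `toMillis` and `minTotal`, the half-open date window, and the script itself are all assumptions made for illustration. The script also assumes that `params._source` is readable from a score script on the cluster version in use.

```python
import json
from datetime import datetime, timezone

# Hypothetical Painless source: sums orderValue over the orders inside the
# window and returns 1 when the total reaches the minimum, otherwise 0.
SUM_SCRIPT = """
double total = 0;
for (def order : params._source.orders) {
  long ts = ZonedDateTime.parse(order.orderDate).toInstant().toEpochMilli();
  if (ts >= params.fromMillis && ts < params.toMillis) {
    total += order.orderValue;
  }
}
return total >= params.minTotal ? 1 : 0;
"""


def to_millis(day: str) -> int:
    """Convert a yyyy-MM-dd date, taken as midnight UTC, to epoch milliseconds."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def build_query(date_from: str, date_to: str, min_total: int) -> dict:
    """Build a search body keeping customers whose orders in
    [date_from, date_to) sum to at least min_total."""
    from_ms, to_ms = to_millis(date_from), to_millis(date_to)
    return {
        "query": {
            "function_score": {
                # Pre-filter: only customers with at least one order in the
                # window are passed to the (slow) script.
                "query": {
                    "nested": {
                        "path": "orders",
                        "query": {
                            "range": {
                                "orders.orderDate": {
                                    "gte": from_ms,
                                    "lt": to_ms,
                                    "format": "epoch_millis",
                                }
                            }
                        },
                    }
                },
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "lang": "painless",
                                "source": SUM_SCRIPT,
                                "params": {
                                    "fromMillis": from_ms,
                                    "toMillis": to_ms,
                                    "minTotal": min_total,
                                },
                            }
                        }
                    }
                ],
                # Use the script result as the score, then drop scores below 1.
                "boost_mode": "replace",
                "min_score": 1,
            }
        }
    }


body = build_query("2018-01-28", "2019-01-01", 1000)
print(json.dumps(body, indent=2))
```

Passing the same epoch-millisecond bounds to the nested range query and to the script keeps the two date checks aligned. Other filters, such as a range on birthday, could be added by wrapping the nested query in a bool query.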

Since the Painless script processes the entire list of orders, it has to redo the date math. There is some optimization to do there, but at least I have a proof of concept.
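The date check has to be repeated because the query only decides which customers get scored, while the script still sees every order of such a customer, inside or outside the window. A plain Python sketch of the per-customer check makes this visible. The sample customer, its second order, and the function names are made up for illustration, and this is not the Painless script from the answer.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp such as 2018-12-18T12:18:12Z."""
    # Older Python versions reject a trailing "Z", so spell out the offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def total_in_window(customer: dict, start: datetime, end: datetime) -> int:
    """Sum orderValue over the orders with start <= orderDate < end."""
    return sum(
        order["orderValue"]
        for order in customer["orders"]
        if start <= parse_ts(order["orderDate"]) < end
    )


def matches(customer: dict, start: datetime, end: datetime, min_total: int) -> bool:
    """True when the in-window total reaches min_total."""
    return total_in_window(customer, start, end) >= min_total


# Hypothetical customer: one order inside the window, one outside it.
customer = {
    "customerId": 123,
    "orders": [
        {"orderValue": 1500, "orderDate": "2018-12-18T12:18:12Z"},
        {"orderValue": 700, "orderDate": "2017-05-01T09:00:00Z"},
    ],
}
start = datetime(2018, 1, 28, tzinfo=timezone.utc)
end = datetime(2019, 1, 1, tzinfo=timezone.utc)

print(total_in_window(customer, start, end))
print(matches(customer, start, end, 1000))
```

Without the date check inside the loop, the 2017 order would be counted as well and the total would be overstated.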

I've seen this question before without a satisfactory answer, so hopefully someone finds this useful.
