Elasticsearch: filter by count of a nested document


Question

I have an Elasticsearch index for firms that has a nested object called transactions. Each transaction has at least a date field. Here is a sample:

firms: [
  {
    "name": "abc",
    "address" : "xyz",
    "transactions": [
       {
         "date" : "2014-12-20",
         "side" : "buyer"
       },
       ...
     ]
  },
  ...
]

Given this data, I want to query for all firms having (say) 3+ transactions in the past 6 or 12 months.

The following query returns firms having at least one transaction in the past 12 months:

POST firms/firm/_search
{
    "query": {
        "nested": {
           "path": "transactions",
           "query": {
               "bool": {
                   "must": [
                      {
                          "match": {
                             "transactions.side": "buyer"
                          }
                      },
                      {
                          "range": {
                             "transactions.date": {
                                "from": "2014-10-24",
                                "to": "2015-10-24"
                             }
                          }
                      }
                   ]
               }
           }
        }  
    }
}

I'm not sure how to extend this query to match firms having x+ transactions in a period of y+ months. Any help will be appreciated. Thanks.

Recommended answer

I don't think you have any option other than using a script. Something like this:

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "transactions",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "transactions.side": "buyer"
                    }
                  },
                  {
                    "range": {
                      "transactions.date": {
                        "from": "2014-10-24",
                        "to": "2015-10-24"
                      }
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "filtered": {
            "filter": {
              "script": {
                "script": "if(_source.transactions.size()<3) return false;fromDate=Date.parse('yyyy-MM-dd',fromDateParam);toDate=Date.parse('yyyy-MM-dd',toDateParam);count=0;for(d in _source.transactions){docsDate=Date.parse('yyyy-MM-dd',d.get('date'));if(docsDate>=fromDate && docsDate<=toDate){count++};if(count==3){return true;}};return false;",
                "params": {
                  "fromDateParam":"2014-10-24",
                  "toDateParam":"2015-10-24"
                }
              }
            }
          }
        }
      ]
    }
  }
}
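
The fromDateParam and toDateParam values above are hard-coded. For the "past 6 or 12 months" case from the question, a client could compute them before sending the request. The following is a minimal sketch in Python, not part of the original answer; the helper names months_ago and date_window are hypothetical, and clamping to the last day of a shorter month is an assumption about the desired behavior.

```python
import calendar
from datetime import date


def months_ago(day, months):
    """Return the date `months` calendar months before `day`.

    Assumption: if the target month is shorter (e.g. 31 March minus one
    month), the result is clamped to that month's last day.
    """
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def date_window(to_day, months):
    """Build the script params for a window ending on `to_day`."""
    return {
        "fromDateParam": months_ago(to_day, months).isoformat(),
        "toDateParam": to_day.isoformat(),
    }


# Hypothetical usage: a 12-month window ending on 2015-10-24.
params = date_window(date(2015, 10, 24), 12)
```

The same two values would also go into the "from" and "to" fields of the nested range clause, so that both clauses cover the same window.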

The nested range query in the first clause is an "optimization" for documents where none of the dates match: a document with no dates in the range is excluded there and never reaches the more costly script filter.

The script itself first checks whether the number of transactions is less than 3. If it is, it skips the date checks and returns false. Otherwise it takes each date and compares it with the parameters. As soon as a count of 3 is reached, it stops looking at the remaining dates and returns true. Note that the script only compares dates; it does not check transactions.side.
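
To make the control flow of the script easier to follow, here is the same counting logic as a minimal Python sketch. It is an illustration only, not something Elasticsearch runs; the function name has_min_transactions and the min_count parameter (assumed to be at least 1) are hypothetical, and the sample firm reuses the field names from the question.

```python
from datetime import date


def has_min_transactions(transactions, from_date, to_date, min_count=3):
    """Return True if at least `min_count` transactions fall in the range.

    Mirrors the script: bail out early when there are too few transactions
    overall, and stop scanning as soon as the threshold is reached.
    Dates are 'yyyy-MM-dd' strings; both bounds are inclusive.
    """
    if len(transactions) < min_count:
        return False
    start = date.fromisoformat(from_date)
    end = date.fromisoformat(to_date)
    count = 0
    for tx in transactions:
        if start <= date.fromisoformat(tx["date"]) <= end:
            count += 1
            if count == min_count:
                return True
    return False


# Hypothetical sample document shaped like the one in the question.
firm = {
    "name": "abc",
    "transactions": [
        {"date": "2014-12-20", "side": "buyer"},
        {"date": "2015-03-01", "side": "buyer"},
        {"date": "2015-06-15", "side": "seller"},
        {"date": "2013-01-01", "side": "buyer"},
    ],
}
matches = has_min_transactions(firm["transactions"], "2014-10-24", "2015-10-24")
```

Like the script, this sketch counts every transaction in the date range regardless of side.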
