Elastic in-doc date comparison issue

Problem description

I have an Elasticsearch index with thousands of documents like this:

{
    "Name": "John Doe",
    "FirstJobStartDate": "8/9/2016",
    "FirstJobEndDate": "1/4/2019",
    "SecondJobStartDate": "7/4/2019",
    "SecondJobEndDate": "8/8/2020",
    "ThirdJobStartDate": "1/9/2020"
}

Except for Name and FirstJobStartDate, every field is optional and may or may not be present in a document.

I need to get 4 numbers:

1) How many docs have a FirstJobEndDate? That's easy:

{
  "size":1,    
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "exists": {
                "field": "FirstJobEndDate"
              }
            }
          ]
        }
      }
    }
  }
}

Now it gets complex:

2) How many docs have a FirstJobEndDate that is earlier than the current date and have NONE of (SecondJobStartDate, SecondJobEndDate, ThirdJobStartDate)?

3) How many docs have a FirstJobEndDate, also have ANY ONE of (SecondJobStartDate, SecondJobEndDate, ThirdJobStartDate), and ANY ONE of those dates is within 1 year of FirstJobEndDate?

4) How many docs have a FirstJobEndDate, also have ANY ONE of (SecondJobStartDate, SecondJobEndDate, ThirdJobStartDate), and NONE of those dates is within 1 year of FirstJobEndDate?

I believe this can be done with the right mix of 'must' and 'should', but I can't find a clear solution because it requires comparing two dates within the same document.

To confirm, all the dates are mapped as Elasticsearch date fields, not strings.

Any help would be greatly appreciated. Elastic version: 2.4

Solution

Try these:

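For the second query (the range clause restricts FirstJobEndDate to dates earlier than now):

{
  "size": 1,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "FirstJobEndDate"
          }
        },
        {
          "range": {
            "FirstJobEndDate": {
              "lt": "now"
            }
          }
        }
      ],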
      "must_not": [
        {
          "exists": {
            "field": "SecondJobStartDate"
          }
        },
        {
          "exists": {
            "field": "SecondJobEndDate"
          }
        },
        {
          "exists": {
            "field": "ThirdJobStartDate"
          }
        }
      ]
    }
  }
}

For the third query:

{
  "size": 1,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "FirstJobEndDate"
          }
        }
      ],
      "minimum_should_match": 1,
      "should": [
        {
          "script": {
            "script": "doc.SecondJobStartDate.date != null && doc.SecondJobStartDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31540000000"
          }
        },
        {
          "script": {
            "script": "doc.SecondJobEndDate.date != null && doc.SecondJobEndDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31540000000"
          }
        },
        {
          "script": {
            "script": "doc.ThirdJobStartDate.date != null && doc.ThirdJobStartDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31540000000"
          }
        }
      ]
    }
  }
}
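
The three script clauses above differ only in the field name, so the request body can be generated rather than written by hand. The following is a minimal Python sketch of that idea, not part of the original answer: `within_year_clause` and `build_third_query` are hypothetical helper names, and the threshold is the same approximate one-year value in milliseconds that the scripts above use.

```python
import json

# Approximate one-year threshold in milliseconds, copied from the scripts above.
ONE_YEAR_MS = 31540000000

# Optional date fields that are compared against FirstJobEndDate.
LATER_DATE_FIELDS = ["SecondJobStartDate", "SecondJobEndDate", "ThirdJobStartDate"]


def within_year_clause(field, base="FirstJobEndDate", limit_ms=ONE_YEAR_MS):
    """Build one script clause comparing `field` against `base` (hypothetical helper)."""
    script = (
        "doc.{f}.date != null && "
        "doc.{f}.date.getMillis() - doc.{b}.date.getMillis() <= {limit}"
    ).format(f=field, b=base, limit=limit_ms)
    return {"script": {"script": script}}


def build_third_query(fields=LATER_DATE_FIELDS):
    """Assemble the same request body as the third query, one clause per field."""
    return {
        "size": 1,
        "query": {
            "bool": {
                "filter": [{"exists": {"field": "FirstJobEndDate"}}],
                "minimum_should_match": 1,
                "should": [within_year_clause(f) for f in fields],
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body so it can be pasted into a search request.
    print(json.dumps(build_third_query(), indent=2))
```

Moving the same list of clauses from "should" to "must_not" gives the script part of the fourth query.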

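For the fourth query (the should clauses require at least one of the later dates to be present):

{
  "size": 1,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "FirstJobEndDate"
          }
        }
      ],
      "minimum_should_match": 1,
      "should": [
        {
          "exists": {
            "field": "SecondJobStartDate"
          }
        },
        {
          "exists": {
            "field": "SecondJobEndDate"
          }
        },
        {
          "exists": {
            "field": "ThirdJobStartDate"
          }
        }
      ],
      "must_not": [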
        {
          "script": {
            "script": "doc.SecondJobStartDate.date != null && doc.SecondJobStartDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31540000000"
          }
        },
        {
          "script": {
            "script": "doc.SecondJobEndDate.date != null && doc.SecondJobEndDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31540000000"
          }
        },
        {
          "script": {
            "script": "doc.ThirdJobStartDate.date != null && doc.ThirdJobStartDate.date.getMillis() - doc.FirstJobEndDate.date.getMillis() <= 31540000000"
          }
        }
      ]
    }
  }
}

A tip: as you can see, these queries rely on scripting, which can hurt performance. Since you know in advance which dates you want to compare, you could store the date differences in additional numeric fields at index time and then compare them with simple range queries.
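
One way to read that tip is sketched below in Python. It is an illustration under assumptions, not the answerer's code: the `<Field>GapDays` field names and the helpers `add_day_gaps` and `within_year_query` are hypothetical, the gap is stored in whole days, and the sample dates are assumed to be day/month/year. The enriched document would be indexed in place of the original, after which the "within 1 year" check needs no script.

```python
from datetime import date

# Optional date fields that are compared against FirstJobEndDate.
LATER_DATE_FIELDS = ["SecondJobStartDate", "SecondJobEndDate", "ThirdJobStartDate"]


def add_day_gaps(doc, base="FirstJobEndDate"):
    """Return a copy of `doc` with a hypothetical '<Field>GapDays' integer
    for every later date that is present alongside the base date."""
    enriched = dict(doc)
    base_date = doc.get(base)
    if base_date is None:
        return enriched  # nothing to compare against
    for field in LATER_DATE_FIELDS:
        later = doc.get(field)
        if later is not None:
            enriched[field + "GapDays"] = (later - base_date).days
    return enriched


def within_year_query(max_days=365):
    """Range-only counterpart of the third query, using the precomputed gaps."""
    return {
        "size": 1,
        "query": {
            "bool": {
                "minimum_should_match": 1,
                "should": [
                    {"range": {field + "GapDays": {"lte": max_days}}}
                    for field in LATER_DATE_FIELDS
                ],
            }
        },
    }


if __name__ == "__main__":
    sample = {
        "Name": "John Doe",
        "FirstJobEndDate": date(2019, 4, 1),     # 1/4/2019, assumed day/month/year
        "SecondJobStartDate": date(2019, 4, 7),  # 7/4/2019, assumed day/month/year
    }
    print(add_day_gaps(sample))
    print(within_year_query())
```

Because a gap field exists only when both dates are present, the range clauses also cover the "has at least one later date" condition without extra exists filters.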
