Subtract numeric fields between two documents with different timestamps


Problem description

Let's say I have these data samples:

{
    "date": "2019-06-16",
    "rank": 150,
    "name": "doc 1"
}

{
    "date": "2019-07-16",
    "rank": 100,
    "name": "doc 1"
}

{
    "date": "2019-06-16",
    "rank": 50,
    "name": "doc 2"
}

{
    "date": "2019-07-16",
    "rank": 80,
    "name": "doc 2"
}

The expected result is obtained by subtracting the rank field between two documents that share the same name but have different dates (rank on the old date minus rank on the new date):

{
    "name": "doc 1",
    "diff_rank": 50
}

{
    "name": "doc 2",
    "diff_rank": -30
}

I would also like to sort by diff_rank if possible; otherwise I will sort manually after getting the result.
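For reference, the manual fallback could look roughly like the Python sketch below. It is not part of the original question: the function name `diff_rank_by_name` is hypothetical, and it assumes all documents have already been fetched into memory with at most one document per name and date.

```python
from collections import defaultdict

# Sample documents copied from the question.
docs = [
    {"date": "2019-06-16", "rank": 150, "name": "doc 1"},
    {"date": "2019-07-16", "rank": 100, "name": "doc 1"},
    {"date": "2019-06-16", "rank": 50, "name": "doc 2"},
    {"date": "2019-07-16", "rank": 80, "name": "doc 2"},
]


def diff_rank_by_name(docs, old_date, new_date):
    """Hypothetical helper: rank on old_date minus rank on new_date, per name."""
    ranks = defaultdict(dict)  # name -> {date: rank}
    for doc in docs:
        ranks[doc["name"]][doc["date"]] = doc["rank"]
    results = [
        {"name": name, "diff_rank": by_date[old_date] - by_date[new_date]}
        for name, by_date in ranks.items()
        # Skip names that lack a document on either date.
        if old_date in by_date and new_date in by_date
    ]
    # Sort by diff_rank, largest difference first.
    return sorted(results, key=lambda r: r["diff_rank"], reverse=True)


result = diff_rank_by_name(docs, "2019-06-16", "2019-07-16")
print(result)
```

This only works when the whole data set fits on the client, which is why doing the subtraction inside the query is preferable.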

I tried using date_histogram and serial_diff, but some results are missing the diff_rank value even though I am sure the data exists:

{
   "aggs" : {
        "group_by_name": {
            "terms": {
                "field": "name"
            },
            "aggs": {
                "days": {
                    "date_histogram": {
                        "field": "date",
                        "interval": "day"
                     },
                    "aggs": {
                        "the_rank": {
                            "sum": {
                                "field": "rank"
                            }
                        },
                        "diff_rank": {
                           "serial_diff": {
                              "buckets_path": "the_rank",
                              "lag" : 30 // 1 month or 30 days in this case
                           }
                        }
                    }
                }
            }
        }
    }
}

Any help solving this would be much appreciated!

Recommended answer

Finally, I found a method in the official docs that combines the Filter and Bucket Script aggregations, with Bucket Sort to sort the result. Here is the final snippet:

{
    "size": 0,
    "aggs" : {
        "group_by_name": {
            "terms": {
                "field": "name",
                "size": 50,
                "shard_size": 10000
            },
            "aggs": {
                "last_month_rank": {
                    "filter": {
                        "term": {"date": "2019-06-16"}
                    },
                    "aggs": {
                        "rank": {
                            "sum": {
                                "field": "rank"
                            }
                        }
                    }
                },
                "latest_rank": {
                    "filter": {
                        "term": {"date": "2019-07-16"}
                    },
                    "aggs": {
                        "rank": {
                            "sum": {
                                "field": "rank"
                            }
                        }
                    }
                },
                "diff_rank": {
                    "bucket_script": {
                        "buckets_path": {
                          "lastMonthRank": "last_month_rank>rank",
                          "latestRank": "latest_rank>rank"
                        },
                        "script": "params.lastMonthRank - params.latestRank"
                    }
                },
                "rank_bucket_sort": {
                    "bucket_sort": {
                        "sort": [
                            {"diff_rank": {"order": "desc"}}
                        ],
                        "size": 50
                    }
                }
            }
        }
    }
}
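To reuse the query for other date pairs, one option is to build the request body in code and flatten the response afterwards. The Python sketch below is only an illustration under assumptions: the helper names `build_diff_rank_query` and `extract_diff_ranks` are hypothetical, and `sample_response` is a hand-written stand-in for what Elasticsearch would return, not real output. The resulting dict can be sent as the search body with whatever client you use.

```python
import json


def build_diff_rank_query(old_date, new_date, size=50, shard_size=10000):
    """Hypothetical helper: build the search body for rank(old) - rank(new)."""

    def rank_on(date):
        # Filter the bucket to one date, then sum the rank field.
        return {
            "filter": {"term": {"date": date}},
            "aggs": {"rank": {"sum": {"field": "rank"}}},
        }

    return {
        "size": 0,
        "aggs": {
            "group_by_name": {
                "terms": {"field": "name", "size": size, "shard_size": shard_size},
                "aggs": {
                    "last_month_rank": rank_on(old_date),
                    "latest_rank": rank_on(new_date),
                    "diff_rank": {
                        "bucket_script": {
                            "buckets_path": {
                                "lastMonthRank": "last_month_rank>rank",
                                "latestRank": "latest_rank>rank",
                            },
                            "script": "params.lastMonthRank - params.latestRank",
                        }
                    },
                    "rank_bucket_sort": {
                        "bucket_sort": {
                            "sort": [{"diff_rank": {"order": "desc"}}],
                            "size": size,
                        }
                    },
                },
            }
        },
    }


def extract_diff_ranks(response):
    """Hypothetical helper: flatten the terms buckets into name/diff_rank pairs."""
    buckets = response["aggregations"]["group_by_name"]["buckets"]
    return [{"name": b["key"], "diff_rank": b["diff_rank"]["value"]} for b in buckets]


query = build_diff_rank_query("2019-06-16", "2019-07-16")
print(json.dumps(query, indent=2))

# Hand-written stand-in for a response (an assumption, not real output).
sample_response = {
    "aggregations": {
        "group_by_name": {
            "buckets": [
                {"key": "doc 1", "diff_rank": {"value": 50.0}},
                {"key": "doc 2", "diff_rank": {"value": -30.0}},
            ]
        }
    }
}
rows = extract_diff_ranks(sample_response)
print(rows)
```

Because bucket_sort already orders the buckets by diff_rank, the extracted list needs no further sorting on the client.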
