ElasticSearch/Kibana: get values that are not found in entries more recent than a certain date

Question

I have a fleet of devices that push to ElasticSearch at regular intervals (let's say every 10 minutes) entries of this form:

{
    "deviceId": "unique-device-id",
    "timestamp": 1586390031,
    "payload" : { various data }
}

I usually look at this through Kibana by filtering for the last 7 days of data and then drilling down by device id or some other piece of data from the payload.

Now I'm trying to get a sense of the health of this fleet by finding devices that haven't reported anything in, say, the last hour. I've been messing around with all sorts of filters and visualisations, and the closest I got is a data table with device ids and the timestamp of the last entry for each, sorted by timestamp. This is useful but somewhat hard to work with, as I have a few thousand devices.

What I dream of is getting either the above mentioned table to contain only the device ids that have not reported in the last hour, or getting only two numbers: the total count of distinct device ids seen in the last 7 days and the total count of device ids not seen in the last hour.

Can you point me in the right direction, if any one of these is even possible?

Recommended answer

I'll skip the table and take the second approach -- only getting the counts. I think it's possible to walk your way backwards to the rows from the counts.
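For readers who do want the table, one possible variant (not the approach taken in this answer, and only a sketch) is a terms aggregation per device with a max sub-aggregation on timestamp, pruned by a bucket_selector that keeps devices whose last entry is older than a cutoff. The Python below only builds the request body; the function name, the aggregation names, and the terms size of 10000 are assumptions made for illustration, and deviceId.keyword relies on the mapping used in this answer.

```python
import json
from datetime import datetime, timedelta, timezone


def build_stale_devices_query(now: datetime,
                              silent_for: timedelta = timedelta(hours=1)) -> dict:
    """Build a search body listing devices seen in the last 7 days
    whose most recent entry is older than now - silent_for."""
    # A max aggregation on a date field yields epoch milliseconds,
    # so the cutoff is expressed in the same unit.
    cutoff_ms = int((now - silent_for).timestamp() * 1000)
    return {
        "size": 0,
        "query": {"range": {"timestamp": {"gte": "now-7d"}}},
        "aggs": {
            "per_device": {
                # Assumption: 10000 is an arbitrary upper bound on fleet size.
                "terms": {"field": "deviceId.keyword", "size": 10000},
                "aggs": {
                    "last_seen": {"max": {"field": "timestamp"}},
                    "only_stale": {
                        "bucket_selector": {
                            "buckets_path": {"lastSeen": "last_seen"},
                            "script": {
                                "source": "params.lastSeen < params.cutoff",
                                "params": {"cutoff": cutoff_ms},
                            },
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    now = datetime(2020, 4, 10, 14, 0, 0, tzinfo=timezone.utc)
    print(json.dumps(build_stale_devices_query(now), indent=2))
```

The printed JSON could be sent with GET fleet/_search. The cutoff is computed on the client because the pipeline script compares plain numbers and does not understand date math such as now-1h.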

Note: I'll be using a human-readable time format instead of timestamps, but epoch_second will work just as well in your real use case. Also, I've added the comment field to give each doc some background.
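To relate the two formats, here is a small sketch that converts the epoch-seconds value from the question into the human-readable pattern used in this answer. The helper name is hypothetical, and UTC is assumed because Elasticsearch interprets date strings without an offset as UTC.

```python
from datetime import datetime, timezone


def epoch_to_readable(epoch_seconds: int) -> str:
    """Format epoch seconds as 'yyyy-MM-dd HH:mm:ss' in UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # The timestamp from the sample document in the question.
    print(epoch_to_readable(1586390031))  # 2020-04-08 23:53:51
```

Either representation can be indexed as long as the field's format accepts both, as in "epoch_second||yyyy-MM-dd HH:mm:ss".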

First, set up your index:

PUT fleet
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "epoch_second||yyyy-MM-dd HH:mm:ss"
      },
      "comment": {
        "type": "text"
      },
      "deviceId": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Sync a few docs -- I'm in UTC+2 so I chose these timestamps:

POST fleet/_doc
{
  "deviceId": "asdjhfa343",
  "timestamp": "2020-04-05 10:00:00",
  "comment": "in the last week"
}

POST fleet/_doc
{
  "deviceId": "asdjhfa343",
  "timestamp": "2020-04-10 13:05:00",
  "comment": "#asdjhfa343 in the last hour"
}

POST fleet/_doc
{
  "deviceId": "asdjhfa343",
  "timestamp": "2020-04-10 12:05:00",
  "comment": "#asdjhfa343 in the 2 hours"
}

POST fleet/_doc
{
  "deviceId": "asdjhfa343sdas",
  "timestamp": "2020-04-07 09:00:00",
  "comment": "in the last week"
}

POST fleet/_doc
{
  "deviceId": "asdjhfa343sdas",
  "timestamp": "2020-04-10 12:35:00",
  "comment": "in last 2hrs"
}

In total, we've got 5 docs and 2 distinct device ids with the following conditions:

  1. both appeared in the last 7 days
  2. both appeared in the last 2 hours
  3. only one appeared in the last hour

So I'm interested in finding precisely 1 deviceId which has appeared in the last 2hrs BUT not in the last 1hr.
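Before turning to aggregations, the goal can be expressed as a plain set difference. The sketch below runs that logic over the five sample docs in Python; the fixed "now" of 2020-04-10 14:00:00 is an assumption, picked so that the sample data satisfies the three conditions listed above.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# The five sample documents from this answer, reduced to (deviceId, timestamp).
DOCS = [
    ("asdjhfa343", "2020-04-05 10:00:00"),
    ("asdjhfa343", "2020-04-10 13:05:00"),
    ("asdjhfa343", "2020-04-10 12:05:00"),
    ("asdjhfa343sdas", "2020-04-07 09:00:00"),
    ("asdjhfa343sdas", "2020-04-10 12:35:00"),
]


def seen_since(docs, cutoff):
    """Return the set of device ids with at least one entry at or after cutoff."""
    return {device for device, ts in docs if datetime.strptime(ts, FMT) >= cutoff}


# Assumption: a fixed "now" so the result is reproducible.
NOW = datetime.strptime("2020-04-10 14:00:00", FMT)

last_7d = seen_since(DOCS, NOW - timedelta(days=7))
last_2h = seen_since(DOCS, NOW - timedelta(hours=2))
last_1h = seen_since(DOCS, NOW - timedelta(hours=1))

# Devices seen in the last 2 hours but silent in the last hour.
silent = last_2h - last_1h

print(len(last_7d), len(last_2h), len(last_1h), sorted(silent))
```

Because every device seen in the last hour was also seen in the last 2 hours, the size of the difference equals the difference of the two distinct counts, which is what makes a count-only approach workable.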

This can be done with a combination of filter (for range filters), cardinality (for distinct counts) and bucket script (for count differences) aggregations:

GET fleet/_search
{
  "size": 0,
  "aggs": {
    "distinct_devices_last7d": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "now-7d"
          }
        }
      },
      "aggs": {
        "uniq_device_count": {
          "cardinality": {
            "field": "deviceId.keyword"
          }
        }
      }
    },
    "not_seen_last1h": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "now-2h"
          }
        }
      },
      "aggs": {
        "device_ids_per_hour": {
          "date_histogram": {
            "field": "timestamp",
            "calendar_interval": "day",
            "format": "'disregard' -- yyyy-MM-dd"
          },
          "aggs": {
            "total_uniq_count": {
              "cardinality": {
                "field": "deviceId.keyword"
              }
            },
            "in_last_hour": {
              "filter": {
                "range": {
                  "timestamp": {
                    "gte": "now-1h"
                  }
                }
              },
              "aggs": {
                "uniq_count": {
                  "cardinality": {
                    "field": "deviceId.keyword"
                  }
                }
              }
            },
            "uniq_difference": {
              "bucket_script": {
                "buckets_path": {
                  "in_last_1h": "in_last_hour>uniq_count",
                  "in_last2h": "total_uniq_count"
                },
                "script": "params.in_last2h - params.in_last_1h"
              }
            }
          }
        }
      }
    }
  }
}

The date_histogram aggregation is just a placeholder that enables us to use a bucket script to get the final difference and not have to do any post-processing.

Since we passed size: 0, we're not interested in the hits section. So taking only the aggregations, here are the annotated results:

...
"aggregations" : {
    "not_seen_last1h" : {
      "doc_count" : 3,
      "device_ids_per_hour" : {
        "buckets" : [
          {
            "key_as_string" : "disregard -- 2020-04-10",
            "key" : 1586476800000,
            "doc_count" : 3,            <-- 3 device messages in the last 2hrs
            "total_uniq_count" : {
              "value" : 2               <-- 2 distinct IDs
            },
            "in_last_hour" : {
              "doc_count" : 1,
              "uniq_count" : {
                "value" : 1             <-- 1 distinct ID in the last hour
              }
            },
            "uniq_difference" : {
              "value" : 1.0             <-- 1 == final result !
            }
          }
        ]
      }
    },
    "distinct_devices_last7d" : {
      "meta" : { },
      "doc_count" : 5,                  <-- 5 device messages in the last 7d
      "uniq_device_count" : {
        "value" : 2                     <-- 2 unique IDs
      }
    }
  }
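If the two numbers are needed outside Kibana, they can be pulled out of the response with a few lines of client code. The sketch below works on a trimmed copy of the aggregations shown above; the summarize function and its output keys are hypothetical names.

```python
def summarize(aggregations: dict) -> dict:
    """Extract the 7-day distinct count and the per-bucket
    'seen in 2h but not in 1h' counts from the aggregations section."""
    total = aggregations["distinct_devices_last7d"]["uniq_device_count"]["value"]
    buckets = aggregations["not_seen_last1h"]["device_ids_per_hour"]["buckets"]
    per_bucket = {
        bucket["key_as_string"]: int(bucket["uniq_difference"]["value"])
        for bucket in buckets
    }
    return {"distinct_last_7d": total, "silent_last_hour": per_bucket}


# Trimmed copy of the response shown above.
SAMPLE = {
    "not_seen_last1h": {
        "doc_count": 3,
        "device_ids_per_hour": {
            "buckets": [
                {
                    "key_as_string": "disregard -- 2020-04-10",
                    "key": 1586476800000,
                    "doc_count": 3,
                    "total_uniq_count": {"value": 2},
                    "in_last_hour": {"doc_count": 1, "uniq_count": {"value": 1}},
                    "uniq_difference": {"value": 1.0},
                }
            ]
        },
    },
    "distinct_devices_last7d": {"doc_count": 5, "uniq_device_count": {"value": 2}},
}

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

One caveat worth keeping in mind: the placeholder date_histogram uses a daily interval, so a 2-hour window that spans midnight UTC would produce two buckets, and their differences may not simply add up to the true count.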
