PromQL query to find the duration of each firing alert

Question

I am creating a Grafana dashboard that shows, for each firing alert, the total alert count and the duration (that is, how long the alert has been in the firing state).

The PromQL query I use to capture the total alert count is as follows:

count by (alertname, customer_name) (changes(customer_ALERTS[24h]))

The idea is to add two more columns to the Grafana table panel: the alert count and the duration.

Now I need a query that captures the duration of each alert. Can somebody please share some thoughts?

Recommended answer

If you know the evaluation interval for the alerts, the following PromQL query can be used to calculate how long, in seconds, each alert was in the firing state over the last 24 hours:

count_over_time(customer_ALERTS[24h]) * <evaluation_interval_in_seconds>
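
As an illustration only, assuming the rules are evaluated once per minute (an assumed value; substitute the interval configured for your own rule group), the placeholder becomes 60:

```
# Assumption: the alert rule is evaluated every 60 seconds, so each
# sample of customer_ALERTS stands for 60 seconds of firing time.
count_over_time(customer_ALERTS[24h]) * 60
```

With that interval, a series with 90 samples in the window would report 5400 seconds, or 1.5 hours of firing time.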

The query assumes that customer_ALERTS has samples while the alert is firing and no samples while it isn't. If customer_ALERTS instead has the value 0 when the alert isn't firing and 1 when it is, then the following query should be used instead to determine the duration of the firing state in seconds:

avg_over_time(customer_ALERTS[24h]) * 24 * 3600
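
A small variation, offered only as a sketch: for a series that takes just the values 0 and 1, avg_over_time returns the fraction of samples equal to 1, so multiplying by 24 instead of 24 * 3600 gives the duration in hours. This relies on the same 0/1 assumption and on samples being evenly spaced across the window.

```
# Fraction of the last 24h spent firing (0/1 series assumed),
# scaled to hours. Example: an average of 0.25 corresponds to 6 hours.
avg_over_time(customer_ALERTS[24h]) * 24
```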

If customer_ALERTS uses other values for the firing and not-firing states, then PromQL subqueries can be used to count the samples in the firing state. Take a look also at MetricsQL functions such as lifetime(m[d]), share_gt_over_time(m[d], gt) or count_gt_over_time(m[d], gt).
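
A minimal sketch of the subquery approach, under two stated assumptions that do not come from the question: the firing state is encoded as the value 2, and a 1-minute subquery resolution is fine-grained enough for the dashboard.

```
# Assumptions (hypothetical): firing is encoded as the value 2, and the
# subquery is evaluated at a 1m resolution over the last 24h.
# The comparison keeps only the steps where the series equals 2;
# count_over_time counts those steps, and * 60 converts steps to seconds.
count_over_time((customer_ALERTS == 2)[24h:1m]) * 60
```

The multiplier has to match the subquery resolution in seconds, so changing 1m to 30s would mean multiplying by 30 instead.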
